fix(tenancy): scope rerank model sync

fix(cloud): unblock tenant CI and enforce knowledge quotas
fix(security): resolve M-1, M-2, M-3 security findings
2026-08-03 10:06:06 +00:00 · 2026-07-30 13:25:28 +00:00 · 2026-07-30 13:20:07 +00:00 · 2026-07-30 04:46:31 +00:00 · 2026-07-30 04:40:37 +00:00 · 2026-07-30 04:31:49 +00:00
506 changed files with 82205 additions and 13612 deletions
@@ -37,6 +37,10 @@ jobs:
        working-directory: web
        run: pnpm install --frozen-lockfile

+      - name: Run frontend unit tests
+        working-directory: web
+        run: pnpm test:unit
+
      - name: Install Playwright browsers
        working-directory: web
        run: pnpm exec playwright install --with-deps chromium
@@ -44,7 +44,9 @@ jobs:
    runs-on: ubuntu-latest
    services:
      postgres:
-        image: postgres:16
+        # Release migration 0013 installs the pgvector extension in the shared
+        # business database; CI must exercise the same extension availability.
+        image: pgvector/pgvector:pg16
        env:
          POSTGRES_USER: langbot
          POSTGRES_PASSWORD: langbot
@@ -75,4 +77,10 @@ jobs:
      - name: Run PostgreSQL migration tests
        env:
          TEST_POSTGRES_URL: postgresql+asyncpg://langbot:langbot@localhost:5432/langbot_test
-        run: uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short
+        run: >-
+          uv run pytest
+          tests/integration/persistence/test_migrations_postgres.py
+          tests/integration/persistence/test_pgvector_postgres.py
+          tests/integration/persistence/test_release_migration_postgres.py
+          tests/integration/persistence/test_plugin_identity_migration.py
+          -q --tb=short
@@ -178,6 +178,12 @@ In this repo:
 - `pkg/provider/tools/loaders/native.py`, `mcp_stdio.py`, and skill loaders depend on Box availability.
 - `pkg/skill/manager.py` loads skills from the Box runtime, falling back to local `data/skills` when needed.

+Durable Box Workspace storage is shared across placement generations, but
+sandbox sessions and managed processes are generation-scoped. LangBot validates
+the current execution binding before attaching an MCP stdio relay and sends the
+Workspace/generation binding in authenticated headers, so a placement cutover
+retires stale processes and closes already-attached relays.
+
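The generation check described in the paragraph above can be sketched as follows. This is a minimal illustration, not LangBot's implementation: `ExecutionBinding`, `RelayRegistry`, and the `X-Box-Workspace` / `X-Box-Generation` header names are hypothetical, and authenticating those headers is out of scope. It shows the two stated rules: a relay may attach only under the Workspace's current generation, and a cutover closes relays attached under an older one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExecutionBinding:
    """Hypothetical Workspace/placement-generation pair."""
    workspace_uuid: str
    generation: int


@dataclass
class RelayRegistry:
    """Tracks the current generation per Workspace and the relays attached under it."""
    current: dict[str, int] = field(default_factory=dict)
    relays: dict[str, list[tuple[int, str]]] = field(default_factory=dict)
    closed: list[str] = field(default_factory=list)

    def attach(self, binding: ExecutionBinding, relay_id: str) -> None:
        # Validate the binding *before* attaching: a stale generation is refused.
        if self.current.get(binding.workspace_uuid) != binding.generation:
            raise PermissionError(f"stale binding for {binding.workspace_uuid}")
        self.relays.setdefault(binding.workspace_uuid, []).append(
            (binding.generation, relay_id)
        )

    def cutover(self, workspace_uuid: str, new_generation: int) -> None:
        # Durable Workspace storage survives a cutover; generation-scoped relays do not.
        self.current[workspace_uuid] = new_generation
        keep = []
        for generation, relay_id in self.relays.get(workspace_uuid, []):
            if generation == new_generation:
                keep.append((generation, relay_id))
            else:
                self.closed.append(relay_id)  # close already-attached stale relays
        self.relays[workspace_uuid] = keep


def binding_headers(binding: ExecutionBinding) -> dict[str, str]:
    # Illustrative header names only; the real names are not shown in this diff.
    return {
        "X-Box-Workspace": binding.workspace_uuid,
        "X-Box-Generation": str(binding.generation),
    }
```

Keeping the check at attach time and the cleanup at cutover time means a relay can never outlive the generation it was validated against.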
 In `langbot-plugin-sdk`:

 - `src/langbot_plugin/box/server.py` implements `lbp box` and the WebSocket endpoints on `:5410`.
@@ -38,7 +38,7 @@ COPY --from=node /app/web/dist ./web/dist
 COPY --from=nsjail-build /usr/local/bin/nsjail /usr/local/bin/nsjail

 RUN apt-get update \
-    && apt-get install -y --no-install-recommends gcc ca-certificates curl gnupg \
+    && apt-get install -y --no-install-recommends gcc ca-certificates curl git gnupg \
    # nsjail runtime libraries (the build toolchain stays in the nsjail-build
    # stage; only these shared libs are needed to execute the binary).
    && apt-get install -y --no-install-recommends libprotobuf32 libnl-route-3-200 \
@@ -63,8 +63,8 @@ RUN apt-get update \
    && rm -f /tmp/nodesource_setup.sh \
    && python -m pip install --no-cache-dir uv \
    && uv sync \
-    && apt-get purge -y --auto-remove curl gnupg \
+    && apt-get purge -y --auto-remove curl git gnupg \
    && rm -rf /var/lib/apt/lists/* \
    && touch /.dockerenv

-CMD [ "uv", "run", "--no-sync", "main.py" ]
+CMD [ "uv", "run", "--no-sync", "main.py" ]
@@ -92,6 +92,17 @@ docker compose --profile all up -d

 ---

+## Live Demo
+
+**Try it now:** https://demo.langbot.dev/
+
+- Email: `demo@langbot.app`
+- Password: `langbot123456`
+
+_Note: Public demo environment. Do not enter sensitive information._
+
+---
+
 ## Supported Platforms

 | Platform | Status | Notes |
@@ -167,17 +178,6 @@ LangBot is **agent-friendly by design** — your coding agents (Claude Code, Cod

 ---

-## Live Demo
-
-**Try it now:** https://demo.langbot.dev/
-
- Email: `demo@langbot.app`
- Password: `langbot123456`
-
-_Note: Public demo environment. Do not enter sensitive information._
-
---
-
 ## Community

 [![Discord](https://img.shields.io/discord/1335141740050649118?logo=discord&label=Discord)](https://discord.gg/wdNEHETs87)
@@ -186,12 +186,6 @@ _Note: Public demo environment. Do not enter sensitive information._

 ---

-## Star History
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## Contributors

 Thanks to all [contributors](https://github.com/langbot-app/LangBot/graphs/contributors) who have helped make LangBot better:
@@ -92,6 +92,16 @@ docker compose --profile all up -d

 ---

+## 在线演示
+
+**立即体验：** https://demo.langbot.dev/
+- 邮箱：`demo@langbot.app`
+- 密码：`langbot123456`
+
+*注意：公开演示环境，请不要在其中填入任何敏感信息。*
+
+---
+
 ## 支持的平台

 | 平台 | 状态 | 备注 |
@@ -170,16 +180,6 @@ docker compose --profile all up -d

 ---

-## 在线演示
-
-**立即体验：** https://demo.langbot.dev/
- 邮箱：`demo@langbot.app`
- 密码：`langbot123456`
-
-*注意：公开演示环境，请不要在其中填入任何敏感信息。*
-
---
-
 ## 为 AI Agent 而生 🤖

 LangBot **从设计上就对 Agent 友好** —— 你的编码 Agent（Claude Code、Codex、Copilot、Cursor 等）可以一等公民般地操作、扩展和部署 LangBot：
@@ -203,12 +203,6 @@ LangBot **从设计上就对 Agent 友好** —— 你的编码 Agent（Claude C

 ---

-## Star 趋势
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## 贡献者

 感谢所有[贡献者](https://github.com/langbot-app/LangBot/graphs/contributors)对 LangBot 的帮助：
@@ -91,6 +91,16 @@ docker compose --profile all up -d

 ---

+## Demo en Vivo
+
+**Pruébelo ahora:** https://demo.langbot.dev/
+- Correo electrónico: `demo@langbot.app`
+- Contraseña: `langbot123456`
+
+*Nota: Entorno de demostración público. No ingrese información confidencial.*
+
+---
+
 ## Plataformas Soportadas

 | Plataforma | Estado | Notas |
@@ -153,14 +163,6 @@ docker compose --profile all up -d

 ---

-## Demo en Vivo
-
-**Pruébelo ahora:** https://demo.langbot.dev/
- Correo electrónico: `demo@langbot.app`
- Contraseña: `langbot123456`
-
-*Nota: Entorno de demostración público. No ingrese información confidencial.*
-
 ## Diseñado para Agentes de IA 🤖

 LangBot es **agent-friendly por diseño** —— tus agentes de codificación (Claude Code, Codex, Copilot, Cursor, …) pueden operar, extender y desplegar LangBot con soporte de primera clase:
@@ -182,12 +184,6 @@ LangBot es **agent-friendly por diseño** —— tus agentes de codificación (C

 ---

-## Historial de Stars
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## Colaboradores

 Gracias a todos los [colaboradores](https://github.com/langbot-app/LangBot/graphs/contributors) que han ayudado a mejorar LangBot:
@@ -91,6 +91,16 @@ docker compose --profile all up -d

 ---

+## Démo en Ligne
+
+**Essayez maintenant :** https://demo.langbot.dev/
+- Email : `demo@langbot.app`
+- Mot de passe : `langbot123456`
+
+*Note : Environnement de démonstration public. Ne saisissez pas d'informations sensibles.*
+
+---
+
 ## Plateformes Supportées

 | Plateforme | Statut | Notes |
@@ -153,14 +163,6 @@ docker compose --profile all up -d

 ---

-## Démo en Ligne
-
-**Essayez maintenant :** https://demo.langbot.dev/
- Email : `demo@langbot.app`
- Mot de passe : `langbot123456`
-
-*Note : Environnement de démonstration public. Ne saisissez pas d'informations sensibles.*
-
 ## Conçu pour les agents IA 🤖

 LangBot est **agent-friendly par conception** —— vos agents de codage (Claude Code, Codex, Copilot, Cursor, …) peuvent exploiter, étendre et déployer LangBot avec un support de premier ordre :
@@ -182,12 +184,6 @@ LangBot est **agent-friendly par conception** —— vos agents de codage (Claud

 ---

-## Historique des Stars
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## Contributeurs

 Merci à tous les [contributeurs](https://github.com/langbot-app/LangBot/graphs/contributors) qui ont aidé à améliorer LangBot :
@@ -91,6 +91,16 @@ docker compose --profile all up -d

 ---

+## ライブデモ
+
+**今すぐ試す:** https://demo.langbot.dev/
+- メール: `demo@langbot.app`
+- パスワード: `langbot123456`
+
+*注意: 公開デモ環境です。機密情報を入力しないでください。*
+
+---
+
 ## 対応プラットフォーム

 | プラットフォーム | ステータス | 備考 |
@@ -153,14 +163,6 @@ docker compose --profile all up -d

 ---

-## ライブデモ
-
-**今すぐ試す:** https://demo.langbot.dev/
- メール: `demo@langbot.app`
- パスワード: `langbot123456`
-
-*注意: 公開デモ環境です。機密情報を入力しないでください。*
-
 ## AI エージェントのために 🤖

 LangBot は **設計段階からエージェントフレンドリー** です。お使いのコーディングエージェント（Claude Code、Codex、Copilot、Cursor など）が、ファーストクラスのサポートで LangBot を操作・拡張・デプロイできます：
@@ -182,12 +184,6 @@ LangBot は **設計段階からエージェントフレンドリー** です。

 ---

-## Star 推移
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## コントリビューター

 LangBot をより良くするために貢献してくださったすべての[コントリビューター](https://github.com/langbot-app/LangBot/graphs/contributors)に感謝します:
@@ -91,6 +91,16 @@ docker compose --profile all up -d

 ---

+## 라이브 데모
+
+**지금 체험:** https://demo.langbot.dev/
+- 이메일: `demo@langbot.app`
+- 비밀번호: `langbot123456`
+
+*참고: 공개 데모 환경입니다. 민감한 정보를 입력하지 마세요.*
+
+---
+
 ## 지원 플랫폼

 | 플랫폼 | 상태 | 비고 |
@@ -153,14 +163,6 @@ docker compose --profile all up -d

 ---

-## 라이브 데모
-
-**지금 체험:** https://demo.langbot.dev/
- 이메일: `demo@langbot.app`
- 비밀번호: `langbot123456`
-
-*참고: 공개 데모 환경입니다. 민감한 정보를 입력하지 마세요.*
-
 ## AI 에이전트를 위한 설계 🤖

 LangBot은 **설계 단계부터 에이전트 친화적**입니다 —— 코딩 에이전트(Claude Code, Codex, Copilot, Cursor 등)가 일급 지원으로 LangBot을 운영·확장·배포할 수 있습니다:
@@ -182,12 +184,6 @@ LangBot은 **설계 단계부터 에이전트 친화적**입니다 —— 코딩

 ---

-## Star 추이
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## 기여자

 LangBot을 더 나은 프로젝트로 만들어 주신 모든 [기여자](https://github.com/langbot-app/LangBot/graphs/contributors)분들께 감사드립니다:
@@ -91,6 +91,16 @@ docker compose --profile all up -d

 ---

+## Демо
+
+**Попробуйте прямо сейчас:** https://demo.langbot.dev/
+- Email: `demo@langbot.app`
+- Пароль: `langbot123456`
+
+*Примечание: Публичная демо-среда. Не вводите конфиденциальную информацию.*
+
+---
+
 ## Поддерживаемые платформы

 | Платформа | Статус | Примечания |
@@ -153,14 +163,6 @@ docker compose --profile all up -d

 ---

-## Демо
-
-**Попробуйте прямо сейчас:** https://demo.langbot.dev/
- Email: `demo@langbot.app`
- Пароль: `langbot123456`
-
-*Примечание: Публичная демо-среда. Не вводите конфиденциальную информацию.*
-
 ## Создано для ИИ-агентов 🤖

 LangBot **дружелюбен к агентам по своей архитектуре** —— ваши кодинг-агенты (Claude Code, Codex, Copilot, Cursor и др.) могут управлять, расширять и развёртывать LangBot с первоклассной поддержкой:
@@ -182,12 +184,6 @@ LangBot **дружелюбен к агентам по своей архитек

 ---

-## История Stars
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## Участники

 Спасибо всем [участникам](https://github.com/langbot-app/LangBot/graphs/contributors), которые помогли сделать LangBot лучше:
@@ -93,6 +93,16 @@ docker compose --profile all up -d

 ---

+## 線上演示
+
+**立即體驗：** https://demo.langbot.dev/
+- 信箱：`demo@langbot.app`
+- 密碼：`langbot123456`
+
+*注意：公開演示環境，請不要在其中填入任何敏感資訊。*
+
+---
+
 ## 支援的平台

 | 平台 | 狀態 | 備註 |
@@ -169,14 +179,6 @@ docker compose --profile all up -d

 ---

-## 線上演示
-
-**立即體驗：** https://demo.langbot.dev/
- 信箱：`demo@langbot.app`
- 密碼：`langbot123456`
-
-*注意：公開演示環境，請不要在其中填入任何敏感資訊。*
-
 ## 為 AI Agent 而生 🤖

 LangBot **從設計上就對 Agent 友善** —— 你的編碼 Agent（Claude Code、Codex、Copilot、Cursor 等）可以一等公民般地操作、擴充和部署 LangBot：
@@ -200,12 +202,6 @@ LangBot **從設計上就對 Agent 友善** —— 你的編碼 Agent（Claude C

 ---

-## Star 趨勢
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## 貢獻者

 感謝所有[貢獻者](https://github.com/langbot-app/LangBot/graphs/contributors)對 LangBot 的幫助：
@@ -91,6 +91,16 @@ docker compose --profile all up -d

 ---

+## Demo trực tuyến
+
+**Thử ngay:** https://demo.langbot.dev/
+- Email: `demo@langbot.app`
+- Mật khẩu: `langbot123456`
+
+*Lưu ý: Môi trường demo công khai. Không nhập thông tin nhạy cảm.*
+
+---
+
 ## Nền tảng được hỗ trợ

 | Nền tảng | Trạng thái | Ghi chú |
@@ -153,14 +163,6 @@ docker compose --profile all up -d

 ---

-## Demo trực tuyến
-
-**Thử ngay:** https://demo.langbot.dev/
- Email: `demo@langbot.app`
- Mật khẩu: `langbot123456`
-
-*Lưu ý: Môi trường demo công khai. Không nhập thông tin nhạy cảm.*
-
 ## Được xây dựng cho AI Agent 🤖

 LangBot **thân thiện với agent ngay từ thiết kế** —— các coding agent của bạn (Claude Code, Codex, Copilot, Cursor, …) có thể vận hành, mở rộng và triển khai LangBot với sự hỗ trợ hạng nhất:
@@ -182,12 +184,6 @@ LangBot **thân thiện với agent ngay từ thiết kế** —— các coding

 ---

-## Lịch sử Star
-
-[![Star History Chart](https://api.star-history.com/svg?repos=langbot-app/LangBot&type=Date)](https://star-history.com/#langbot-app/LangBot&Date)
-
---
-
 ## Người đóng góp

 Cảm ơn tất cả [người đóng góp](https://github.com/langbot-app/LangBot/graphs/contributors) đã giúp LangBot trở nên tốt hơn:
@@ -14,6 +14,13 @@ services:
    restart: on-failure
    environment:
      - TZ=Asia/Shanghai
+      # Shared with the langbot service and sent only as a WebSocket handshake
+      # header. Generate with: openssl rand -hex 32
+      - LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN=${LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN:-}
+      # Process-wide admission limits for every asyncio.to_thread() call.
+      - LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS=${LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS:-8}
+      - LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING=${LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING:-128}
+      - LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE=${LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE:-4}
    command: ["uv", "run", "--no-sync", "-m", "langbot_plugin.cli.__init__", "rt"]
    networks:
      - langbot_network
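The three `LANGBOT_BLOCKING_EXECUTOR_*` variables suggest an admission gate in front of the thread pool. The sketch below is one way such a gate could work and is an assumption, not the runtime's code: `BlockingAdmission` and `ExecutorSaturated` are hypothetical names, `MAX_PENDING` is read as a cap on admitted-but-unfinished calls (new calls fail fast beyond it), and `MAX_INFLIGHT_PER_SCOPE` as a per-scope semaphore so one scope cannot occupy every worker.

```python
import asyncio
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class ExecutorSaturated(RuntimeError):
    """Raised when the process-wide pending budget is exhausted."""


class BlockingAdmission:
    """Hypothetical admission gate in front of a bounded thread pool."""

    def __init__(self, max_workers: int, max_pending: int, max_inflight_per_scope: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._max_pending = max_pending
        self._pending = 0  # admitted calls that have not finished yet
        # One semaphore per scope (e.g. a Workspace); idle scopes are never evicted here.
        self._scopes: dict[str, asyncio.Semaphore] = defaultdict(
            lambda: asyncio.Semaphore(max_inflight_per_scope)
        )

    @classmethod
    def from_env(cls) -> "BlockingAdmission":
        # Same variable names and defaults as the Compose file above.
        env = os.environ.get
        return cls(
            int(env("LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS", "8")),
            int(env("LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING", "128")),
            int(env("LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE", "4")),
        )

    async def run(self, scope: str, func: Callable[..., Any], *args: Any) -> Any:
        # Fail fast instead of queueing without bound.
        if self._pending >= self._max_pending:
            raise ExecutorSaturated(f"{self._pending} blocking calls already pending")
        self._pending += 1
        try:
            # Waits here if this scope already has its maximum number of calls in flight.
            async with self._scopes[scope]:
                loop = asyncio.get_running_loop()
                return await loop.run_in_executor(self._executor, func, *args)
        finally:
            self._pending -= 1

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

Rejecting at admission keeps memory bounded under overload, while the per-scope semaphore provides fairness between tenants without a separate pool per scope.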
@@ -40,9 +47,19 @@ services:
    restart: on-failure
    environment:
      - TZ=Asia/Shanghai
+      # Shared control-plane secret used to authenticate both the RPC socket
+      # and managed-process relay. Generate once (for example with
+      # ``openssl rand -hex 32``) and export it before enabling this profile.
+      # An empty value is accepted by Compose so Box can remain optional, but
+      # the Box runtime itself fails closed when the profile is started.
+      - LANGBOT_BOX_CONTROL_TOKEN=${LANGBOT_BOX_CONTROL_TOKEN:-}
+      # Box has its own process-wide blocking-work budget.
+      - LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS=${LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS:-8}
+      - LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING=${LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING:-128}
+      - LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE=${LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE:-4}
      # The Box runtime does NOT read box.local.* from config.yaml or env; it
-      # receives its configuration from LangBot via the INIT RPC action.
-      # Do not add LANGBOT_BOX_* / BOX__* here — they would be silently ignored.
+      # receives its functional configuration from LangBot via the INIT RPC
+      # action. Do not add BOX__* here because those would be ignored.
    # Launched through the same CLI entry point as the plugin runtime
    # (`langbot_plugin.cli.__init__ <subcommand>`). WebSocket is the default
    # control transport — mirrors `rt`, which also runs with no flag. Pass
@@ -60,6 +77,17 @@ services:
    restart: on-failure
    environment:
      - TZ=Asia/Shanghai
+      # Must match langbot_plugin_runtime. Empty/missing values make the
+      # external control channel fail closed.
+      - LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN=${LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN:-}
+      # Must match the value supplied to langbot_box. The token is sent only
+      # in WebSocket handshake headers, never in URLs or action payloads.
+      - LANGBOT_BOX_CONTROL_TOKEN=${LANGBOT_BOX_CONTROL_TOKEN:-}
+      # Core process-wide blocking-work admission. These are native config
+      # overrides and are persisted with the effective data/config.yaml.
+      - SYSTEM__BLOCKING_EXECUTOR__MAX_WORKERS=${LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS:-8}
+      - SYSTEM__BLOCKING_EXECUTOR__MAX_PENDING=${LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING:-128}
+      - SYSTEM__BLOCKING_EXECUTOR__MAX_INFLIGHT_PER_SCOPE=${LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE:-4}
      # Unified env-override convention: SECTION__SUBSECTION__KEY overrides the
      # matching config.yaml field (see LoadConfigStage). These map onto
      # box.* and are forwarded to the Box runtime via INIT RPC.
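Both control tokens follow the same pattern: a random shared secret, sent only in a handshake header, with an empty value disabling the channel instead of opening it. A minimal sketch of that fail-closed check, assuming a hypothetical `X-LangBot-Control-Token` header name (the real header is not shown in this diff). `secrets.token_hex(32)` produces the same 64-hex-character shape as `openssl rand -hex 32`.

```python
import hmac
import os
import secrets


def generate_control_token() -> str:
    # 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
    return secrets.token_hex(32)


def authorize_handshake(headers: dict[str, str], env_var: str) -> bool:
    """Fail-closed check of a WebSocket handshake header (hypothetical header name)."""
    expected = os.environ.get(env_var, "")
    if not expected:
        # An empty or missing secret disables the control channel instead of opening it.
        return False
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get("x-langbot-control-token", "")
    # Constant-time comparison avoids leaking the secret through response timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```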
@@ -4,6 +4,10 @@
 # Full deployment guide (zh/en/ja): https://docs.langbot.app -> Installation -> Kubernetes
 #
 # Usage:
+#   kubectl -n langbot create secret generic langbot-plugin-runtime-control \
+#     --from-literal=token="$(openssl rand -hex 32)"
+#   kubectl -n langbot create secret generic langbot-box-control \
+#     --from-literal=token="$(openssl rand -hex 32)"
 #   kubectl apply -f kubernetes.yaml
 #
 # Prerequisites:
@@ -87,6 +91,12 @@ metadata:
 data:
  TZ: "Asia/Shanghai"
  PLUGIN__RUNTIME_WS_URL: "ws://langbot-plugin-runtime:5400/control/ws"
+  SYSTEM__BLOCKING_EXECUTOR__MAX_WORKERS: "8"
+  SYSTEM__BLOCKING_EXECUTOR__MAX_PENDING: "128"
+  SYSTEM__BLOCKING_EXECUTOR__MAX_INFLIGHT_PER_SCOPE: "4"
+  LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS: "8"
+  LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING: "128"
+  LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE: "4"
  # Box sandbox runtime endpoint. LangBot connects to the Box runtime over
  # WebSocket. The hostname MUST match the langbot-box Service name. Note the
  # in-container default ("langbot_box") uses an underscore, which is an
@@ -127,6 +137,26 @@ spec:
            configMapKeyRef:
              name: langbot-config
              key: TZ
+        - name: LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: langbot-plugin-runtime-control
+              key: token
+        - name: LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS
+        - name: LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING
+        - name: LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE
        volumeMounts:
        - name: plugin-data
          mountPath: /app/data/plugins
@@ -139,7 +169,8 @@ spec:
            cpu: "1000m"
        # Liveness probe to restart container if it becomes unresponsive
        livenessProbe:
-          tcpSocket:
+          httpGet:
+            path: /healthz
            port: 5400
          initialDelaySeconds: 30
          periodSeconds: 10
@@ -147,7 +178,8 @@ spec:
          failureThreshold: 3
        # Readiness probe to know when container is ready to accept traffic
        readinessProbe:
-          tcpSocket:
+          httpGet:
+            path: /healthz
            port: 5400
          initialDelaySeconds: 10
          periodSeconds: 5
@@ -246,9 +278,28 @@ spec:
            configMapKeyRef:
              name: langbot-config
              key: TZ
+        - name: LANGBOT_BOX_CONTROL_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: langbot-box-control
+              key: token
+        - name: LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: LANGBOT_BLOCKING_EXECUTOR_MAX_WORKERS
+        - name: LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: LANGBOT_BLOCKING_EXECUTOR_MAX_PENDING
+        - name: LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: LANGBOT_BLOCKING_EXECUTOR_MAX_INFLIGHT_PER_SCOPE
        # The Box runtime does NOT read box.local.* / BOX__* from its own env;
-        # it receives its configuration from LangBot via the INIT RPC action.
-        # Do not add BOX__* here — they would be silently ignored.
+        # it receives its functional configuration from LangBot via INIT.
        volumeMounts:
        # Box workspace root — identical path on node, box, and sandbox
        # containers (see the IMPORTANT note above).
@@ -265,14 +316,18 @@ spec:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
-          tcpSocket:
+          httpGet:
+            path: /healthz
            port: 5410
          initialDelaySeconds: 20
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
-          tcpSocket:
+          httpGet:
+            # Unlike liveness, readiness validates the configured backend and
+            # all strict managed-mode isolation guarantees.
+            path: /readyz
            port: 5410
          initialDelaySeconds: 10
          periodSeconds: 5
@@ -319,6 +374,10 @@ metadata:
    app: langbot
 spec:
  replicas: 1
+  # Plugin Runtime has a single active LangBot control owner. Recreate avoids
+  # two LangBot pods fighting over that connection during a rolling update.
+  strategy:
+    type: Recreate
  selector:
    matchLabels:
      app: langbot
@@ -352,6 +411,26 @@ spec:
            configMapKeyRef:
              name: langbot-config
              key: PLUGIN__RUNTIME_WS_URL
+        - name: LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: langbot-plugin-runtime-control
+              key: token
+        - name: SYSTEM__BLOCKING_EXECUTOR__MAX_WORKERS
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: SYSTEM__BLOCKING_EXECUTOR__MAX_WORKERS
+        - name: SYSTEM__BLOCKING_EXECUTOR__MAX_PENDING
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: SYSTEM__BLOCKING_EXECUTOR__MAX_PENDING
+        - name: SYSTEM__BLOCKING_EXECUTOR__MAX_INFLIGHT_PER_SCOPE
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: SYSTEM__BLOCKING_EXECUTOR__MAX_INFLIGHT_PER_SCOPE
        # Box (sandbox) runtime endpoint. Connects LangBot to the langbot-box
        # Service over WebSocket. Remove this (and the langbot-box Deployment)
        # and set BOX__ENABLED=false if you do not want the sandbox.
@@ -360,6 +439,13 @@ spec:
            configMapKeyRef:
              name: langbot-config
              key: BOX__RUNTIME__ENDPOINT
+        # Same Secret as langbot-box. It authenticates the RPC and managed-
+        # process relay handshakes and is never put in a URL or RPC payload.
+        - name: LANGBOT_BOX_CONTROL_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: langbot-box-control
+              key: token
        # box.local.* config — forwarded to the Box runtime via INIT RPC. The
        # host_root MUST match the box-root hostPath mountPath below AND the box
        # Deployment's box-root mountPath, so that skill package paths resolve
@@ -392,7 +478,7 @@ spec:
        # Liveness probe to restart container if it becomes unresponsive
        livenessProbe:
          httpGet:
-            path: /
+            path: /healthz
            port: 5300
          initialDelaySeconds: 60
          periodSeconds: 10
@@ -401,7 +487,7 @@ spec:
        # Readiness probe to know when container is ready to accept traffic
        readinessProbe:
          httpGet:
-            path: /
+            path: /healthz
            port: 5300
          initialDelaySeconds: 30
          periodSeconds: 5
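The probe change separates liveness (`/healthz`) from Box readiness (`/readyz`). A stdlib-only sketch of that split, for illustration only: `READINESS_CHECKS` and its entries are hypothetical placeholders for the backend and isolation checks the readiness comment mentions. Liveness never consults dependencies, so a failing backend marks the pod unready instead of getting it restarted.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# Hypothetical readiness checks: name -> callable returning True when the guarantee holds.
READINESS_CHECKS: dict[str, Callable[[], bool]] = {
    "backend": lambda: True,
    "isolation": lambda: True,
}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthz":
            # Liveness: the process can serve HTTP at all; no dependency checks.
            self._reply(200, {"status": "ok"})
        elif self.path == "/readyz":
            # Readiness: every configured guarantee must hold, otherwise 503.
            failed = [name for name, check in READINESS_CHECKS.items() if not check()]
            self._reply(503 if failed else 200, {"failed": failed})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code: int, body: dict) -> None:
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args) -> None:
        # Keep high-frequency probe traffic out of the logs.
        pass


def serve(port: int) -> ThreadingHTTPServer:
    return ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)
```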
@@ -8,13 +8,21 @@ API keys can be managed through the web interface:

 1. Log in to the LangBot web interface
 2. Click the "API Keys" button at the bottom of the sidebar
-3. Create, view, copy, or delete API keys as needed
+3. Create an API key and copy its secret immediately
+4. Revoke keys that are no longer needed
+
+Database-backed API-key secrets are returned exactly once. LangBot stores only
+a SHA-256 lookup hash, so an existing secret cannot be displayed or recovered
+later. Each key belongs to one Workspace, has explicit permission scopes, and
+may have an expiry. The Workspace is derived from the authenticated key; an
+`X-Workspace-Id` header cannot redirect it to another tenant.

 ## Global API Key (config.yaml)

 In addition to web-UI-created keys (stored in the database, prefixed `lbk_`),
 LangBot supports a **global API key** defined directly in `data/config.yaml`.
-This is useful for automated deployments, infrastructure-as-code, and AI agents
+This is a Community-edition bootstrap option for automated deployments,
+infrastructure-as-code, and AI agents
 that need API/MCP access **without a login session and without creating a
 database record first**.

@@ -27,10 +35,12 @@ api:

 Behavior:

- When `api.global_api_key` is a non-empty string, that exact value is accepted
-  anywhere a normal API key is accepted — the `X-API-Key` header or
-  `Authorization: Bearer <key>` — across the HTTP service API **and the MCP
-  server**.
+- In Community edition's singleton Workspace, a non-empty
+  `api.global_api_key` is bound to that Workspace and accepted across the HTTP
+  service API and the MCP server.
+- The global config key is rejected when multi-Workspace SaaS mode is enabled;
+  SaaS automation must use a database-backed Workspace key or a closed control
+  plane credential.
 - The global key does **not** require the `lbk_` prefix; use any sufficiently
  strong secret.
 - Leave it empty (`''`, the default) to disable it entirely; only database-backed
@@ -38,9 +48,10 @@ Behavior:
 - Existing installs are unaffected until you add the key — config completion only
  backfills top-level keys, and the lookup is defensive when the field is absent.

-> **Security:** the global key is stored in plaintext in `config.yaml`. Only
-> enable it on trusted/internal deployments, keep the file permissions tight,
-> always serve over HTTPS, and rotate the value if it may have leaked.
+> **Security:** the global key is stored in plaintext in `config.yaml` and has
+> the singleton Workspace's full fixed permission set. Only enable it on
+> trusted/internal Community deployments, keep file permissions tight, always
+> serve over HTTPS, and rotate it if it may have leaked.

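How the global key is accepted or rejected can be sketched as follows. `resolve_global_key` and `presented_key` are hypothetical helpers, not LangBot functions. They encode the documented rules: an empty key is disabled, multi-Workspace SaaS mode rejects the key, and a match binds the request to the singleton Workspace. `hmac.compare_digest` keeps the comparison from leaking timing information.

```python
import hmac
from typing import Optional


def presented_key(headers: dict[str, str]) -> str:
    """Extract a key from X-API-Key or from Authorization: Bearer <key>."""
    if "X-API-Key" in headers:
        return headers["X-API-Key"]
    auth = headers.get("Authorization", "")
    return auth[len("Bearer "):] if auth.startswith("Bearer ") else ""


def resolve_global_key(presented: str, global_api_key: str, saas_mode: bool,
                       singleton_workspace_uuid: str) -> Optional[str]:
    """Return the singleton Workspace if the config key matches, else None."""
    if saas_mode or not global_api_key:
        # Rejected in multi-Workspace SaaS mode; '' (the default) disables it entirely.
        return None
    if hmac.compare_digest(presented.encode(), global_api_key.encode()):
        return singleton_workspace_uuid
    return None
```

Returning `None` rather than raising lets a caller fall through to the database-backed key lookup.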
 ## Using API Keys

@@ -60,7 +71,9 @@ Authorization: Bearer lbk_your_api_key_here

 ## Available APIs

-All existing LangBot APIs now support **both user token and API key authentication**. This means you can use API keys to access:
+Endpoints that declare API-key authentication accept either a user token or a
+Workspace API key. The key must include the permission required by the route.
+This includes:

 - **Model Management** - `/api/v1/provider/models/llm` and `/api/v1/provider/models/embedding`
 - **Bot Management** - `/api/v1/platform/bots`
@@ -227,6 +240,11 @@ or
 }
 ```

+### 403 Forbidden
+
+The key is valid for its Workspace but does not include the fixed permission
+required by the route.
+
 ### 500 Internal Server Error

 ```json
@@ -240,7 +258,7 @@ or

 1. **Keep API keys secure**: Store them securely and never commit them to version control
 2. **Use HTTPS**: Always use HTTPS in production to encrypt API key transmission
-3. **Rotate keys regularly**: Create new API keys periodically and delete old ones
+3. **Rotate keys regularly**: Create new API keys periodically and revoke old ones
 4. **Use descriptive names**: Give your API keys meaningful names to track their usage
 5. **Delete unused keys**: Remove API keys that are no longer needed
 6. **Use X-API-Key header**: Prefer using the `X-API-Key` header for clarity
@@ -317,7 +335,6 @@ curl -X POST \

 ## Notes

- The same endpoints work for both the web UI (with user tokens) and external services (with API keys)
+- API-key-enabled endpoints use the same resource shapes as the web UI
 - No need to learn different API paths - use the existing API documentation with API key authentication
- All endpoints that previously required user authentication now also accept API keys
-
+- API keys never select a Workspace from a request header; their persisted binding is authoritative
@@ -31,8 +31,10 @@ Both are compatible with LangBot.

 ## Installation

-Valkey Search support is included when you install LangBot — the `valkey-glide` dependency is
-declared in `pyproject.toml`. To install manually:
+Valkey Search support is included automatically on Linux and macOS. The official `valkey-glide`
+client does not currently publish a Windows package, so LangBot skips this optional dependency on
+Windows; LangBot remains usable there, but the Valkey Search backend is unavailable. To install the
+client manually on a supported platform:

 ```bash
 pip install 'valkey-glide>=2.4.1,<3.0.0'
@@ -0,0 +1,166 @@
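+# LangBot Multi-Tenant Database Migration Guide
+
+## Overview
+
+Migrating LangBot from the single-tenant OSS architecture to the multi-tenant SaaS architecture requires running seven database migrations (0009-0015).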
+
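+## Migration Sequence
+
+```
+0009_workspace_tenancy_kernel → Create the Workspace, member, and invitation tables
+0010_scope_tenant_resources → Add workspace_uuid to every business table
+0011_postgres_tenant_rls → PostgreSQL row-level security policies
+0012_plugin_installation_identity → Bind plugin instances to tenants
+0013_tenant_pgvector → Isolate RAG vector storage
+0014_cloud_directory_projection → Sync with the Cloud control plane
+0015_cloud_core_collaboration → Collaboration and permission features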
+```
+
+## Procedure
+
+### 1. Back up (required)
+
+```bash
+# SQLite: the .backup command takes a consistent copy even while LangBot is running
+sqlite3 ~/.langbot/data/langbot.db ".backup $HOME/.langbot/data/langbot.db.backup-$(date +%Y%m%d)"
+
+# PostgreSQL
+pg_dump -U langbot_user -d langbot_db -F c -f langbot_backup_$(date +%Y%m%d).dump
+```
+
+### 2. Run the migrations
+
+```bash
+# Stop the service
+sudo systemctl stop langbot
+
+# Apply all migrations
+python -m langbot.pkg.persistence.migration upgrade head
+
+# Verify
+python -m langbot.pkg.persistence.migration current
+# Expected: 0015_cloud_core_collaboration
+
+# Start the service
+sudo systemctl start langbot
+```
+
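+### 3. Automatic migration for single-tenant OSS installs
+
+The migrations automatically:
+- create a default Workspace (named "Default Workspace")
+- make the first user its Owner
+- bind all existing resources to that Workspace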
+
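The automatic single-tenant backfill described above can be illustrated with `sqlite3`. This is a sketch under an assumed, simplified schema (`workspace_members` and its `role` column are hypothetical, and only three resource tables are shown); the real migrations 0009 and 0010 may differ. Running everything in one transaction means a failure leaves no half-bound data.

```python
import sqlite3
import uuid


def backfill_default_workspace(conn: sqlite3.Connection) -> str:
    """Create 'Default Workspace', make the first user Owner, bind existing rows."""
    ws_uuid = str(uuid.uuid4())
    with conn:  # one transaction: commit on success, roll back on any error
        conn.execute(
            "INSERT INTO workspaces (uuid, name) VALUES (?, ?)",
            (ws_uuid, "Default Workspace"),
        )
        first = conn.execute("SELECT uuid FROM users ORDER BY id LIMIT 1").fetchone()
        if first is not None:
            conn.execute(
                "INSERT INTO workspace_members (workspace_uuid, user_uuid, role) "
                "VALUES (?, ?, 'owner')",
                (ws_uuid, first[0]),
            )
        for table in ("bots", "pipelines", "model_providers"):
            # Table names come from a fixed tuple, never from user input.
            conn.execute(
                f"UPDATE {table} SET workspace_uuid = ? WHERE workspace_uuid IS NULL",
                (ws_uuid,),
            )
    return ws_uuid
```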
+### 4. Verify
+
+```bash
+# Check the Workspace
+python << 'EOF'
+from langbot.pkg.persistence import manager
+import sqlalchemy as sa
+
+with manager.engine.connect() as conn:
+    ws = conn.execute(sa.text("SELECT uuid, name FROM workspaces LIMIT 1")).first()
+    if ws is None:
+        raise SystemExit("No Workspace found; check that the migrations were applied")
+    print(f"Workspace: {ws[1]} ({ws[0]})")
+
+    # Check resource bindings (bound parameter instead of string formatting)
+    bot_count = conn.execute(
+        sa.text("SELECT COUNT(*) FROM bots WHERE workspace_uuid = :ws"),
+        {"ws": ws[0]},
+    ).scalar()
+    print(f"Bots: {bot_count}")
+EOF
+```
+
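+## Rollback
+
+### Full rollback (multi-tenant data is lost)
+
+```bash
+# 1. Stop the service
+sudo systemctl stop langbot
+
+# 2. Restore the backup
+# SQLite
+cp ~/.langbot/data/langbot.db.backup-YYYYMMDD ~/.langbot/data/langbot.db
+# PostgreSQL (--clean --create drops and recreates langbot_db from the dump)
+pg_restore -U langbot_user -d postgres --clean --create langbot_backup_YYYYMMDD.dump
+
+# 3. Check out the previous code
+git checkout v4.10.x
+pip install -e .
+
+# 4. Start the service
+sudo systemctl start langbot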
+```
+
+### Downgrade migration (keep data, remove multi-tenancy)
+
+```bash
+# Warning: drops the Workspace tables but keeps the resources
+python -m langbot.pkg.persistence.migration downgrade 0008_mcp_resource_prefs
+```
+
+## FAQ
+
+### Q: Cannot log in after migrating
+
+```bash
+# Check user UUIDs
+python << 'EOF'
+from langbot.pkg.persistence import manager
+import sqlalchemy as sa
+
+with manager.engine.connect() as conn:
+    # "user" is quoted because it is a reserved word in PostgreSQL
+    users = conn.execute(sa.text('SELECT id, "user", uuid, status FROM users')).all()
+    for u in users:
+        print(f"{u[1]}: UUID={u[2]}, Status={u[3]}")
+EOF
+```
+
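+### Q: Resources are no longer visible
+
+Check the Workspace context:
+```bash
+# User-token requests from the web UI must send the X-Workspace-Id header
+# (API keys ignore it; their Workspace comes from the key itself)
+curl -H "Authorization: Bearer $TOKEN" \
+     -H "X-Workspace-Id: $WORKSPACE_UUID" \
+     http://localhost:5300/api/v1/platform/bots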
+```
+
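+### Q: The migration is slow
+
+```bash
+# SQLite tuning (run while LangBot is stopped)
+sqlite3 ~/.langbot/data/langbot.db << EOF
+-- WAL mode is stored in the database file and applies to later connections.
+-- (PRAGMA synchronous is per-connection, so setting it here would not affect the migration.)
+PRAGMA journal_mode=WAL;
+-- VACUUM rewrites the whole file and temporarily needs extra free disk space.
+VACUUM;
+EOF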
+```
+
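+## Performance tuning
+
+### PostgreSQL indexes
+
+```sql
+-- Create after the migration. CONCURRENTLY cannot run inside a transaction
+-- block, so run each statement on its own (for example in psql autocommit mode).
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_model_providers_workspace
+    ON model_providers(workspace_uuid);
+
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_bots_workspace
+    ON bots(workspace_uuid);
+
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_pipelines_workspace
+    ON pipelines(workspace_uuid);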
+```
+
+## Estimated duration
+
+- SQLite < 100 MB: 2-5 minutes
+- SQLite 100 MB-1 GB: 5-15 minutes
+- PostgreSQL: < 5 minutes (depends on data volume)
+
+## Support
+
+Report issues: https://github.com/langbot-app/LangBot/issues
+
+**Version**: 1.0  
+**Last updated**: 2026-07-30
@@ -0,0 +1,114 @@
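+# LangBot Cloud 24-hour resource soak gate
+
+`scripts/cloud_runtime_soak.py` is the final resource-stability gate for the production candidate topology. It does not replace unit tests, the historical churn probes, or the nsjail isolation tests; instead, it collects the following evidence on a single timeline and produces a machine-readable pass/fail:
+
+- HTTP liveness/readiness of Core, Plugin Runtime, and Box Runtime.
+- Recent max/p95 event-loop scheduling lag of the three Python processes.
+- Current RSS, cumulative CPU, threads, file descriptors, and child-process count of each Linux `/proc` process tree.
+- cgroup v2 `memory.current/peak/events`, swap, CPU usage/throttling, PID current/events, and the actual hard limits.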
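
One common way to measure event-loop scheduling lag is to sleep for a fixed interval and record how late the loop wakes up. The sketch below illustrates that idea with `asyncio`; the class name, the 50 ms interval, and the window size are assumptions, not the probe the health endpoints actually use.

```python
import asyncio
import math
from collections import deque


class EventLoopLagMonitor:
    """Measure how late the loop wakes up after sleeping for a fixed interval."""

    def __init__(self, interval_s: float = 0.05, window: int = 200) -> None:
        self.interval_s = interval_s
        self.recent_ms = deque(maxlen=window)  # bounded: only the most recent lags
        self.sample_count = 0  # monotonic; a decrease would mean the monitor restarted

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            started = loop.time()
            await asyncio.sleep(self.interval_s)
            late_s = loop.time() - started - self.interval_s
            self.recent_ms.append(max(0.0, late_s * 1000))
            self.sample_count += 1

    def snapshot(self) -> dict:
        if not self.recent_ms:
            return {"recent_max_ms": None, "recent_p95_ms": None, "samples": self.sample_count}
        ordered = sorted(self.recent_ms)
        p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)  # nearest-rank percentile
        return {"recent_max_ms": ordered[-1], "recent_p95_ms": ordered[p95_index],
                "samples": self.sample_count}
```

Any synchronous blocking call on the loop (for example `time.sleep`) shows up directly as lag in the next sample, which is why the gate watches the recent max as well as the p95.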
+
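+Production approval must use cgroup evidence. `--pid` is suitable only for local diagnosis, because process metrics cannot show OOM kills, PID-limit hits, or CPU throttling.
+
+## Where to run
+
+Run the collector in a separate node agent or monitoring sidecar, and mount the cgroup paths of the three target containers read-only. Do not place the collector or its sample files in the cgroup or data volume of a container under test; otherwise the collector's own CPU, memory, and page cache pollute the target's data.
+
+The cgroup paths generated by Kubernetes/containerd are not a stable API. For every production candidate deployment, resolve them from the actual pod/container IDs; never guess a path from the pod name. Every directory passed in must have at least these files readable: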
+
+- `memory.current`、`memory.events`
+- `cpu.stat`、`cpu.max`
+- `pids.current`、`pids.events`
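
`memory.events`, `pids.events`, and `cpu.stat` are cgroup v2 flat-keyed files (one `key value` pair per line). A minimal sketch of parsing them and deriving the throttled-period ratio as the change in `nr_throttled` divided by the change in `nr_periods` between two samples; the helper names are hypothetical.

```python
from pathlib import Path


def read_flat_keyed(path: Path) -> dict:
    """Parse a cgroup v2 flat-keyed file such as memory.events or cpu.stat."""
    values = {}
    for line in path.read_text().splitlines():
        if line.strip():
            key, value = line.split()
            values[key] = int(value)
    return values


def throttled_period_ratio(before: dict, after: dict) -> float:
    """Fraction of CFS periods that were throttled between two cpu.stat samples."""
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    if periods < 0 or throttled < 0:
        raise ValueError("cpu.stat counter went backwards: cgroup restarted or replaced")
    return throttled / periods if periods else 0.0
```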
+
+The final gate should add `--require-hard-limits`. This option requires every target cgroup to expose a finite CPU quota and finite memory, swap, and PID limits; any value of `max` fails the gate.
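
Under cgroup v2, `cpu.max` holds `$QUOTA $PERIOD` (a quota of `max` means unlimited), while `memory.max`, `memory.swap.max`, and `pids.max` hold either a number or `max`. A sketch of the check that `--require-hard-limits` describes, written as a hypothetical helper rather than the script's own code:

```python
from pathlib import Path

LIMIT_FILES = ("memory.max", "memory.swap.max", "pids.max")


def hard_limit_violations(cgroup_dir: Path) -> list:
    """Return the limits that are missing or unlimited ("max") in one cgroup."""
    violations = []
    cpu_max = cgroup_dir / "cpu.max"
    if not cpu_max.exists():
        violations.append("cpu.max missing")
    else:
        fields = cpu_max.read_text().split()
        if not fields or fields[0] == "max":
            violations.append("cpu.max quota is unlimited")
    for name in LIMIT_FILES:
        path = cgroup_dir / name
        if not path.exists():
            violations.append(f"{name} missing")
        elif path.read_text().strip() == "max":
            violations.append(f"{name} is unlimited")
    return violations
```

Note that `memory.swap.max` set to `0` counts as a finite limit, which matches the `memory.swap.max=0` requirement elsewhere in the Cloud v2 checklist.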
+
+## Standard 24-hour command
+
+```bash
+uv run python scripts/cloud_runtime_soak.py \
+  --duration 24h \
+  --startup-grace 5m \
+  --sample-interval 15s \
+  --cooldown 30m \
+  --analysis-window 30m \
+  --http-timeout 5s \
+  --max-memory-growth-mib 64 \
+  --max-memory-slope-mib-per-hour 32 \
+  --max-tail-cpu-cores 0.5 \
+  --max-throttled-period-ratio 0.25 \
+  --max-event-loop-lag-ms 1000 \
+  --max-event-loop-p95-lag-ms 250 \
+  --require-hard-limits \
+  --endpoint core=http://langbot:5300/healthz \
+  --endpoint plugin=http://langbot-plugin-runtime:5400/healthz \
+  --endpoint box=http://langbot-box:5410/readyz \
+  --cgroup core=/host-cgroup/CURRENT_CORE_CONTAINER \
+  --cgroup plugin=/host-cgroup/CURRENT_PLUGIN_RUNTIME_CONTAINER \
+  --cgroup box=/host-cgroup/CURRENT_BOX_RUNTIME_CONTAINER \
+  --samples-file artifacts/cloud-soak-samples.jsonl \
+  --report-file artifacts/cloud-soak-report.json \
+  --workload uv run python tests/load/cloud_candidate_workload.py
+```
+
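+`--duration` is the maximum wall-clock time and includes the startup observation, the workload, and the cooldown. The workload must exit before the deadline and leave at least 30 minutes of cooldown; otherwise the gate terminates the workload and fails. If the workload is driven by an external system, `--workload` may be omitted, but the final `--analysis-window` must then be completely free of test traffic; only then can that window be interpreted as an idle tail.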
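
The time budget reduces to simple arithmetic: the workload has to finish by `start + duration - cooldown`. With the standard command's values (24 h duration, 30 min cooldown), that leaves 23.5 hours for the startup grace plus the workload. The functions below are a hypothetical illustration of that rule, not the script's implementation.

```python
from datetime import datetime, timedelta, timezone


def workload_deadline(start: datetime, duration: timedelta, cooldown: timedelta) -> datetime:
    """Latest time the workload may still be running while leaving a full cooldown."""
    if cooldown >= duration:
        raise ValueError("cooldown must be shorter than the total duration")
    return start + duration - cooldown


def workload_ok(start: datetime, exited_at: datetime,
                duration: timedelta, cooldown: timedelta) -> bool:
    """True when the workload exited early enough for the cooldown tail."""
    return exited_at <= workload_deadline(start, duration, cooldown)


# Example: a run starting at midnight UTC must end its workload by 23:30.
START = datetime(2026, 7, 30, tzinfo=timezone.utc)
```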
+
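+The workload command's stdout/stderr is forwarded to the collector's stderr and never mixed into the final JSON report on stdout. The command is started in its own process group; on timeout or interruption the whole group receives SIGTERM, and if it has not exited 10 seconds later it receives SIGKILL.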
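
On POSIX systems, starting the child in a new session makes it the leader of its own process group, so the whole group can be signalled with `os.killpg`. A minimal sketch of the TERM-then-KILL escalation described above; the 10-second grace period comes from the text, the function names are illustrative.

```python
import os
import signal
import subprocess


def start_workload(argv: list) -> subprocess.Popen:
    # start_new_session=True runs setsid() in the child: new session and process group.
    # fd 2 is this process's stderr, so stdout stays reserved for the JSON report.
    return subprocess.Popen(argv, start_new_session=True, stdout=2, stderr=2)


def stop_workload(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Send SIGTERM to the whole group, then SIGKILL if it outlives the grace period."""
    pgid = proc.pid  # the child is its own process-group leader
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group is already gone
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        return proc.wait()
```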
+
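+Credentials may be injected only through the workload process environment or a secret mount, never in command arguments. The final report records only the executable name and the argument count, not the argument text; the collector also rejects health URLs that contain userinfo, a query, or a fragment.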
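
Rejecting credential-bearing health URLs needs only the standard library. This sketch is an assumption about how such a check could look: it accepts only absolute `http`/`https` URLs and refuses userinfo, query strings, and fragments, including an empty `?` or `#`.

```python
from urllib.parse import urlsplit


def validate_health_url(url: str) -> str:
    """Return the URL unchanged if it is safe to log, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("health URL must be an absolute http(s) URL")
    if parts.username is not None or parts.password is not None:
        raise ValueError("health URL must not contain userinfo")
    # In a valid URL, '?' and '#' only appear as delimiters, so this also catches empty ones.
    if "?" in url or "#" in url:
        raise ValueError("health URL must not contain a query or fragment")
    return url
```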
+
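+## Required workload coverage
+
+The same candidate version must cover at least:
+
+1. Bulk Workspace registration, member invitations, logins, and entitlement refreshes.
+2. Plugin installation reconcile, dependency preparation, normal invocations, process crashes, and restarts.
+3. Dashboard/Embed/platform WebSocket connection setup, message bursts, and mass disconnects.
+4. Box sessions, file sync, concurrent exec, managed-process output, and cleanup.
+5. PostgreSQL pool near capacity, transaction timeouts, and recovery.
+6. Graceful restart of Core, Plugin Runtime, and Box after each one receives SIGTERM.
+
+The workload must not swallow API overload rejections as successes. By default, any growth of the blocking executor's global/scope rejection counters in Core health fails the gate. Only a separate test dedicated to verifying that overload correctly returns 429 may use `--allow-rejections`, and that run cannot serve as production approval evidence.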
+
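+## Verdict rules
+
+Any of the following during the whole valid observation period fails the gate:
+
+- A health request fails, returns non-2xx, Core reports `code != 0`, or Box reports `ready=false`.
+- `memory.events.high/max/oom/oom_kill/oom_group_kill` increases.
+- `pids.events.max` increases.
+- A monotonic cgroup counter goes backwards, which most likely means the target restarted or its cgroup was replaced without being recorded.
+- The CPU throttled-period ratio exceeds the configured threshold.
+- The event-loop recent max exceeds 1 second in any health sampling window, or the recent p95 exceeds 250 ms in the cooldown tail (the `--max-event-loop-lag-ms` and `--max-event-loop-p95-lag-ms` values of the standard command).
+- The health endpoint lacks the event-loop monitor, the monitor is not running continuously, or its sample counter goes backwards.
+- A blocking executor rejection counter increases.
+- The cumulative open count of the Plugin Runtime restart circuit increases.
+- The Core directory's active Workspace count, recent snapshot/delta Workspace count, or membership cardinality
+  exceeds its configured limit, or PostgreSQL `checked_out` exceeds the configured pool capacity; the gate also
+  fails if only one metric of a related current/max pair is present or a max value is invalid.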
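
Most of these rules compare consecutive samples of monotonic counters: any increase of a failure counter (for example `oom_kill` in `memory.events` or `max` in `pids.events`) fails, and any decrease means the counter source was reset. A hypothetical helper showing that comparison:

```python
def counter_violations(samples: list, failure_counters: set) -> list:
    """Check monotonic counters across consecutive samples.

    Any decrease of any counter is reported as a regression (likely a restart or
    cgroup replacement); any increase of a counter in failure_counters is a failure.
    """
    problems = []
    for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1):
        for key, value in cur.items():
            if key not in prev:
                continue
            if value < prev[key]:
                problems.append(f"sample {i}: {key} regressed {prev[key]} -> {value}")
            elif key in failure_counters and value > prev[key]:
                problems.append(f"sample {i}: {key} grew {prev[key]} -> {value}")
    return problems
```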
+
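+The cooldown tail after the workload ends must also satisfy:
+
+- The robust first-to-last growth and the linear slope of `memory.current`/RSS must not both exceed their thresholds.
+- Average CPU usage does not exceed `--max-tail-cpu-cores` cores.
+- The event-loop recent p95 does not exceed `--max-event-loop-p95-lag-ms`.
+- The blocking executor `pending` count must return to zero at least once; a backlog that persists through the whole tail fails.
+- The Plugin Runtime restart coordinator's active launches, half-open probes, and
+  remaining circuit-open time must return to zero, and `gate_waiters` must reach zero at least once.
+- Core's MCP projection retirement queue/worker and the message aggregation
+  buffer/scope must each reach zero at least once.
+- Transient gauges such as telemetry, QueryPool, MCP host/dispatch, and Box creating/closing/background must not keep growing.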
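
The tail rules distinguish two shapes of gauge: some must touch zero at least once during the cooldown tail, others only must not keep growing. A sketch of both checks over the tail samples; treating "keeps growing" as "ends higher than it started" is a simplification assumed here, not necessarily the tool's definition, and a missing gauge is treated as a failure.

```python
def reached_zero(values: list) -> bool:
    """Gauges such as gate_waiters or pending must hit zero at least once in the tail."""
    return any(v == 0 for v in values)


def kept_growing(values: list) -> bool:
    """Assumed definition: the gauge ends the tail window higher than it started."""
    return len(values) >= 2 and values[-1] > values[0]


def tail_gauge_failures(tail: dict, must_reach_zero: set, must_not_grow: set) -> list:
    failures = []
    for name in sorted(must_reach_zero):
        if not reached_zero(tail.get(name, [])):
            failures.append(f"{name} never returned to zero")
    for name in sorted(must_not_grow):
        if kept_growing(tail.get(name, [])):
            failures.append(f"{name} kept growing")
    return failures
```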
+
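+The memory rule requires both the growth and the slope to exceed their limits, so that a few MiB of allocator/page-cache noise in a short window is not extrapolated into a large per-hour slope. The final report still records the actual growth and slope; human reviewers must not look only at the verdict.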
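
The two memory measures can be sketched as follows: the robust growth compares the medians of the first and last few samples, and the slope is an ordinary least-squares fit scaled to MiB per hour. The window of 5 samples and the function names are assumptions; the default thresholds match the standard command (`64` MiB and `32` MiB/h).

```python
import statistics


def memory_growth_mib(samples_mib: list, k: int = 5) -> float:
    """Median of the last k samples minus median of the first k (robust to spikes)."""
    k = max(1, min(k, len(samples_mib) // 2))
    return statistics.median(samples_mib[-k:]) - statistics.median(samples_mib[:k])


def memory_slope_mib_per_hour(times_s: list, samples_mib: list) -> float:
    """Least-squares slope of memory over time, scaled from per-second to per-hour."""
    mean_t = statistics.fmean(times_s)
    mean_m = statistics.fmean(samples_mib)
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_s, samples_mib))
    var = sum((t - mean_t) ** 2 for t in times_s)
    return (cov / var) * 3600 if var else 0.0


def memory_tail_fails(times_s, samples_mib, max_growth=64.0, max_slope=32.0) -> bool:
    # Fail only when BOTH measures exceed their limits, as the gate rule requires.
    return (memory_growth_mib(samples_mib) > max_growth
            and memory_slope_mib_per_hour(times_s, samples_mib) > max_slope)
```

For example, a +4 MiB drift over a 5-minute window extrapolates to 48 MiB/h, above the slope limit, but its growth stays far below 64 MiB, so it does not fail.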
+
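+## Artifacts and exit codes
+
+- `--samples-file`: per-sample JSONL, flushed immediately after each write, for time-series charts and failure diagnosis.
+- `--report-file`: final summary, thresholds, resource hard limits, OOM/PID/throttle deltas, tail slopes, and workload status.
+- stdout: the same final JSON as the report file; workload logs go only to stderr.
+
+Exit codes:
+
+- `0`: all gates passed.
+- `1`: sampling completed, but a resource gate failed.
+- `2`: invalid CLI arguments or target configuration.
+
+Archive the raw JSONL, the final report, the three image digests, the LangBot/SDK commits, the production configuration summary, and the workload version. After a rolling update, node migration, or image change, an old report can no longer serve as approval evidence for a new candidate version.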
@@ -0,0 +1,273 @@
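+# Cloud v2 outstanding verification items
+
+Status: `NOT APPROVED FOR SAAS ACTIVATION`
+
+Last updated: 2026-07-29
+
+This document is the remaining verification checklist before the first Cloud v2
+launch. It records only evidence that the current code review, unit tests,
+integration tests, synthetic capacity probes, or short-lived Linux container
+experiments cannot substitute for. These items are not completion criteria for
+the 2026-07-29 review of the code and local test resources, and they do not keep
+that review open; they re-enter the acceptance scope only when the final SaaS
+activation is being prepared.
+
+Related documents:
+
+- [Multi-tenant architecture decisions](./pending-architecture-decisions.md)
+- [Implementation decision record](./implementation-decisions.md)
+- [Runtime resource safety review](./runtime-resource-audit-2026-07-28.md)
+- [24-hour resource soak gate](./cloud-runtime-soak-gate.md)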
+
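+## 1. Delivery baseline established so far
+
+- Full LangBot Core suite: `2855 passed, 33 skipped`; full Plugin SDK suite:
+  `1328 passed`; closed-source adapter: `40 passed`; the full Space Go test suite
+  passes. Formatting, static checks, and `git diff --check` pass in all three
+  repositories.
+- The public health endpoints, event-loop lag, and bounded blocking executor
+  metrics of Plugin Runtime and Box Runtime have been verified briefly against
+  real processes.
+- Short-lived Linux/cgroup v2 probes built from the repository Dockerfiles have
+  shown that the CPU, memory, swap, and PID limit code paths work.
+- A real startup test with 1,000 Workspaces on PostgreSQL 16 + RLS and a
+  synthetic three-generation replacement probe with 5,000 Workspaces have passed.
+- Core pins the exact Plugin SDK commit
+  `1d65ed301a6afc52150a998043f73cd6032c8162`. Final verification must use Core,
+  Plugin Runtime, and Box Runtime images that contain this commit; older SDK
+  builds must not be mixed in.
+- An independent resource review removed the per-session polling that
+  re-checked the Cloud MCP query-execution binding every 5 seconds. Instead,
+  once a signed directory projection commits, the generation change is published
+  to a single coalesced retirement task; tool and resource calls still check the
+  database execution fence before and after running. Plugin restart cooldown
+  waiters, MCP projection retirement, and the message aggregation buffer/scope
+  are all part of the health snapshot and of the soak return-to-zero gates.
+- The single-instance directory now has a consistent operating-capacity
+  contract: Space serializes the active-Workspace check-and-create inside the
+  registration transaction with a PostgreSQL advisory lock; Space full snapshots
+  return only active Workspaces and cap the Workspace/membership counts at query
+  time; the closed-source adapter caps decompressed HTTP response bytes and
+  signed-directory cardinality; Core re-runs a COUNT of active Workspaces inside
+  the transaction that holds the directory projection row lock and, when over the
+  limit, rolls back the whole batch without advancing the cursor. No layer
+  truncates authoritative data.
+- The Core Cloud PostgreSQL pool has an absolute cap of 100 on
+  `pool_size + max_overflow`. Cloud runtime connections enforce a 60-second
+  statement/idle-transaction timeout and a 5-second lock timeout by default;
+  pool usage, cumulative timeout counts, and directory active/max cardinalities
+  are exposed in `/healthz`. Box Runtime session, process, admission record, RPC
+  file, and completed-retention settings also have absolute caps that instance
+  configuration cannot raise.
+- Space concurrent-registration capacity admission has been exercised against a
+  disposable PostgreSQL 16: when two Accounts race for the last Workspace slot,
+  exactly one transaction succeeds, the other receives a capacity error, and the
+  final active Workspace count is 1. The same real-PostgreSQL integration flow
+  for active-only snapshots and archived delta tombstones also passes.
+- The Core Cloud manager has connected to a disposable PostgreSQL 16 and read
+  back `statement_timeout=60000ms`, `lock_timeout=5000ms`, and
+  `idle_in_transaction_session_timeout=60000ms` from `pg_settings`; the engine
+  was explicitly disposed after the test.
+- An independent error-path review added closing of the underlying stream when
+  an HTTPX response exceeds its limit or is cancelled. Monitoring queries,
+  exports, and detail materialization all have per-instance and absolute caps,
+  and detail statistics use database aggregation. Token statistics no longer
+  fetch every historical LLM call to bucket it in Python; PostgreSQL/SQLite
+  aggregate the data and return only a bounded set of the latest time buckets
+  and model groups, with the truncation status visible in the response.
+  Invitation, Monitoring, and Storage periodic cleanup have been merged into one
+  scheduler that waits for the first interval before running and performs
+  Workspace discovery only once per cycle; database delete batches and
+  local/S3 file candidates also have hard per-round caps.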
+
+The results above are prerequisites for entering production-candidate verification, not approval for a SaaS launch.
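
The concurrent-registration result in the baseline depends on serializing check-and-create. The in-process analogy below uses a `threading.Lock` where Space uses a transaction-scoped PostgreSQL advisory lock (such as `pg_advisory_xact_lock`); it only illustrates why exactly one of two racing registrations wins the last slot and is not the Space implementation.

```python
import threading


class CapacityError(Exception):
    pass


class WorkspaceDirectory:
    """Toy directory: the lock stands in for a transaction-scoped advisory lock."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active = []
        self._admission_lock = threading.Lock()

    def register(self, account: str) -> str:
        # Check and create under one lock, so two callers cannot both see a free slot.
        with self._admission_lock:
            if len(self.active) >= self.max_active:
                raise CapacityError(f"no Workspace slot left for {account}")
            workspace = f"ws-{account}"
            self.active.append(workspace)
            return workspace


def race_for_last_slot(directory: WorkspaceDirectory, accounts: list) -> dict:
    results = {}
    barrier = threading.Barrier(len(accounts))

    def attempt(account: str) -> None:
        barrier.wait()  # release all callers at the same moment
        try:
            results[account] = directory.register(account)
        except CapacityError:
            results[account] = "capacity error"

    threads = [threading.Thread(target=attempt, args=(a,)) for a in accounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```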
+
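+## 2. Blockers with outstanding implementation prerequisites
+
+The following items cannot be closed by "running the tests once more". The implementation must be completed first, followed by the corresponding acceptance.
+
+| ID | Blocker | Minimum acceptance evidence after implementation |
+| --- | --- | --- |
+| B-01 | Cloud plugins lack a production egress policy | Prove that plugins can reach only allowed public destinations and cannot reach Core/Box/the database, other internal services, loopback, link-local, or cloud metadata endpoints |
+| B-02 | Plugin installations and Box Workspace/Skill/root/tmp/home lack a real hard byte and inode quota provider | Over-quota writes are rejected atomically at the write boundary; concurrent writes, restarts, and quota exhaustion never exceed the limits, and directory scans or after-the-fact cleanup must not be passed off as hard quotas |
+| B-03 | Ordinary business writes still lack a generation-aware fence held through commit, a same-transaction business outbox, and durable-object references that stay stable after a generation cutover | With old and new generations running concurrently, transaction-commit races, and duplicate delivery, the old owner produces no business writes or external side effects, and the outbox recovers idempotently |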
+
+While any B item remains open, a passing 24-hour soak must not be interpreted as clearance to launch.
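
For B-01, one building block of such a policy is refusing non-public destinations after DNS resolution. The sketch uses Python's `ipaddress` module; it covers only that one layer and is an assumption, not a complete egress policy (it ignores DNS rebinding, redirects, and an explicit allow-list). The `internal_networks` parameter is a hypothetical stand-in for the Core/Box/database ranges.

```python
import ipaddress

# Well-known cloud metadata address (also covered by the link-local check below).
METADATA_ADDRESSES = {ipaddress.ip_address("169.254.169.254")}


def egress_allowed(resolved_ip: str, internal_networks: list) -> bool:
    """Allow only global addresses outside the operator's internal networks."""
    ip = ipaddress.ip_address(resolved_ip)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d as the IPv4 address it wraps
    if ip in METADATA_ADDRESSES:
        return False
    if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_multicast or not ip.is_global:
        return False
    return not any(ip in ipaddress.ip_network(net) for net in internal_networks)
```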
+
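+## 3. Final deployment environment verification
+
+### V-01: Linux isolation for Plugin Runtime and Box
+
+Must be verified under the final Cloud Pod security context, container runtime,
+and cgroup topology; a development machine or a one-off container with different
+privileges is not a substitute.
+
+Verify:
+
+1. nsjail can create mount, PID, IPC, and UTS namespaces and a private `/proc`.
+2. Delegated cgroup v2 enforces CPU, memory, `memory.swap.max=0`, and PID limits
+   for every plugin process and sandbox.
+3. The open-files, process, and single-file-size rlimits take effect.
+4. A plugin cannot enumerate, read, or signal the processes of the Runtime or of
+   other installations, cannot read other installations' home/tmp/data, and
+   cannot modify shared read-only artifacts/environments.
+5. Exceeding a limit kills or rejects only the current installation/sandbox; the
+   Runtime, other tenants, and the health endpoints keep working.
+6. After process exit, cancellation, timeout, generation switch, and Pod SIGTERM,
+   cgroups, nsjail directories, child processes, and file descriptors are all reclaimed.
+7. When a hard limit or namespace capability is missing, readiness fails closed;
+   degrading to a plain process is not allowed.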
+
+Passing evidence must include the actual container security configuration, cgroup file values, raw probe output, and fault-injection results.
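
Item 3 concerns per-process rlimits. As a local illustration only (production isolation goes through nsjail and delegated cgroups, which this does not model), a child process can be started with lower limits by calling `resource.setrlimit` from `preexec_fn` on Linux. The limit values are arbitrary examples; note that Python documents `preexec_fn` as unsafe when the parent has threads.

```python
import resource
import subprocess

# Example caps only; the real values would come from the Runtime configuration.
LIMITS = {
    resource.RLIMIT_NOFILE: 256,       # open files
    resource.RLIMIT_NPROC: 64,         # processes for the child's user
    resource.RLIMIT_FSIZE: 16 << 20,   # largest file the child may write, in bytes
}


def _apply_limits() -> None:
    """Runs in the child after fork() and before exec()."""
    for kind, cap in LIMITS.items():
        _, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            cap = min(cap, hard)  # an unprivileged process cannot raise its hard limit
        resource.setrlimit(kind, (cap, cap))


def run_limited(argv: list) -> subprocess.CompletedProcess:
    return subprocess.run(argv, preexec_fn=_apply_limits, capture_output=True,
                          text=True, check=False)
```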
+
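+### V-02: Box durable volume and hard storage quotas
+
+Once the B-02 quota provider is implemented, verify:
+
+1. Core and Box Runtime prove they use the same shared durable volume through a random-marker challenge.
+2. Byte and inode quotas for the Workspace, Skill store, and ephemeral root/tmp/home
+   are all enforced at the write point.
+3. Concurrent writes, archive extraction, file sync, restart recovery, and delete-and-recreate cannot bypass the quotas.
+4. Quota exhaustion affects only the target Workspace; other Workspaces keep running.
+5. If any hard storage capability is missing, `/readyz` returns non-2xx and the Pod receives no ready traffic.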
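
Item 1 can be pictured as a nonce round trip through the shared volume: Core writes a random marker under a random name, Box Runtime reports what it finds at that path, and Core compares in constant time. The sketch is an assumed simplification (the authenticated Core-Box channel is not modelled) and the function names are hypothetical.

```python
from __future__ import annotations

import secrets
from pathlib import Path


def write_marker(core_mount: Path) -> tuple[str, str]:
    """Core side: write an unguessable marker and return (relative name, expected value)."""
    name = f".marker-{secrets.token_hex(8)}"
    value = secrets.token_hex(32)
    (core_mount / name).write_text(value)
    return name, value


def read_marker(box_mount: Path, name: str) -> str | None:
    """Box side: report what this mount contains at the given relative name."""
    # Accept only a plain file name, so the challenge cannot probe other paths.
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("invalid marker name")
    path = box_mount / name
    return path.read_text() if path.is_file() else None


def same_volume(core_mount: Path, box_mount: Path) -> bool:
    name, expected = write_marker(core_mount)
    try:
        return secrets.compare_digest(read_marker(box_mount, name) or "", expected)
    finally:
        (core_mount / name).unlink(missing_ok=True)  # leave no marker behind
```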
+
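+### V-03: PostgreSQL and pgvector production boundaries
+
+Must be verified with the final PostgreSQL endpoint, credentials, and network policy:
+
+1. The migrator and the runtime use different roles; the runtime role has no superuser,
+   `BYPASSRLS`, DDL, object ownership, role membership, or extra schema privileges.
+2. The runtime credential can connect only to the target business database. This requires
+   a dedicated cluster/endpoint or a tested HBA/proxy policy; an in-database catalog
+   audit alone cannot prove it.
+3. The release migration Job's advisory lock, failure retry, rollback, and exact Alembic head
+   check work; the application startup role cannot run migrations or any other DDL.
+4. If PgBouncer is used, transaction pooling, error rollback, and connection reuse leave no
+   residual tenant context.
+5. When the application-level Workspace filter is deliberately omitted, RLS still blocks cross-tenant reads and writes.
+6. pgvector CRUD cannot escalate privileges when two Workspaces use the same `vector_id`,
+   when another Workspace's ID is guessed, in background tasks, or across connection reuse.
+7. On a dimension mismatch, extension/schema/ACL drift, or a runtime audit failure,
+   Core fails to start and does not fall back to another vector backend.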
+
+### V-04: Final image and configuration consistency
+
+Before production-candidate verification, pin:
+
+- The immutable image digests of Core, Plugin Runtime, and Box Runtime;
+- The LangBot and SDK commits;
+- A non-sensitive summary of `data/config.yaml` and every environment-variable override;
+- Space's `CLOUD_V2_MAX_DIRECTORY_WORKSPACES`, which must match Core's
+  `cloud.directory.max_active_workspaces` and
+  `cloud.directory.max_snapshot_workspaces`; Space's membership limit must match
+  Core's `cloud.directory.max_snapshot_memberships`;
+- Core's PostgreSQL pool and statement/lock/idle-transaction timeouts, plus every
+  instance-level resource limit of Plugin Runtime/Box Runtime;
+- The PostgreSQL migration revision;
+- The Cloud Adapter, Space control plane, and workload versions.
+
+After a rolling update, node migration, configuration change, or any image digest change, old verification reports become invalid.
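
The directory-limit pairing is easy to check mechanically before a run. A sketch, assuming the Space value is read from its environment and the Core values from a parsed `data/config.yaml` dictionary; this document does not name the Space membership-limit setting, so it is passed in as a hypothetical parameter.

```python
def directory_limit_mismatches(space_env: dict, core_config: dict,
                               space_membership_limit: int) -> list:
    """Compare Space's directory limits with Core's cloud.directory settings."""
    directory = core_config.get("cloud", {}).get("directory", {})
    space_workspaces = int(space_env["CLOUD_V2_MAX_DIRECTORY_WORKSPACES"])
    problems = []
    for key in ("max_active_workspaces", "max_snapshot_workspaces"):
        if directory.get(key) != space_workspaces:
            problems.append(f"cloud.directory.{key}={directory.get(key)!r} != Space {space_workspaces}")
    core_memberships = directory.get("max_snapshot_memberships")
    if core_memberships != space_membership_limit:
        problems.append(f"cloud.directory.max_snapshot_memberships={core_memberships!r}"
                        f" != Space {space_membership_limit}")
    return problems
```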
+
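+## 4. Multi-tenant behavior and fault injection
+
+### V-05: Directory, entitlement, and generation
+
+Verify across the real Space control plane, the closed-source Cloud Adapter, and Core:
+
+1. Registration automatically creates a personal Workspace, owner membership, Free subscription,
+   entitlement snapshot, and outbox event, and retries create no duplicates.
+2. Invitations, member changes, plan changes, and Workspace revocation affect only the target Workspace.
+3. Full snapshots and incremental events cover reordering, duplicates, resumption from a checkpoint,
+   missing pages, bad signatures, high-water gaps, snapshot coverage, and consumer restarts.
+4. After Core, Plugin Runtime, or Box restarts, the authoritative desired state can be recovered;
+   local process tables and caches are not the only source of truth.
+5. During a generation/revision switch, stale callbacks, RPCs, WebSockets, Box relays,
+   plugin workers, and cache writes all fail closed.
+6. The replica-local cursor semantics reserved for future multiple replicas pass fault injection:
+   one replica catching up must not make another replica skip its local cache refresh.
+7. Force a large number of plugin workers to exit at once through a systemic failure, and prove that
+   global restart-launch concurrency is bounded, the failure threshold trips the Runtime circuit,
+   only one half-open probe runs after the cooldown, and no other installation restarts until the
+   probe stabilizes. Cooldown timer/state waiters must not exceed the global restart concurrency,
+   and a cancelled probe must not leave the circuit stuck in half-open forever. The 24-hour gate
+   must treat an opened circuit or `gate_waiters` that never returns to zero as a failure.
+8. Perform a generation switch with a large number of idle remote MCP sessions, and prove that the
+   directory projection creates only one coalesced retirement task and causes no per-session
+   periodic database queries, timers, or simultaneous wake-ups; old sessions are eventually closed,
+   and `mcp_projection_retirements` and
+   `mcp_projection_reconcile_active` return to zero during the cooldown.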
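
Item 3's ordering cases reduce to a consumer that applies events strictly by sequence number. A hypothetical in-memory sketch: duplicates are ignored, early events are buffered, and the cursor advances only over a contiguous prefix, so a gap stays visible until the missing event (or a fresh snapshot) arrives. Signature checks, snapshot coverage, and high-water marks are out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirectoryConsumer:
    cursor: int = 0  # last contiguously applied sequence number (what would be persisted)
    applied: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)  # events that arrived ahead of a gap

    def receive(self, seq: int, event: str) -> None:
        if seq <= self.cursor or seq in self.pending:
            return  # duplicate delivery: already applied or already buffered
        self.pending[seq] = event
        # Apply as many contiguous events as are now available.
        while self.cursor + 1 in self.pending:
            self.cursor += 1
            self.applied.append(self.pending.pop(self.cursor))

    def gap(self) -> tuple[int, int] | None:
        """Missing range (first, last) that blocks progress, if any."""
        if not self.pending:
            return None
        return self.cursor + 1, min(self.pending) - 1
```

A restarted consumer resumes from its persisted `cursor`, so events it already applied are dropped as duplicates.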
+
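+### V-06: Plans, Box, and stdio MCP
+
+1. Free/non-Pro Workspaces never automatically receive a managed sandbox.
+2. An eligible Workspace has at most one persistent `global` sandbox, and adding a Workspace
+   creates no dedicated Runtime, Pod, PVC, database, schema, role, or connection pool.
+3. Even with `box.enabled=true`, Cloud cannot create/update/test/start stdio MCP;
+   legacy records and direct API calls also fail closed and create no `mcp-shared` session.
+4. OSS still defaults to a single Workspace with multiple users, stdio MCP stays compatible,
+   and multi-tenant capabilities cannot be enabled by unsigned configuration or ordinary environment variables.
+
+### V-07: Cross-tenant security regression
+
+Verify with at least two malicious test Workspaces:
+
+- Plugins with the same digest share only read-only code and dependencies; processes, secrets, logs, and all writable directories are isolated.
+- Artifacts with the same author/name/version but different digests do not share directories.
+- Plugin Host API, Box RPC, object keys, WebSocket, RAG, storage, model/session
+  cache, and platform callbacks cannot accept a Workspace scope forged by the caller.
+- After entitlement revocation, installation deletion, or a generation switch, existing long-lived connections and
+  in-flight requests can no longer use the old permissions.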
+
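+## 5. Production-candidate capacity and the 24-hour gate
+
+### V-08: Real capacity curves
+
+The existing fake adapter/requester/Plugin handler probes cannot substitute for real capacity data.
+Using the platform SDKs planned for launch, external HTTP/WebSocket connection pools, real plugin
+processes, real PostgreSQL/pgvector, and a representative distribution of Workspace configurations, measure:
+
+- The marginal RSS, thread, file-descriptor, connection, and PostgreSQL pool cost of an empty
+  Workspace, an active Workspace, each enabled plugin, and each Pro sandbox;
+- The duration and peaks of startup, directory replay, bulk reconcile, and failure recovery;
+- Database QPS, the retirement queue, and event-loop lag as the number of remote MCP servers grows
+  and during bulk directory generation switches, confirming there is no idle polling proportional to the number of sessions;
+- Monitoring overview/token statistics without a time range, run at maximum retention/backlog alongside
+  concurrent Dashboard requests, verifying that SQL bucketing, statement timeouts, response truncation, and cleanup
+  catch-up cause no PostgreSQL CPU spikes or Core RSS growth;
+- The Workspace, active Bot, plugin worker, and sandbox limits that a single instance can be approved for.
+
+Capacity limits must be written into production configuration and alerts, not kept only in test reports.
+The defaults and absolute caps in code are only the last line of defense against runaway configuration;
+they are not a production capacity conclusion. Based on the real curves of the final images, V-08 must
+set the matching Space and Core limits within the verified capacity; if the final approved value is
+higher than the current default of 1,000 active Workspaces, the directory startup, failure recovery, and
+24-hour gate must be rerun.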
+
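+### V-09: 24-hour resource soak
+
+Use the [standard 24-hour command](./cloud-runtime-soak-gate.md#标准-24-小时命令) and enforce
+`--require-hard-limits`. The workload must cover at least:
+
+1. Registration, invitations, logins, and entitlement refreshes;
+2. Plugin reconcile, dependency preparation, invocation, crashes, and restarts;
+3. Dashboard/Embed/platform WebSocket connection setup, message bursts, and disconnects; for HTTP Bots,
+   high-cardinality session/idempotency keys, hard-capacity rejections, idle reclamation, and blocked callbacks;
+   for remote MCP, many idle connections, bulk generation switches, and coalesced retirement;
+4. Box sessions, file sync, concurrent exec, output, and cleanup;
+5. PostgreSQL pool near capacity, transaction timeouts, and recovery;
+6. SIGTERM and recovery of Core, Plugin Runtime, and Box individually.
+
+Keep at least 30 minutes of cooldown without test traffic at the end. Any health failure, OOM/memory pressure,
+PID-limit hit, blocking executor rejection, CPU throttling or event-loop lag above threshold,
+directory `active_workspaces > max_active_workspaces`, database pool usage above the configured capacity,
+sustained memory growth in the cooldown tail, or transient gauges such as Plugin restart `gate_waiters`,
+MCP projection retirement, or message aggregation buffer/scope that do not come back down is a failure.
+The standard soak tool already compares directory active/recent-batch counts against their configured limits
+and PostgreSQL `checked_out` against the configured pool capacity; for the standard endpoint named `core`,
+a missing capacity metric, a current/max pair with only one half present, an invalid value, or any sample over its limit fails the run immediately.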
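
The capacity rule for the `core` endpoint can be sketched as a check over one health sample: each current/max pair must be present in full, both values must be non-negative integers with a positive max, and current must not exceed max. Only `checked_out` and `active_workspaces`/`max_active_workspaces` appear in this document; `pool_capacity` and the "positive max" rule are assumptions.

```python
def capacity_violations(sample: dict, pairs: list) -> list:
    """Validate (current, max) metric pairs from one health sample."""
    problems = []
    for current_key, max_key in pairs:
        if current_key not in sample or max_key not in sample:
            problems.append(f"{current_key}/{max_key}: pair incomplete")
            continue
        current, limit = sample[current_key], sample[max_key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        valid = all(isinstance(v, int) and not isinstance(v, bool) for v in (current, limit))
        if not valid or limit <= 0 or current < 0:
            problems.append(f"{current_key}/{max_key}: invalid values {current!r}/{limit!r}")
        elif current > limit:
            problems.append(f"{current_key}={current} exceeds {max_key}={limit}")
    return problems
```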
+
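+Archive:
+
+- The raw `cloud-soak-samples.jsonl`;
+- The final `cloud-soak-report.json`;
+- The three image digests and the Core/SDK commits;
+- The production configuration summary, database migration revision, and workload version;
+- The fault-injection timeline with the associated logs/traces.
+
+## 6. Follow-up items outside this acceptance round
+
+The following capabilities are explicitly deferred. They must not be mixed into the current verification results, and "not yet verified" is not a reason to improvise a design for them:
+
+- Workspace export, release, delete, single-Workspace restore, and online migration;
+- Workspace-level BYOK E2B WebUI configuration;
+- A lease store and scheduling for multiple Core/Plugin Runtime/Box replicas;
+- Multiple PostgreSQL shards, dedicated shards, and cross-region deployment;
+- Multiple CloudInstances, a Cell Router, or Workspace Placement.
+
+These items require separate future decisions. The first phase must still preserve the protocol boundaries of stable UUIDs, generation fences,
+idempotent events, and no leakage of replica addresses.
+
+## 7. Closure rules
+
+Each B/V item can be closed only by:
+
+1. Recording the tested commit, image digests, configuration summary, and environment topology;
+2. Saving reproducible commands, raw output, and fault-injection evidence;
+3. Having the report state an explicit pass/fail, rather than relying on logs that "look normal";
+4. Rerunning the affected verification after any production-candidate input changes.
+
+Until B-01 through B-03 are all implemented and V-01 through V-09 all have passing evidence for the current production candidate,
+Cloud v2 remains `NOT APPROVED FOR SAAS ACTIVATION`.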
@@ -0,0 +1,292 @@
+# Multi-tenant implementation checklist
+
+This checklist turns the Workspace architecture into implementation and
+verification gates. Exact commands and observed results are recorded in the
+[verification report](./verification-report.md).
+
+## Scope guard
+
+- [x] LangBot uses branch feat/multi-tenants.
+- [x] langbot-plugin-sdk uses branch feat/multi-tenants.
+- [x] langbot-space implements the greenfield Cloud v2 modular-monolith control plane without extending the legacy per-account Pod topology; the old Pod UI remains available when Cloud v2 is disabled and only retained Pods appear in the v2 view.
+- [x] Unrelated untracked files in either repository remain untouched.
+- [x] Open-source startup cannot enable SaaS multi-workspace through edition flags or unsigned configuration.
+
+## SaaS activation gates
+
+These items intentionally remain incomplete. Some require additional Core
+transaction/cutover primitives and others require the closed Control Plane or
+deployment. The feature branch delivers the Core isolation kernel, not the
+closed SaaS product or a production Cloud v2 deployment. Checked implementation
+items later in this document do not supersede these gates.
+
+- [x] The closed Control Plane owns the global Account, Workspace, Membership, and Invitation directory.
+- [ ] The closed Control Plane execution-ownership module issues monotonic generations and owner leases for projected Workspaces.
+- [x] Core verifies a signed `InstanceManifest` before the closed bootstrap can inject `CloudWorkspacePolicy`.
+- [ ] Tenant database writes hold a generation-aware shared transaction fence through commit, while execution-owner cutovers take the exclusive fence.
+- [ ] Business writes and non-transactional side effects use a generation-stamped outbox or equivalent publish fence.
+- [ ] Durable object references survive an execution-generation change through stable published keys or an explicitly atomic key/reference migration.
+- [ ] The SaaS runtime pools enforce tenant-safe egress and SSRF controls for Webhooks, providers, MCP servers, and every tenant-configurable outbound URL.
+- [ ] Entitlement checks, usage aggregation, and subscription lifecycle are implemented in the closed Control Plane; production activation still requires provider callback amount/currency/session/expiry binding inside the locked fulfillment transaction.
+- [ ] Account registration persists a `new_api.provision_account` outbox item with the Account and personal Workspace, and an in-process reconciler provisions New API idempotently after commit.
+- [ ] EPay and Stripe callbacks bind provider identity, amount, currency, channel/session, payment status, and expiry to the locked order before entitlement fulfillment.
+- [ ] OAuth state and directory projection use an atomic shared store suitable for horizontally scaled SaaS services.
+- [ ] A greenfield Cloud v2 deployment is designed and validated independently of the legacy Space deployment scheme.
+- [ ] The Plugin Runtime shared profile refuses to run without delegated cgroup v2 CPU, memory-plus-swap, and PID limits, all verified in a real Linux container; production tenant-safe egress remains incomplete.
+- [x] The Plugin Runtime Supervisor automatically restores an unexpectedly exited enabled worker with bounded per-installation backoff.
+- [ ] Jitter, a global restart concurrency limit, and a Runtime-level circuit breaker prevent a systemic failure from creating a cross-tenant restart storm.
+- [ ] Until authenticated Runtime takeover or an owner lease/fence exists, the M0 deployment rolls Core and Plugin Runtime together and forbids an independent Core-only rollout.
+- [ ] Plugin installation data has an operator-owned hard disk quota provider that atomically rejects writes over the limit; directory scans are not accepted as enforcement.
+- [ ] The Box deployment provides an operator-owned quota provider that proves hard byte and inode limits for Workspace, Skill, root, tmp, and home storage.
+- [ ] Core and Box Runtime mount the same durable volume and pass the authenticated marker challenge during startup and reconnect.
+- [ ] Production provisions distinct migrator/runtime credentials and runs the implemented same-host/port/database release command as a one-shot Job, with tested orchestration retry, backup, and rollback procedures.
+- [ ] Production PostgreSQL uses a dedicated cluster/endpoint, or a tested HBA/proxy policy proves the cluster-wide runtime credential can connect only to the target business database.
+- [ ] Any future direct-migrator/pooler-runtime endpoint split is admitted only by a migrator-owned, runtime-read-only database cluster identity that the runtime role cannot spoof.
+- [ ] Legacy pgvector migration failure and retry integration paths prove exact source-table RLS/FORCE restoration; the non-superuser, non-`BYPASSRLS` success path is already covered below.
+- [ ] Multi-workspace is enabled in SaaS only after all closed Control Plane, deployment, and security gates pass.
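
The jitter, global restart concurrency limit, and Runtime-level circuit breaker listed among the gates can be sketched as a small asyncio coordinator. This is an assumed illustration, not the Plugin Runtime Supervisor: the threshold, cooldown, and concurrency values are arbitrary, failures are counted as consecutive failures without a time window, and per-installation backoff is omitted.

```python
from __future__ import annotations

import asyncio
import random
import time
from typing import Awaitable, Callable


class RestartCoordinator:
    """Closed -> open after `threshold` consecutive failures -> one half-open probe."""

    def __init__(self, max_concurrent: int = 4, threshold: int = 5,
                 cooldown_s: float = 30.0, jitter_s: float = 1.0) -> None:
        self._launch_slots = asyncio.Semaphore(max_concurrent)  # global launch limit
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.jitter_s = jitter_s
        self.consecutive_failures = 0
        self.opened_at: float | None = None
        self.probe_in_flight = False

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        return "open" if time.monotonic() - self.opened_at < self.cooldown_s else "half-open"

    def _admit(self) -> tuple[bool, bool]:
        """Return (admitted, is_probe); refusing instead of queueing keeps waiters bounded."""
        state = self.state
        if state == "closed":
            return True, False
        if state == "open" or self.probe_in_flight:
            return False, False
        self.probe_in_flight = True  # this caller becomes the single half-open probe
        return True, True

    def _record(self, ok: bool, is_probe: bool) -> None:
        if ok:
            self.consecutive_failures = 0
            self.opened_at = None  # a success, including the probe's, closes the circuit
        else:
            self.consecutive_failures += 1
            if is_probe or self.consecutive_failures >= self.threshold:
                self.opened_at = time.monotonic()  # (re)open and restart the cooldown

    async def restart(self, launch: Callable[[], Awaitable[bool]]) -> bool:
        """Run launch() if the circuit allows it; return False when refused or failed."""
        admitted, is_probe = self._admit()
        if not admitted:
            return False
        try:
            await asyncio.sleep(random.uniform(0, self.jitter_s))  # spread simultaneous restarts
            async with self._launch_slots:
                try:
                    ok = await launch()
                except Exception:
                    ok = False  # a crashing launch counts as a failure
            self._record(ok, is_probe)
            return ok
        finally:
            if is_probe:
                self.probe_in_flight = False  # also runs on cancellation: half-open never wedges
```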
+
+## 1. Persistence foundation
+
+### Account and directory
+
+- [x] User has a stable, unique account UUID and explicit status.
+- [x] Existing email and password behavior remains compatible during migration.
+- [x] Workspace table represents the instance-local tenant.
+- [x] WorkspaceMembership has a unique Workspace and Account pair.
+- [x] WorkspaceInvitation stores only a token hash and supports expiry, revoke, and one-time accept.
+- [x] WorkspaceExecutionState stores generation, state, source, and write fence.
+- [x] OSS initialization creates exactly one Workspace and one owner membership atomically.
+- [x] OSS refuses a second Workspace while allowing multiple members.
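
Storing only a token hash means the plaintext token exists only in the invitation link. A minimal sketch of issue and accept with expiry, revoke, and one-time use; the in-memory store and field names are hypothetical, and a real implementation would perform the accept step inside one database transaction.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Invitation:
    token_hash: str
    expires_at: datetime
    revoked: bool = False
    accepted: bool = False


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def issue(store: dict[str, Invitation], ttl: timedelta) -> str:
    """Return the plaintext token once; persist only its SHA-256 hash."""
    token = secrets.token_urlsafe(32)
    digest = hash_token(token)
    store[digest] = Invitation(digest, datetime.now(timezone.utc) + ttl)
    return token


def accept(store: dict[str, Invitation], token: str) -> bool:
    inv = store.get(hash_token(token))  # look up by hash, never by plaintext
    if inv is None or inv.revoked or inv.accepted:
        return False
    if datetime.now(timezone.utc) >= inv.expires_at:
        return False
    inv.accepted = True  # one-time: a second accept fails
    return True
```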
+
+### Migration
+
+- [x] Alembic migration upgrades SQLite.
+- [x] Alembic migration upgrades PostgreSQL.
+- [x] Existing first user becomes owner of the default Workspace.
+- [x] Existing tenant resources are backfilled with the default Workspace UUID.
+- [x] SQLite destructive boundaries create verified, revision-aware backups and atomically restore after failure.
+- [x] Migration can resume safely after interruption.
+- [x] New installs and upgraded installs produce the same tenancy-kernel schema.
+- [x] The first Cloud release pins migrator and runtime sessions to `public` with `current_schemas(false)` containing only that business schema; runtime-role/database `search_path` overrides are rejected.
+- [x] Cloud sessions require `session_replication_role=origin`, `row_security=on`, and `lo_compat_privileges=off`; every persistent `pg_db_role_setting` applicable to the runtime role or current business database is rejected.
+- [x] The release migrator grants the runtime role exact business-table DML, `alembic_version` read-only access, and business-sequence `USAGE/SELECT`, with no `WITH GRANT OPTION` or non-business object grants.
+- [x] Every Cloud runtime startup revalidates the login role, current user, schema, effective/direct ACLs, ownership, memberships in all directions, column ACLs, routines, extensions, foreign objects, parameter ACLs, and other-schema access before serving traffic.
+- [x] The business database requires `vector`, permits only `plpgsql`/`vector` extensions, forbids runtime extension ownership, and contains no foreign data wrapper, foreign server, or user mapping.
+- [x] The runtime role and `PUBLIC` have no explicit routine or parameter ACL; the runtime owns no routine and cannot effectively execute any `SECURITY DEFINER` routine, including extension-owned routines.
+- [x] PostgreSQL's default `PUBLIC TEMP` is documented and tested as a dedicated-business-database v1 compatibility exception; the migrator never grants `TEMP` directly to the runtime role.
+- [x] Legacy pgvector migration succeeds as a non-superuser, non-`BYPASSRLS` source-table owner and restores mixed source-table RLS/FORCE states exactly.
+- [ ] Legacy pgvector migration still needs explicit failure-and-retry integration coverage before SaaS activation.
+
+### Runtime transaction enforcement
+
+- [x] Each tenant UoW owns one task, root transaction, database bind, and transaction-local scope.
+- [x] A scoped Session and every captured bound method become permanently unusable when the owning UoW exits.
+- [x] Public transaction/session control, raw/textual SQL, connection/bind escape, nested transactions, execution/loader options, foreign binds, live results, unapproved functions/operators/casts/types, `INSERT FROM SELECT`, hidden `ON CONFLICT` and batch-value expressions, forced-unquoted identifiers, and custom AST/compiler nodes fail closed and make the UoW rollback-only.
+- [x] ORM `SessionEvents` fail before a registered callback can receive the synchronous Session or transaction connection; rollback cleanup cannot execute the rejected listener.
+- [x] ORM flush, implicit autoflush, and commit reject SQL expressions assigned to mapped attributes before compilation.
+- [x] Tenant relationship loading uses eager loading or explicit async `refresh`; synchronous object-session access and `AsyncAttrs.awaitable_attrs` are not supported tenant APIs.
+- [x] The UoW guard is documented as a trusted-Core misuse boundary rather than an in-process Python sandbox; mapped metadata/compiler registration is trusted, plugins remain out of process, and SQLAlchemy upgrades must rerun the private-container regression suite.
+
+## 2. Authentication and authorization
+
+### Identity
+
+- [x] JWT sub uses account UUID, with a bounded compatibility path for legacy email tokens.
+- [x] Disabled or deleted accounts cannot authenticate.
+- [x] Local password and Space-linked account flows support more than one local Account.
+- [x] Public registration closes after initialization by default.
+- [x] Invitation registration works without requiring SMTP.
+- [x] An unknown Space OAuth subject cannot claim an existing Account by email; explicit account-bound binding is required.
+
+### Request context
+
+- [x] PrincipalContext identifies Account, API Key, or trusted runtime principal.
+- [x] WorkspaceContext contains Workspace, Membership, role, permissions, and revision.
+- [x] RequestContext contains instance UUID, Workspace context, auth type, request ID, and generation.
+- [x] ExecutionContext propagates Workspace and generation to runtime work.
+- [x] SaaS-style requests never fall back to the first or most recent Workspace.
+- [x] OSS may resolve the single Workspace when the selector is omitted.
+- [x] Account-token bootstrap can list only the authenticated Account's active memberships before a Workspace selector exists.
+
+### Fixed RBAC
+
+- [x] owner, admin, developer, operator, and viewer permissions match the architecture matrix.
+- [x] Invitation cannot grant owner.
+- [x] The last owner cannot be removed or demoted.
+- [x] Cross-Workspace resources return 404.
+- [x] Same-Workspace permission failures return 403.
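
The last two rules fix the order of checks: Workspace ownership is tested before permissions, so a caller never learns that a resource exists in another Workspace. A sketch with a hypothetical, heavily abbreviated permission map (the authoritative matrix is the architecture one referenced above):

```python
from http import HTTPStatus

# Hypothetical excerpt; not the real owner/admin/developer/operator/viewer matrix.
ROLE_PERMISSIONS = {
    "owner": {"bot.read", "bot.write", "member.manage"},
    "admin": {"bot.read", "bot.write", "member.manage"},
    "developer": {"bot.read", "bot.write"},
    "operator": {"bot.read"},
    "viewer": {"bot.read"},
}


def authorize(resource_workspace: str, request_workspace: str,
              role: str, permission: str) -> HTTPStatus:
    # Cross-Workspace first: answer as if the resource did not exist.
    if resource_workspace != request_workspace:
        return HTTPStatus.NOT_FOUND
    # Same Workspace: the resource is visible, so a missing permission is a 403.
    if permission not in ROLE_PERMISSIONS.get(role, set()):
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK
```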
+
+## 3. Workspace and member APIs
+
+- [x] GET /api/v1/workspaces returns the OSS singleton Workspace.
+- [x] POST /api/v1/workspaces returns edition_limit in OSS.
+- [x] Current Workspace endpoint returns the authenticated Membership.
+- [x] Member list is permission scoped.
+- [x] Invitation create, revoke, inspect, and accept are atomic.
+- [x] Member role update and removal enforce owner rules.
+- [x] Invitation tokens travel in a request body and are redacted from logs.
+- [x] Relevant MCP tools and in-repo skills are updated with the same contract.
+
+## 4. Tenant-scoped persistence and services
+
+Each row type must have a non-null Workspace UUID, scoped indexes, scoped uniqueness, and scoped CRUD tests.
+
+- [x] Bots and bot admins.
+- [x] Legacy pipelines and pipeline run records.
+- [x] Model providers.
+- [x] LLM models.
+- [x] Embedding models.
+- [x] Rerank models.
+- [x] Plugin installations, settings, and configuration.
+- [x] MCP servers and resource preferences.
+- [x] Knowledge bases, files, and chunks.
+- [x] Vector collections and handles.
+- [x] Monitoring messages, calls, sessions, errors, embeddings, and feedback.
+- [x] API keys and scopes.
+- [x] Webhooks and public route resolution.
+- [x] Binary storage and Workspace storage.
+- [x] Workspace metadata, separated from system metadata.
+
+### Service and API rules
+
+- [x] Every tenant Service receives RequestContext or an explicit Workspace UUID.
+- [x] No tenant Service treats context None as global access.
+- [x] Every applicable get, list, create, update, delete, copy, export, and bulk operation is scoped.
+- [x] Parent-child references use the same Workspace.
+- [x] API Key authentication derives Workspace from the key, not a header.
+- [x] Webhook and Bot public routes derive Workspace from a trusted resource.
+- [x] Background jobs carry Workspace and generation explicitly.
+
+## 5. Runtime isolation
+
+### Core runtime
+
+- [x] RuntimeBot carries Workspace UUID and execution generation (currently stored in the compatibility field `placement_generation`).
+- [x] RuntimePipeline carries Workspace UUID and execution generation (currently stored in the compatibility field `placement_generation`).
+- [x] Query and Event carry Workspace UUID without making it an authorization source.
+- [x] Session key includes Workspace UUID, Bot UUID, launcher type, and launcher ID.
+- [x] QueryPool and manager indexes cannot collide across Workspaces.
+- [x] Query and aggregation cache keys and locks include Workspace UUID.
+- [x] Runtime transports, cached results, object operations, and long-lived tasks revalidate WorkspaceExecutionState generation at side-effect boundaries.
+- [ ] Ordinary tenant database writes hold the generation fence in the same transaction until commit; this remains a SaaS activation gate.
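
Including the Workspace UUID in every key is what keeps identical launcher IDs in different Workspaces apart. A sketch with a frozen dataclass as the key; the fields follow the session-key item above, while the class, the store, and the example launcher types are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SessionKey:
    workspace_uuid: str
    bot_uuid: str
    launcher_type: str  # for example "person" or "group"
    launcher_id: str


sessions: dict[SessionKey, list[str]] = {}


def append_message(key: SessionKey, text: str) -> int:
    """Store a message under the fully scoped key and return the session length."""
    history = sessions.setdefault(key, [])
    history.append(text)
    return len(history)
```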
+
+### Plugin
+
+- [x] Plugin installation and configuration are Workspace scoped.
+- [x] Runtime control actions carry trusted Workspace binding and execution generation (wire-compatible as `placement_generation`).
+- [x] The Plugin Runtime supervisor is instance-scoped and intentionally serves multiple Workspaces.
+- [x] Every plugin process is bound to exactly one Workspace, installation, generation, revision, and verified artifact digest.
+- [x] Same-digest plugin code may be cached once, while worker processes and writable data remain isolated.
+- [x] Same-digest plugin dependencies are prepared once in a Runtime-owned immutable environment and mounted read-only into each isolated worker; dependency failure is surfaced before launch and recorded per installation without blocking other desired-state recovery.
+- [x] Host API derives Workspace from the connection, installation, and trusted action context, not plugin input.
+- [x] Plugin get_bots, models, tools, vector, RAG, configuration, and messaging calls are scoped.
+- [x] Plugin Workspace storage no longer uses owner default.
+- [x] Plugin page APIs check Membership and installation ownership.
+- [x] Local plugin launches use short-lived, one-use registration capabilities bound to manifest identity.
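
A short-lived, one-use capability bound to manifest identity can be modelled as a random token mapped to the expected identity and a deadline, removed on first use. The sketch is hypothetical: the names, the TTL, and the `(author, name, version, digest)` identity tuple are assumptions, and state is kept in memory.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestIdentity:
    author: str
    name: str
    version: str
    digest: str


class RegistrationCapabilities:
    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s
        self._grants: dict[str, tuple[ManifestIdentity, float]] = {}

    def issue(self, identity: ManifestIdentity) -> str:
        token = secrets.token_urlsafe(32)
        self._grants[token] = (identity, time.monotonic() + self.ttl_s)
        return token

    def redeem(self, token: str, presented: ManifestIdentity) -> bool:
        # pop() makes the token single-use even when redemption fails.
        grant = self._grants.pop(token, None)
        if grant is None:
            return False
        identity, deadline = grant
        return time.monotonic() < deadline and identity == presented
```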
+
+### MCP, RAG, and Box
+
+- [x] MCP runtime key contains instance UUID, Workspace UUID, execution generation, and server UUID.
+- [x] Same-named MCP servers in two Workspaces do not share sessions.
+- [x] Pipeline cannot reference another Workspace's MCP resource.
+- [x] RAG collection names and handles are server-derived and Workspace scoped.
+- [x] Legacy global vector migration is available only to the local OSS singleton Workspace.
+- [x] Object storage paths include instance, Workspace, and execution generation for the fixed-generation OSS runtime.
+- [x] Object storage revalidates generation before touching a provider or resolving an opaque key.
+- [ ] Cloud cutover uses generation-scoped staging plus stable published object references, rather than making the staging generation the durable identity.
+- [x] Box persistent and ephemeral namespaces include the required instance, Workspace, and generation scope.
+- [x] Same-named Box sessions and processes cannot collide across Workspaces or execution generations.
+- [x] Box relay and process I/O reject or retire stale generations.
+- [x] External paths and privileged mounts cannot be supplied by an untrusted plugin.
+- [x] Cloud attachment host I/O uses query UUIDs and link-free dirfd operations with bounded inode traversal.
+- [x] Cloud Skill package paths are Runtime-owned, Workspace-scoped, read-only mounts; Python env/cache stays tenant-writable.
+- [x] Skill ZIP preview/install rejects path escape, links, non-regular files, duplicate entries, excessive compression ratio, entry count, per-file size, and total size.
+- [x] Cloud Box code paths and automated tests require the authenticated marker challenge before startup or reconnect can proceed.
+- [x] Cloud Box readiness fails until hard Workspace, Skill, ephemeral-storage, and inode quota capabilities are available.
+
+## 6. SDK and protocol
+
+- [x] Public Query, Event, Session, and context entities carry backward-compatible Workspace data.
+- [x] Action RPC request models carry trusted Workspace binding where required.
+- [x] Action enums and callers remain consistent.
+- [x] Old plugins continue to deserialize compatible events.
+- [x] Plugins cannot select an arbitrary Workspace through a Host API argument.
+- [x] Runtime storage uses the bound Workspace UUID.
+- [x] SDK API tests pass.
+- [x] Runtime tests pass.
+- [x] Action consistency script passes.
+
+## 7. Frontend
+
+- [x] Every browser tenant API request carries the current Workspace selector after bootstrap.
+- [x] OSS automatically selects the singleton Workspace.
+- [x] OSS does not show Create Workspace or a misleading switcher.
+- [x] Workspace settings show current Workspace information.
+- [x] Members page lists roles and permissions.
+- [x] Invitation creation shows a one-time link when SMTP is unavailable.
+- [x] Invitation acceptance supports a signed-out user flow.
+- [x] Role controls are hidden or disabled consistently with backend permissions.
+- [x] Switching accounts clears stale Workspace query cache and local state.
+- [x] User-facing strings support en_US, zh_Hans, and ja_JP.
+
+## 8. Automated verification
+
+### Persistence and authorization
+
+- [x] SQLite fresh install.
+- [x] SQLite upgrade from pre-tenant schema, including verified failure recovery.
+- [x] PostgreSQL fresh install.
+- [x] PostgreSQL upgrade from pre-tenant schema.
+- [x] All fixed roles have positive and negative permission-matrix tests.
+- [x] Concurrent invitation acceptance creates one Membership.
+- [x] Concurrent owner changes never leave zero owners.
+
+### Cross-tenant isolation
+
+- [x] Two Workspaces are created through a test-only policy.
+- [x] Applicable resource operations and parent-child references have cross-Workspace negative coverage.
+- [x] Resource UUID guessing cannot cross Workspace.
+- [x] API Key cannot cross Workspace.
+- [x] Plugin cannot enumerate or invoke another Workspace's resources.
+- [x] Sessions, caches, locks, MCP, RAG, Box, storage, and monitoring do not collide.
+- [x] Background jobs cannot execute without an explicit Workspace and execution generation.
+
+### Security and revocation
+
+- [x] Space login and binding use purpose-bound, one-time opaque OAuth state; caller-supplied state is rejected.
+- [x] OAuth redirects trust only server-configured WebUI or webhook origins, never request `Host` or `Origin` headers.
+- [x] Dashboard WebSockets revalidate authentication, Membership, resource, permission, and generation per message.
+- [x] Public embed WebSockets re-resolve Bot availability and execution binding per message.
+- [x] Runtime, storage, Plugin Runtime, MCP, RAG, and Box reject a stale execution generation.
+- [x] Unhandled API and webhook failures return a generic error plus request ID without exception text.
+- [x] URL user information and sensitive query parameters are redacted before configuration is serialized or logged.
+
+### Regression
+
+- [x] LangBot unit tests pass.
+- [x] LangBot integration tests pass.
+- [x] Frontend lint completes without errors and the production build passes.
+- [x] SDK focused and full relevant tests pass.
+- [x] LangBot is pinned to the exact pushed SDK commit and cross-repo tests pass against that revision.
+
+## 9. Real browser E2E
+
+- [x] Start from a clean local data directory.
+- [x] First user initializes the singleton Workspace as owner.
+- [x] Owner creates an invitation link.
+- [x] A second signed-out browser identity accepts the invitation and registers.
+- [x] UI permissions for the `owner`, `admin`, `developer`, `operator`, and `viewer` roles match backend enforcement.
+- [x] Direct API calls cannot bypass hidden controls.
+- [x] Account switch does not expose prior account or Workspace data.
+- [x] Refresh and a new browser tab recover the correct Workspace safely.
+- [x] OSS rejects a second Workspace with `edition_limit`; same-name and same-identifier isolation is covered by the test-only multi-Workspace policy because OSS deliberately has no multi-Workspace browser surface.
+- [x] Explicit error states are visible for expired, revoked, reused, and email-mismatched invitations.
+
+## 10. Completion evidence
+
+- [x] LangBot and SDK branch refs are recorded in the verification report.
+- [x] Space contains the closed adapter package and Cloud v2 control plane, billing, migration, and Workspace UI changes; unrelated pre-existing files remain unstaged.
+- [x] Migration output is captured for SQLite and PostgreSQL.
+- [x] Test commands and results are recorded.
+- [x] Browser E2E actions and observed results are recorded.
+- [x] The final audit finds no tenant table without Workspace ownership, no global Service query, no owner default, and no unscoped runtime key.
@@ -0,0 +1,305 @@
+# Multi-tenant implementation decisions
+
+This log records implementation choices made while delivering the Workspace architecture. It is intended to make trade-offs auditable without interrupting implementation for routine decisions.
+
+> Architecture decisions, activation gates, and still-open follow-ups are tracked in
+> [pending-architecture-decisions.md](./pending-architecture-decisions.md). Sections marked as decided there are authoritative;
+> this file records the concrete implementation choices and compatibility names used to realize them.
+
+## 2026-07-18
+
+### OSS remains a singleton Workspace with multiple Accounts
+
+- Decision: Community builds create exactly one Workspace per LangBot instance and allow multiple Accounts through invitations.
+- Reason: This preserves a simple self-hosted deployment while making authorization and ownership explicit. Creating a second Workspace is an edition error, not a hidden fallback.
+- SaaS boundary: Multi-Workspace directory, execution ownership, entitlement, and billing are the responsibility of a separate closed SaaS Control Plane. Core consumes a validated projection and remains the final isolation and authorization enforcement point; it does not become the SaaS system of record or billing engine.
+- Deployment boundary: Cloud v2 is a greenfield deployment design. The previous per-account instance/pod scheme is not migrated or extended and remains available only for existing subscriptions. Existing OAuth, marketplace, and payment rails are reused through explicit adapters where they still fit; new Workspace, subscription, entitlement, directory, and usage modules live alongside the legacy Pod flow in `langbot-space`.
+
+### Workspace selection is trusted only after authentication
+
+- Decision: Browser requests carry `X-Workspace-Id`, but the server resolves it against the authenticated Account membership. API keys, public Bot routes, webhooks, jobs, and plugin calls derive Workspace from their trusted owning resource or binding instead of trusting the header.
+- Reason: A selector is routing input, not authorization evidence.
+- Compatibility: Community builds may select the singleton Workspace when the header is omitted. A multi-Workspace-capable build must reject an omitted selector.
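
The following is a minimal sketch of the selector rule. It assumes the authenticated Account's memberships are already loaded as a `workspace_uuid -> role` map. `resolve_workspace`, `SelectorPolicy`, and both exception classes are hypothetical names, not Core's API. The header only chooses among Workspaces that the Account already belongs to.

```python
from dataclasses import dataclass
from typing import Optional


class SelectorRequired(Exception):
    """A multi-Workspace build received no X-Workspace-Id selector."""


class WorkspaceNotFound(Exception):
    """The selector names a Workspace the authenticated Account is not a member of."""


@dataclass(frozen=True)
class SelectorPolicy:
    multi_workspace: bool
    singleton_uuid: Optional[str] = None  # set only for the OSS singleton policy


def resolve_workspace(
    selector: Optional[str],
    memberships: dict[str, str],  # workspace_uuid -> role, loaded for the authenticated Account
    policy: SelectorPolicy,
) -> tuple[str, str]:
    """Return (workspace_uuid, role); the header is routing input, never authorization."""
    if selector is None:
        if policy.multi_workspace or policy.singleton_uuid is None:
            raise SelectorRequired()
        selector = policy.singleton_uuid
    role = memberships.get(selector)
    if role is None:
        # A foreign Workspace and a nonexistent one produce the same error.
        raise WorkspaceNotFound(selector)
    return selector, role
```

An omitted selector falls back to the singleton only when the policy carries one. A multi-Workspace policy always raises.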
+
+### Stable Account UUID is the token subject
+
+- Decision: New JWTs use the stable Account UUID as `sub`; a bounded compatibility path accepts legacy email-subject tokens and replaces them with UUID-subject tokens when they are validated.
+- Reason: Email can change and therefore cannot be a durable authorization identity.
+
+### Fixed roles are authoritative in Core
+
+- Decision: `owner`, `admin`, `developer`, `operator`, and `viewer` map to a fixed permission matrix in LangBot Core. The last owner cannot be removed or demoted, and invitations cannot create an owner directly.
+- Reason: Core must remain the final authorization boundary in both OSS and SaaS deployments.
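
The sketch below shows one way a fixed matrix and the last-owner rule might look. The permission sets are an illustrative subset only: `member.manage` is invented for the example, and the real matrix lives in Core.

```python
from enum import Enum
from typing import Optional


class Role(str, Enum):
    OWNER = "owner"
    ADMIN = "admin"
    DEVELOPER = "developer"
    OPERATOR = "operator"
    VIEWER = "viewer"


# Illustrative subset only: "member.manage" is hypothetical and the real matrix
# is defined in Core, not by this table.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.OWNER: frozenset({"resource.view", "resource.manage", "provider_secret.manage", "member.manage"}),
    Role.ADMIN: frozenset({"resource.view", "resource.manage", "provider_secret.manage", "member.manage"}),
    Role.DEVELOPER: frozenset({"resource.view", "resource.manage"}),
    Role.OPERATOR: frozenset({"resource.view"}),
    Role.VIEWER: frozenset({"resource.view"}),
}


def has_permission(role: Role, permission: str) -> bool:
    return permission in PERMISSIONS[role]


def change_role(members: dict[str, Role], account_uuid: str, new_role: Optional[Role]) -> None:
    """Change or remove (new_role=None) one Membership without ever leaving zero owners."""
    owners = sum(1 for role in members.values() if role is Role.OWNER)
    if members[account_uuid] is Role.OWNER and new_role is not Role.OWNER and owners == 1:
        raise PermissionError("last_owner")
    if new_role is None:
        del members[account_uuid]
    else:
        members[account_uuid] = new_role


def invitation_role(requested: Role) -> Role:
    """Invitations may grant any fixed role except owner."""
    if requested is Role.OWNER:
        raise PermissionError("owner_not_invitable")
    return requested
```

Here the owner count and the write are separate steps. A real implementation would need to run them in one transaction that serializes owner changes. Otherwise two concurrent demotions could each observe two owners.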
+
+### Cross-Workspace access is indistinguishable from absence
+
+- Decision: Resource lookups always include Workspace UUID. A guessed UUID belonging to another Workspace returns 404; a visible resource with insufficient same-Workspace permission returns 403.
+- Reason: This avoids leaking resource existence across tenants while preserving actionable same-tenant errors.
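
The following sketch shows this lookup order, assuming a store keyed by `(workspace_uuid, resource_uuid)`. All names are hypothetical. Because the key is scoped first, a guessed foreign UUID can only miss. The permission check runs only for resources that are already visible in the caller's Workspace.

```python
from dataclasses import dataclass


class NotFound(Exception):
    status = 404


class Forbidden(Exception):
    status = 403


@dataclass(frozen=True)
class Resource:
    workspace_uuid: str
    uuid: str
    name: str


def get_resource(
    store: dict[tuple[str, str], Resource],
    workspace_uuid: str,
    resource_uuid: str,
    granted: frozenset[str],
    required: str,
) -> Resource:
    # The Workspace UUID is part of the key, so another Workspace's UUID simply misses.
    resource = store.get((workspace_uuid, resource_uuid))
    if resource is None:
        raise NotFound(resource_uuid)
    # Only a resource already visible in this Workspace can produce a 403.
    if required not in granted:
        raise Forbidden(required)
    return resource
```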
+
+### Plugin Runtime is shared; every plugin process is single-Workspace
+
+- Decision: One instance-scoped Plugin Runtime control plane serves all Workspaces in the logical LangBot instance. Each running plugin installation has its own nsjail worker with an immutable binding containing `instance_uuid`, `workspace_uuid`, `execution_generation` (stored as the compatibility field `placement_generation` until the schema rename), `installation_uuid`, `runtime_revision`, and the verified artifact digest. The intended lifecycle is enabled-resident: a worker stays running for as long as its installation is enabled. A worker never routes actions for another Workspace or installation, and plugin-supplied scope fields are stripped.
+- Isolation: Plugin code is mounted read-only. Home, tmp, and data paths are installation-scoped; process, file-descriptor, file-size, CPU, memory, and PID limits come only from `data/config.yaml` (including native environment overrides), never from a plugin manifest. Cloud requires nsjail and delegated cgroup v2 hard limits or fails closed.
+- Cost boundary: Identical verified package bytes share one digest-addressed code cache. A dependency environment is keyed by the artifact and requirements digests, Python ABI, Runtime version, and installer schema, then atomically published read-only for reuse. Installations and processes are not merged, even for the same plugin and version. Registration creates database desired state only; a worker is launched only for an enabled installation.
+- Recovery: PostgreSQL installation desired state and durable binary storage are authoritative. Runtime reconnect performs an instance-wide full reconciliation, removes stale workers, and can replay a verified package after Runtime-local cache loss. Dependency preparation failure is recorded per installation with `dependency_prepare_failed`; it prevents that worker launch without blocking recovery of other desired installations, and the same revision can be retried. The installation Supervisor now restores an unexpectedly exited enabled worker through a completion callback with bounded exponential backoff. Jitter, global restart concurrency limits, and a Runtime-wide circuit breaker are still required to prove that an infrastructure-wide failure cannot create a cross-tenant restart storm.
+- Compatibility: Older SDK payloads and legacy `data/plugins` remain an OSS-only bridge. Shared mode requires complete bindings and rejects incomplete context.
+- Reason: Sharing the supervisor and immutable code cache removes per-Workspace service cost without turning an untrusted plugin process into a cross-tenant router.
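
The sketch below illustrates three mechanisms from the list above, under stated assumptions:

- stamping the immutable binding over plugin-supplied scope fields;
- deriving a content key for a shared dependency environment;
- adding full jitter to the bounded exponential restart backoff.

Of the restart controls the Recovery bullet still requires, only jitter is shown. Global concurrency limits and the Runtime-wide circuit breaker are out of scope. `bind_action`, `dependency_env_key`, `restart_delay`, and the 300-second cap are hypothetical.

```python
import hashlib
import json
import random
from dataclasses import dataclass
from typing import Optional

# Scope fields a plugin may never supply; names follow the decision text.
SCOPE_FIELDS = frozenset(
    {"instance_uuid", "workspace_uuid", "execution_generation", "placement_generation", "installation_uuid"}
)


@dataclass(frozen=True)
class WorkerBinding:
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int  # compatibility name for execution_generation
    installation_uuid: str
    runtime_revision: str
    artifact_digest: str


def bind_action(binding: WorkerBinding, plugin_payload: dict) -> dict:
    """Strip plugin-supplied scope fields, then stamp the worker's immutable binding."""
    cleaned = {k: v for k, v in plugin_payload.items() if k not in SCOPE_FIELDS}
    cleaned.update(
        instance_uuid=binding.instance_uuid,
        workspace_uuid=binding.workspace_uuid,
        placement_generation=binding.placement_generation,
        installation_uuid=binding.installation_uuid,
    )
    return cleaned


def dependency_env_key(artifact_digest: str, requirements_digest: str, python_abi: str,
                       runtime_version: str, installer_schema: int) -> str:
    """Content key for a shared, read-only dependency environment."""
    # JSON encoding keeps field boundaries unambiguous, unlike plain concatenation.
    material = json.dumps([artifact_digest, requirements_digest, python_abi,
                           runtime_version, installer_schema], separators=(",", ":"))
    return hashlib.sha256(material.encode()).hexdigest()


def restart_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    ceiling = min(cap, base * 2 ** min(attempt, 32))  # clamp the exponent to avoid float overflow
    return rng.uniform(0.0, ceiling)
```

Full jitter spreads restarts that fail at the same moment across the whole backoff window. Without it, workers that crashed together would retry in lockstep.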
+
+### Invitation delivery does not require SMTP
+
+- Decision: Core returns an invitation secret once for copy-and-share, persists only its hash, and supports expiry, revocation, and one-time acceptance.
+- Reason: Self-hosted OSS must support adding users without an email service while avoiding recoverable invitation secrets at rest.
+- Browser handling: The copyable invitation URL carries the secret in its fragment, which browsers do not send in HTTP requests or Referer headers. The acceptance page immediately removes the fragment and keeps the secret only in `sessionStorage` until login or acceptance completes; it is never placed in a path, query string, analytics event, or persistent local storage.
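
The following is a minimal sketch of the invitation secret lifecycle. It assumes an `/invitations/accept` page, a `token=` fragment key, and a seven-day default expiry, all of which are hypothetical. Only the SHA-256 digest is kept. The email-match check and the browser-side fragment removal are not shown.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvitationRecord:
    token_hash: str  # only the SHA-256 digest is persisted, never the secret
    workspace_uuid: str
    role: str
    expires_at: float
    revoked: bool = False
    accepted: bool = False


def create_invitation(webui_url: str, workspace_uuid: str, role: str,
                      ttl_seconds: float = 7 * 24 * 3600) -> tuple[str, InvitationRecord]:
    secret = secrets.token_urlsafe(32)
    record = InvitationRecord(
        token_hash=hashlib.sha256(secret.encode()).hexdigest(),
        workspace_uuid=workspace_uuid,
        role=role,
        expires_at=time.time() + ttl_seconds,
    )
    # The secret travels in the fragment, which browsers do not send in requests or Referer.
    return f"{webui_url.rstrip('/')}/invitations/accept#token={secret}", record


def accept_invitation(record: InvitationRecord, presented: str, now: Optional[float] = None) -> str:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    if not hmac.compare_digest(digest, record.token_hash):
        raise PermissionError("invalid")
    if record.revoked:
        raise PermissionError("revoked")
    if record.accepted:
        raise PermissionError("reused")
    if (time.time() if now is None else now) >= record.expires_at:
        raise PermissionError("expired")
    record.accepted = True  # a database store must make this transition atomic
    return record.workspace_uuid
```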
+
+### Schema rollout is additive before enforcement
+
+- Decision: Add Account/Workspace directory tables first, then add non-null Workspace ownership to every tenant resource with a deterministic default-Workspace backfill. Runtime and service enforcement is enabled only with matching migration and isolation tests.
+- Reason: A Workspace column alone is not isolation, and enforcing queries before data backfill would break upgraded installations.
+
+### Login capability discovery is instance-scoped, not account-scoped
+
+- Decision: The unauthenticated login bootstrap endpoint reports only which login mechanisms the instance supports. It does not inspect the first Account or expose whether that Account has a password. Both password and Space OAuth entry points are available on a multi-user instance; the submitted identity determines which mechanism is valid.
+- Reason: This avoids projecting the original owner's authentication type onto invited users and removes a public Account-state disclosure.
+
+### Space OAuth identity does not choose a SaaS Workspace
+
+- Decision: Space OAuth tokens remain Account credentials. In OSS singleton mode, an OAuth refresh may update the singleton Workspace's Space provider only when that Account's role can manage provider secrets. In SaaS multi-Workspace mode an OAuth callback without an authenticated Workspace selector never guesses which Workspace to mutate; explicit Workspace configuration or the closed control plane owns that linkage.
+- Reason: An Account may belong to several Workspaces, and authentication must not silently mutate a shared tenant secret.
+
+### SaaS execution state is a validated Core projection
+
+- Decision: Core can resolve both local and `cloud_projection` Workspaces, but only from an explicit Workspace UUID and an active, unfenced `WorkspaceExecutionState` for the current instance and matching source. OSS-only bootstrap paths additionally require `source=local`.
+- Reason: The closed control plane owns execution ownership and generation decisions, while Core remains the enforcement point for instance binding, generation, and write fences.
+
+### API-key secrets are one-time and Workspace-bound
+
+- Decision: Database API keys persist only a globally unique SHA-256 hash, an opaque UUID, one Workspace UUID, explicit fixed-permission scopes, status, expiry, creator, and last-used time. The raw secret is returned once. Authentication derives Workspace and generation from the key record and ignores Workspace selectors. Legacy plaintext keys are hashed during migration and receive a compatibility `*` scope. The plaintext config key works only for the OSS singleton Workspace and is disabled in multi-Workspace mode.
+- Reason: A bearer key is an identity and routing credential, not merely a password layered on top of caller-controlled tenant selection.
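
The sketch below shows key issuance and authentication under these rules, assuming an index from key hash to record. Field and function names are illustrative. The secret is high-entropy random data, so a plain SHA-256 digest is generally considered sufficient for lookup; a slow password hash is not needed. The selector argument exists only to show that it is ignored.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ApiKeyRecord:
    key_uuid: str
    key_hash: str  # globally unique SHA-256 of the raw secret
    workspace_uuid: str
    scopes: frozenset[str]
    status: str = "active"
    expires_at: Optional[datetime] = None
    last_used_at: Optional[datetime] = None


def issue_key() -> tuple[str, str]:
    """Return (raw_secret, hash); the raw secret is shown to the caller exactly once."""
    raw = secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode()).hexdigest()


def authenticate(index: dict[str, ApiKeyRecord], raw: str,
                 workspace_selector: Optional[str] = None,
                 now: Optional[datetime] = None) -> ApiKeyRecord:
    """Derive the Workspace from the key record; any selector header is ignored."""
    del workspace_selector  # deliberately unused: the key is its own routing credential
    now = now or datetime.now(timezone.utc)
    record = index.get(hashlib.sha256(raw.encode()).hexdigest())
    if record is None or record.status != "active":
        raise PermissionError("invalid_api_key")
    if record.expires_at is not None and now >= record.expires_at:
        raise PermissionError("invalid_api_key")
    record.last_used_at = now
    return record


def require_scope(record: ApiKeyRecord, permission: str) -> None:
    # "*" is the compatibility scope assigned to migrated legacy keys.
    if "*" not in record.scopes and permission not in record.scopes:
        raise PermissionError(permission)
```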
+
+### MCP tools inherit the authenticated API-key context
+
+- Decision: The MCP ASGI mount authenticates the API key once, binds an immutable per-request `RequestContext`, and every tool checks a fixed permission before calling tenant services with that same context.
+- Reason: Authenticating the transport without propagating Workspace identity into tool calls would leave the direct service path globally scoped.
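
One way to express the per-tool check is a decorator over async tool functions. In this sketch, `RequestContext` is a hypothetical stand-in with only the fields the example needs, and `list_bots` is an invented tool. The `*` scope mirrors the legacy API-key compatibility scope.

```python
import asyncio
import functools
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class RequestContext:
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int
    scopes: frozenset[str]


def requires(permission: str):
    """Run the tool only when the bound request context grants `permission`."""
    def decorate(tool: Callable[..., Awaitable[Any]]):
        @functools.wraps(tool)
        async def wrapper(ctx: RequestContext, *args: Any, **kwargs: Any) -> Any:
            if "*" not in ctx.scopes and permission not in ctx.scopes:
                raise PermissionError(permission)
            # The same immutable context is forwarded to the tenant service.
            return await tool(ctx, *args, **kwargs)
        return wrapper
    return decorate


@requires("resource.view")
async def list_bots(ctx: RequestContext) -> list[str]:
    # Hypothetical tenant service call, always filtered by ctx.workspace_uuid.
    return [f"{ctx.workspace_uuid}/bot-1"]


if __name__ == "__main__":
    ctx = RequestContext("inst", "ws-a", 1, frozenset({"resource.view"}))
    print(asyncio.run(list_bots(ctx)))
```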
+
+### Unreleased SDK protocol is pinned reproducibly without publishing
+
+- Decision: The SDK tenancy protocol is versioned as 0.4.18. This task does not create a GitHub release or publish to PyPI because the user authorized pushing code, not a package release. After the SDK feature branch is final, LangBot's feature branch temporarily pins the exact pushed SDK Git commit. Before merging to master, the release gate is to publish `langbot-plugin==0.4.18` and replace the Git pin with the registry pin.
+- Reason: The current registry release does not contain the complete tenant action context and shared Runtime hardening. An exact Git commit is reproducible and keeps the feature branch testable without expanding release authority.
+
+### Cloud directory writes stay outside Core
+
+- Decision: The open-source Core startup always installs `SingleWorkspacePolicy`, creates or repairs one local Workspace, and permits local membership/invitation workflows. Changing mutable configuration such as `system.edition` cannot activate multi-Workspace routing. The future closed Cloud bootstrap will install `CloudWorkspacePolicy` only after verifying a signed `InstanceManifest`; that policy requires an explicit projected Workspace selector, does not create Workspaces, and rejects invitation or membership mutations with `control_plane_required`; member reads use the versioned local projection.
+- Ownership split: The closed Control Plane owns the global Account/Workspace/Membership/Invitation directory, execution ownership and generation, entitlements, subscription state, usage aggregation, and billing decisions. Core owns request authorization, resource scoping, execution-generation validation, and fail-closed enforcement. Provisioning and invoice computation do not belong in open-source Core.
+- Reason: The closed control plane is authoritative for SaaS Account, Workspace, Membership, and Invitation state. Allowing Core to mutate the same directory would create split-brain ownership and would make an ownerless compatibility Workspace a dangerous fallback.
+- Release gate: Multi-Workspace activation is deliberately unavailable in the open-source bootstrap. Production Cloud v2 must implement the signed `InstanceManifest` verifier and closed bootstrap described in the architecture document before it can inject `CloudWorkspacePolicy`; `edition=cloud`, an environment variable, or any unsigned local configuration is never a valid activation credential.
+
+### Workspace bootstrap is reactive and ordered before browser resource calls
+
+- Decision: The web application blocks Workspace-owned pages until Account and current Workspace bootstrap completes. A `useSyncExternalStore` Workspace store publishes permission changes to React consumers; direct mutation-only routes and controls are hidden or disabled when the fixed role lacks the required permission.
+- Reason: Mutating a module-level variable after the initial React render did not reliably re-render permission controls, and mounting resource pages before the selector was established could issue tenant requests without `X-Workspace-Id`.
+
+### JWTs are bound to one LangBot instance
+
+- Decision: New Core JWTs require `iss=langbot-core`, an audience derived from the immutable instance UUID, and an expiry. Legacy community tokens are accepted only when they have the historical issuer, carry no audience, and the active policy is the OSS singleton policy.
+- Reason: A token issued by one instance must not authenticate against another instance that happens to share a secret, and a compatibility decoder must not become an alternate path around the SaaS trust boundary.
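
The claim rules can be sketched as a check that runs after a JWT library has verified the signature. The audience format `langbot-instance:<uuid>` is an assumption: the decision only says the audience is derived from the instance UUID. The historical issuer is passed in as a parameter because this log does not record its value.

```python
import time
import uuid
from typing import Optional


def expected_audience(instance_uuid: str) -> str:
    # Hypothetical format; the decision only says the audience derives from the instance UUID.
    return f"langbot-instance:{instance_uuid}"


def check_claims(claims: dict, instance_uuid: str, oss_singleton: bool,
                 legacy_issuer: Optional[str] = None, now: Optional[float] = None) -> str:
    """Validate claims of a token whose signature is already verified; return the subject."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if claims.get("iss") == "langbot-core":
        if claims.get("aud") != expected_audience(instance_uuid):
            raise PermissionError("wrong_audience")
        if not isinstance(exp, (int, float)) or now >= exp:
            raise PermissionError("expired_or_missing_exp")
        try:
            return str(uuid.UUID(claims.get("sub")))  # subject must be the stable Account UUID
        except (TypeError, ValueError, AttributeError):
            raise PermissionError("bad_subject") from None
    # Legacy community token: historical issuer, no audience, OSS singleton policy only.
    if oss_singleton and legacy_issuer and claims.get("iss") == legacy_issuer and "aud" not in claims:
        if isinstance(exp, (int, float)) and now >= exp:
            raise PermissionError("expired")
        sub = claims.get("sub")
        if isinstance(sub, str) and sub:
            return sub  # an email subject; the caller reissues a UUID-subject token
        raise PermissionError("bad_subject")
    raise PermissionError("untrusted_token")
```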
+
+### Runtime control transports authenticate before protocol dispatch
+
+- Decision: External Plugin Runtime and Box WebSocket control channels require independent strong shared secrets in handshake headers. Locally managed child processes receive ephemeral secrets through their environment; secrets are not placed in URLs, process arguments, request payloads, or logs. Box additionally binds the first authenticated control channel to one trusted instance. Plugin Runtime debug and control credentials remain separate.
+- Reason: Workspace context inside an RPC payload is not trustworthy until the transport peer itself is authenticated. Separating control and debug credentials also limits accidental privilege reuse.
+- Deployment consequence: Docker Compose and Kubernetes wire one shared secret to each host/runtime pair. An empty external-runtime secret fails startup instead of silently exposing an unauthenticated socket.
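
The sketch below shows the handshake gate, assuming the server framework exposes lower-cased request headers. The header name and the 32-character minimum are hypothetical. `hmac.compare_digest` keeps the comparison constant-time, and the check runs before any RPC frame is dispatched.

```python
import hmac
import secrets
from typing import Optional


class HandshakeRejected(Exception):
    pass


def load_control_secret(value: Optional[str], min_length: int = 32) -> bytes:
    """Fail startup on an empty or short secret instead of exposing an unauthenticated socket."""
    if not value or len(value) < min_length:
        raise RuntimeError("runtime control secret is missing or too short")
    return value.encode()


def authenticate_handshake(headers: dict[str, str], expected: bytes,
                           header: str = "x-runtime-control-secret") -> None:
    """Run before any RPC frame is parsed; headers are assumed to be lower-cased."""
    presented = headers.get(header, "").encode()
    if not hmac.compare_digest(presented, expected):  # constant-time comparison
        raise HandshakeRejected()


def new_child_secret() -> str:
    # Ephemeral per-launch secret for a locally managed child, passed via its environment only.
    return secrets.token_urlsafe(32)
```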
+
+### Dashboard WebSocket sessions are tenant runtime objects
+
+- Decision: A dashboard WebSocket sends an authentication frame immediately after upgrade. The server validates Account, Membership, permission, Pipeline ownership, instance, Workspace, and execution generation before registering the connection. Connection indexes, sessions, broadcasts, attachments, and resets include the complete execution scope.
+- Reason: Browser WebSocket APIs cannot attach the normal authorization headers, and a process-global `pipeline_uuid` or `session_type` index can collide across Workspaces.
+
+### Read permissions never imply secret permissions
+
+- Decision: `resource.view` responses recursively redact Bot, Plugin, MCP, and provider credentials. Provider secrets require `provider_secret.manage`; Bot and Plugin configuration writes require `resource.manage`. Masked Plugin values can be round-tripped by a manager without overwriting the stored secret. Plugin Runtime debug credentials require `resource.manage`, not the operator-only `runtime.operate` permission.
+- Reason: A multi-user Workspace needs useful viewer access without turning every visible configuration endpoint into credential export. Plugin debug attachment can register executable code and is therefore a resource-management operation.
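
The following sketch shows recursive redaction and masked round-tripping, assuming secrets are identified by key name. The key list and mask string are illustrative. A manager who submits the mask back unchanged keeps the stored secret; any other submitted value replaces it.

```python
from typing import Any

MASK = "******"
# Illustrative key names; a real sanitizer would use a reviewed list.
SECRET_KEYS = frozenset({"api_key", "app_secret", "password", "secret", "token"})


def redact(value: Any) -> Any:
    """Return a copy with secret-looking keys masked, for `resource.view` responses."""
    if isinstance(value, dict):
        return {
            k: MASK if k.lower() in SECRET_KEYS and v not in (None, "") else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def merge_update(stored: Any, submitted: Any) -> Any:
    """Apply a manager's write, keeping the stored secret wherever the mask came back unchanged."""
    if submitted == MASK:
        return stored
    if isinstance(stored, dict) and isinstance(submitted, dict):
        return {k: merge_update(stored.get(k), v) for k, v in submitted.items()}
    if isinstance(stored, list) and isinstance(submitted, list) and len(stored) == len(submitted):
        return [merge_update(s, v) for s, v in zip(stored, submitted)]
    return submitted
```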
+
+### Temporary credential exchanges are bound to their initiator
+
+- Decision: Lark, Weixin, DingTalk, WeComBot, and QQOfficial one-click registration sessions require `resource.manage` and store the initiating instance, Workspace, execution generation, and principal. Status and cancellation by any other scope return the same 404 as an unknown session.
+- Reason: Random session IDs reduce guessing probability but do not authorize access to credentials returned by a completed exchange.
+
+### Uploaded images and documents use different storage capabilities
+
+- Decision: Browser images use the scoped `upload_image` owner type and may be resolved only through the opaque public-image route. RAG documents use `upload_document` and can be read, sized, or deleted only by an exact instance, Workspace, generation, and owner-type match. Legacy `upload` objects are cleanup-only.
+- Reason: Treating every upload as a public image made a leaked document key sufficient to bypass authenticated RAG access.
+
+## 2026-07-19
+
+### Space OAuth state is server-issued and single-use
+
+- Decision: Core issues an opaque, cryptographically random OAuth state for each Space login or Account-binding attempt, stores only its digest, and consumes it exactly once within a short expiry. Login and binding states are different capabilities; a binding state is additionally bound to the authenticated Account. Caller-supplied state, including a LangBot JWT, is rejected.
+- Redirect boundary: Callback redirects are accepted only for the known callback path and an origin declared by the server-side `api.webui_url` or `api.webhook_prefix`. Request `Host` and `Origin` headers never expand this allowlist.
+- Current deployment: The OSS state store is bounded and process-local, so a Core restart safely invalidates outstanding attempts. A horizontally scaled SaaS deployment must move this exchange to an atomic, shared Control Plane store before enabling the closed Cloud bootstrap.
+- Reason: OAuth state is a narrow, one-time CSRF and flow-binding capability. Reusing a bearer JWT or trusting caller-controlled Host or Origin data would turn an authorization redirect into an Account-token theft or open-redirect primitive.
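
The following sketch shows the OSS state store and the redirect check. The TTL, capacity, purpose strings, and callback path are assumptions. The store holds only the digest of each state. `consume` removes the entry before validating it, so a state can never be replayed. The redirect check derives allowed origins solely from configured URLs such as `api.webui_url` and `api.webhook_prefix`.

```python
import hashlib
import secrets
import time
from typing import Optional
from urllib.parse import urlsplit


class OAuthStateStore:
    """Bounded, process-local, single-use OAuth state (OSS only; a restart drops pending states)."""

    def __init__(self, ttl_seconds: float = 600.0, max_pending: int = 1024) -> None:
        self._ttl = ttl_seconds
        self._max = max_pending
        # digest -> (purpose, bound account UUID or None, monotonic expiry)
        self._pending: dict[str, tuple[str, Optional[str], float]] = {}

    @staticmethod
    def _digest(state: str) -> str:
        return hashlib.sha256(state.encode()).hexdigest()

    def issue(self, purpose: str, account_uuid: Optional[str] = None) -> str:
        now = time.monotonic()
        self._pending = {d: e for d, e in self._pending.items() if e[2] > now}
        if len(self._pending) >= self._max:
            raise RuntimeError("too many pending OAuth attempts")
        state = secrets.token_urlsafe(32)
        self._pending[self._digest(state)] = (purpose, account_uuid, now + self._ttl)
        return state

    def consume(self, state: str, purpose: str, account_uuid: Optional[str] = None) -> None:
        # pop() first: a state is spent even when the checks below reject it.
        entry = self._pending.pop(self._digest(state), None)
        if entry is None or entry[:2] != (purpose, account_uuid) or entry[2] <= time.monotonic():
            raise PermissionError("invalid_oauth_state")


def is_allowed_redirect(url: str, configured_urls: list[str], callback_path: str) -> bool:
    """Accept only the callback path on an origin from server configuration, never Host/Origin."""
    allowed = {(u.scheme, u.netloc.lower()) for u in map(urlsplit, configured_urls)}
    target = urlsplit(url)
    return (target.scheme, target.netloc.lower()) in allowed and target.path == callback_path
```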
+
+### OAuth provider subjects, not email addresses, bind Accounts
+
+- Decision: A known Space `account_uuid` may refresh the credentials of its already-bound local Account. An unknown provider subject that presents an email belonging to an existing Account is rejected, even when the normalized emails match. The Account owner must authenticate locally and use the one-time, account-bound binding flow.
+- Reason: Email is contact and display data, not a stable federated identity key. Email-only auto-linking would let provider verification drift or identity reassignment become a local Account takeover.
+
+### Workspace discovery is an account-only bootstrap capability
+
+- Decision: `ACCOUNT_TOKEN` validates the active Account JWT but intentionally cannot resolve a Workspace, receive `RequestContext`, or declare Workspace permissions. Its narrow bootstrap endpoint returns only active Workspace memberships belonging to that Account and never chooses the first Workspace when several exist. All tenant resource routes still require the explicit selector in multi-Workspace mode.
+- Reason: Requiring a Workspace header to discover the Account's Workspaces creates an authentication deadlock; allowing the bootstrap route to perform tenant actions would create an authorization bypass. Separating the two capabilities resolves the cycle without weakening tenant routes.
+
+### SQLite tenancy migrations have a verified recovery boundary
+
+- Decision: Before each destructive tenant-schema boundary, a file-backed SQLite installation creates an online-consistent backup with its source and target revisions, runs `PRAGMA quick_check`, writes a durable manifest, and fsyncs restrictive-permission files and directories. A failed boundary disposes the engine, removes stale journal sidecars, atomically restores the verified source revision, and verifies the restored database before startup continues.
+- Compatibility: In-memory SQLite cannot provide this recovery guarantee and is rejected for destructive production migration boundaries; it remains usable in tests that create the final schema directly.
+- Reason: SQLite batch table rebuilds can leave an installation between schemas if a process or migration fails. A verified pre-boundary image makes retry behavior recoverable instead of merely idempotent in the happy path.
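
The backup half of this boundary can be sketched with the `sqlite3` online backup API in Python's standard library. File names and revision labels are hypothetical, and the fsync calls assume a POSIX system. The restore path is not shown: disposing the engine, removing sidecars, restoring atomically, and re-verifying the database.

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path


def _fsync_path(path: Path) -> None:
    # POSIX-only: files and directories are opened read-only so they can be fsynced.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def backup_before_boundary(db_path: Path, source_rev: str, target_rev: str) -> Path:
    """Write a verified, owner-only backup of a file-backed SQLite database plus a manifest."""
    backup = db_path.with_name(f"{db_path.name}.{source_rev}.bak")
    os.close(os.open(backup, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600))  # restrictive from creation
    src, dst = sqlite3.connect(db_path), sqlite3.connect(backup)
    try:
        src.backup(dst)  # online backup API: a consistent image even while the source is open
        (result,) = dst.execute("PRAGMA quick_check").fetchone()
        if result != "ok":
            raise RuntimeError(f"backup failed quick_check: {result}")
    finally:
        dst.close()
        src.close()
    _fsync_path(backup)
    manifest = db_path.with_name(f"{db_path.name}.{source_rev}.manifest.json")
    fd = os.open(manifest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"backup": backup.name, "source_revision": source_rev,
                   "target_revision": target_rev}, fh)
        fh.flush()
        os.fsync(fh.fileno())
    _fsync_path(db_path.parent)  # persist the new directory entries
    return backup


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    db = workdir / "langbot.db"
    with sqlite3.connect(db) as conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
    print(backup_before_boundary(db, "rev_a", "rev_b"))
```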
+
+### Execution generation is an execution revocation capability
+
+- Decision: RuntimeBot, RuntimePipeline, background tasks, object storage, Plugin Runtime, MCP, RAG, and Box operations carry the complete instance, Workspace, and execution-generation scope. The current schema and wire compatibility field remains `placement_generation` until a coordinated rename. They revalidate the active execution binding before accessing a provider or transport; long-running calls validate again before accepting results. A stale generation is fenced before it can read, write, or reuse a cached object.
+- Plugin boundary: Each locally launched plugin receives a short-lived, one-use registration capability bound to the expected manifest identity and execution scope. The production child environment does not inherit the reusable debug credential, and Host APIs derive scope from the trusted connection and action context.
+- Box boundary: Persistent skill content remains Workspace-scoped, while session/process state and relay requests also include execution generation. A generation change retires matching live sessions and closes a stale relay before further stdin, stdout, or file operations.
+- Transaction boundary: Request admission and runtime side effects are fenced in this branch, but ordinary tenant database mutations do not yet hold a generation-aware lock through commit. The closed Cloud bootstrap must remain disabled until Core provides the shared-write/exclusive-cutover transaction primitive and a generation-stamped outbox (or an equivalent atomic publish fence). The OSS singleton policy has a fixed local generation and cannot trigger an execution-owner cutover.
+- Durable-object boundary: Current opaque storage keys include generation and therefore fail closed after a generation change. That is safe for OSS's fixed generation, but a Cloud cutover must not strand durable KB files, images, or plugin references. Cloud v2 must publish stable final object identities from generation-scoped staging, or perform an atomic object-and-reference migration before activating the new generation.
+- Reason: Workspace UUID prevents cross-tenant collisions, but it cannot revoke work after execution ownership changes or is fenced. Execution generation is the monotonic revocation value that makes old runtimes unusable; it does not express membership in a product-level deployment entity.
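
The following sketch shows check-before and check-after fencing around a long-running call, assuming a hypothetical in-memory table of active generations. It narrows, but does not close, the gap described in the Transaction boundary bullet: a side effect performed inside the call can still land after a cutover.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class StaleGeneration(Exception):
    """The caller's execution generation is no longer the active, unfenced one."""


@dataclass(frozen=True)
class ExecutionScope:
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int  # compatibility name for the execution generation


class ExecutionBindings:
    """Hypothetical in-memory view of active execution generations per Workspace."""

    def __init__(self) -> None:
        self.active: dict[tuple[str, str], int] = {}

    def revalidate(self, scope: ExecutionScope) -> None:
        current = self.active.get((scope.instance_uuid, scope.workspace_uuid))
        if current != scope.placement_generation:
            raise StaleGeneration(scope)


async def fenced_call(bindings: ExecutionBindings, scope: ExecutionScope,
                      call: Callable[[], Awaitable[T]]) -> T:
    bindings.revalidate(scope)  # before touching a provider or transport
    result = await call()
    bindings.revalidate(scope)  # ownership may have moved while the call was in flight
    return result


if __name__ == "__main__":
    bindings = ExecutionBindings()
    bindings.active[("inst", "ws-a")] = 1

    async def provider_call() -> str:
        return "ok"

    print(asyncio.run(fenced_call(bindings, ExecutionScope("inst", "ws-a", 1), provider_call)))
```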
+
+### Long-lived WebSockets continuously revalidate authority
+
+- Decision: Dashboard WebSockets re-authenticate the Account, Membership, permission, resource ownership, instance, Workspace, and execution generation for every inbound message, not only during the initial frame. A changed role, removed Membership, or fenced execution binding takes effect without waiting for reconnect.
+- Public embed boundary: The embed connection re-resolves its Bot before every message and rejects a Bot that was disabled, deleted, moved, or rebound. The public connection may identify a Bot, but it cannot make the initial Bot object an indefinite authorization capability.
+- Reason: Authorization and resource state can change while a socket remains open. Connection-time validation alone leaves a revocation gap.
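
The following sketch shows per-message revalidation. It assumes a hypothetical `authorize` callback that re-reads Account, Membership, permission, Pipeline ownership, and execution generation on every call. It also shows a connection-index key that includes the full execution scope, so equal Pipeline UUIDs or session types in different Workspaces or generations never share an entry.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SocketScope:
    account_uuid: str
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int
    pipeline_uuid: str


def connection_key(scope: SocketScope, session_type: str) -> tuple:
    # Every index key carries the full execution scope, so equal pipeline_uuid or
    # session_type values in different Workspaces or generations cannot collide.
    return (scope.instance_uuid, scope.workspace_uuid, scope.placement_generation,
            scope.pipeline_uuid, session_type)


def serve(scope: SocketScope, frames: list[str],
          authorize: Callable[[SocketScope], bool],
          handle: Callable[[str], None]) -> str:
    """Process inbound frames until the client stops or authority is lost; return the close reason."""
    for frame in frames:
        # Re-checked for every frame, so a role change or fenced generation applies immediately.
        if not authorize(scope):
            return "authority_revoked"
        handle(frame)
    return "client_closed"
```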
+
+### Legacy vector migration is an OSS-local compatibility path
+
+- Decision: Status, backup, execute, dismiss, and background entry points for legacy global vector collections require an active local Workspace binding under `SingleWorkspacePolicy`. A `cloud_projection` Workspace cannot observe or migrate the old global collection, even when it carries a legacy marker.
+- Reason: The legacy collection predates tenant ownership. Treating it as a SaaS fallback would expose one installation's historical vectors to an arbitrary projected Workspace.
+
+### External errors and persisted URLs are redacted centrally
+
+- Decision: Unhandled HTTP and webhook failures return a stable `internal_error` response and request ID, and expose that ID in `X-Request-Id`; the detailed exception is retained only in server logs correlated by the same ID. Explicit domain and validation errors keep their documented status and code.
+- Secret boundary: Shared sanitization removes URL user information and masks sensitive query parameters before provider or MCP configuration is serialized, logged, or shown to a reader. Masked placeholders can be round-tripped by an authorized manager without replacing the stored secret.
+- Reason: Tenant isolation is incomplete if framework exceptions, connection URLs, or configuration reads can export credentials across otherwise authorized interfaces.
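
The sketch below covers the URL half of the shared sanitizer and the generic error body. The sensitive parameter list and the mask are illustrative. Re-encoding the query may also change how unrelated query values are escaped.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MASK = "******"
# Illustrative parameter names; a real list would be reviewed per provider.
SENSITIVE_PARAMS = frozenset({"access_token", "api_key", "apikey", "key", "password", "secret", "sig", "token"})


def redact_url(url: str) -> str:
    """Remove URL user information and mask sensitive query parameter values."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # re-bracket an IPv6 literal
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    query = urlencode(
        [(k, MASK if k.lower() in SENSITIVE_PARAMS else v)
         for k, v in parse_qsl(parts.query, keep_blank_values=True)],
        safe="*",
    )
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))


def internal_error_response() -> tuple[dict, dict]:
    """Generic body and headers; the exception itself is logged server-side under request_id."""
    request_id = uuid.uuid4().hex
    return {"code": "internal_error", "request_id": request_id}, {"X-Request-Id": request_id}
```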
+
+### Box is one shared control plane with one admitted sandbox per Workspace
+
+- Decision: The logical instance has one shared Box Runtime control plane, implemented by one Runtime replica in M0. A closed entitlement adapter projects generic `managed_sandbox` capability and `managed_sandbox_sessions` limit; Core and Runtime never branch on a plan name. An eligible Workspace receives at most one persistent logical `global` session, while each ordinary command remains a one-shot nsjail process. Managed processes and network are disabled in the first Cloud release.
+- Storage boundary: Core and Runtime prove they see the same durable volume with an authenticated random-marker challenge. Attachments use opaque query UUID directories and link-free dirfd operations. Skill packages remain in the Runtime-owned Workspace store and enter a sandbox only as a read-only logical-name mount; Python environments and caches live in the tenant's writable Workspace.
+- Resource boundary: Cloud readiness requires cgroup v2 plus hard byte and inode limits for Workspace files, Skill storage, root, tmp, and home. The existing directory scan is only a compatibility soft check. Plain nsjail reports these storage capabilities as unavailable, so Cloud Box intentionally cannot start until the greenfield deployment supplies and verifies a real quota provider.
+- Archive boundary: Skill ZIP processing is bounded by compressed input, entry count, per-entry size, total uncompressed size, and compression ratio, and rejects links, non-regular entries, duplicates, and path escape before streaming extraction.
+- Reason: A shared supervisor removes per-Workspace services and idle control-plane cost, but storage and process admission must still fail closed at the untrusted execution boundary.
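
The following sketch shows the pre-extraction checks, using `zipfile` central-directory metadata. All limits are illustrative. The ratio check skips entries under 1 MiB so that small, highly compressible text files are not rejected. Declared sizes can be forged, so streaming extraction should still count the bytes it actually writes.

```python
import io
import posixpath
import stat
import zipfile

# Illustrative limits; real values would come from configuration.
MAX_ARCHIVE_BYTES = 20 * 1024 * 1024
MAX_ENTRIES = 2_000
MAX_ENTRY_BYTES = 50 * 1024 * 1024
MAX_TOTAL_BYTES = 200 * 1024 * 1024
MAX_RATIO = 100
RATIO_MIN_BYTES = 1024 * 1024  # small, highly compressible text files are normal


def validate_skill_zip(data: bytes) -> list[str]:
    """Check every central-directory entry before extraction; return the accepted file names."""
    if len(data) > MAX_ARCHIVE_BYTES:
        raise ValueError("compressed archive too large")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_ENTRIES:
            raise ValueError("too many entries")
        seen: set[str] = set()
        total = 0
        for info in infos:
            name = info.filename
            normalized = posixpath.normpath(name)
            if (name.startswith("/") or "\\" in name or ":" in name
                    or normalized == ".." or normalized.startswith("../")):
                raise ValueError(f"path escape: {name}")
            if info.is_dir():
                continue
            file_type = stat.S_IFMT(info.external_attr >> 16)
            if file_type not in (0, stat.S_IFREG):  # symlinks, devices, FIFOs, ...
                raise ValueError(f"non-regular entry: {name}")
            if normalized in seen:
                raise ValueError(f"duplicate entry: {name}")
            seen.add(normalized)
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {name}")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("uncompressed total too large")
            if info.file_size > RATIO_MIN_BYTES and info.file_size > MAX_RATIO * info.compress_size:
                raise ValueError(f"excessive compression ratio: {name}")
        return sorted(seen)
```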
+
+### Cloud business data and vectors share one PostgreSQL schema
+
+- Decision: SaaS uses one PostgreSQL business database and shared schema. Every tenant row has an explicit Workspace key; application scope is the first boundary and precise `ENABLE` plus `FORCE ROW LEVEL SECURITY` policies are the second. The Cloud runtime role must be non-owner and have neither superuser nor `BYPASSRLS`.
+- Vector boundary: pgvector is the Cloud default in the same business database. Vectors use `(workspace_uuid, knowledge_base_uuid, vector_id)` identity, an untyped vector column with explicit checked dimension, and release-created partial expression indexes for the enabled dimensions. Cloud never falls back to Chroma or performs vector DDL at runtime.
+- Transaction boundary: A tenant UoW binds `SET LOCAL` and SQL to one transaction. Long-running pipeline and streaming MCP execution carry a trusted transaction-free tenant scope; each database helper opens a short scoped transaction, avoiding a held pool connection during LLM or network waits. Detached tasks start only after commit and create their own short UoW; rollback cancels them.
+- Schema boundary: The first release has exactly one business schema, `public`. Both migrator and runtime sessions must report `current_schema() = 'public'` and `current_schemas(false) = ARRAY['public']`; the runtime role and business database must not carry a `search_path` override. Runtime startup validates this before using the prepared schema and reruns the complete catalog and privilege validation on every process start; it never runs DDL.
+- Session boundary: Both Cloud modes require `session_replication_role = 'origin'`, `row_security = 'on'`, and `lo_compat_privileges = 'off'`. Every persistent setting applicable to the runtime role or current business database in `pg_db_role_setting` is rejected, even if its present value appears safe; tenant context remains transaction-local application state rather than a persistent role/database override.
+- Grant boundary: The migrator grants the runtime role direct `CONNECT` on the dedicated business database and `USAGE` on `public`; exact `SELECT, INSERT, UPDATE, DELETE` on every allowlisted business table; `SELECT` only on `alembic_version`; and exact `USAGE, SELECT` on business-owned sequences. It grants no `CREATE`, `TRUNCATE`, `REFERENCES`, `TRIGGER`, or sequence `UPDATE`, no privilege `WITH GRANT OPTION`, and nothing on other relations or schemas.
+- Role boundary: The runtime identity is a `LOGIN` role with no superuser, `BYPASSRLS`, `CREATEDB`, `CREATEROLE`, or replication attribute; no role membership in any direction, including acting as grantor; no ownership of the business database, `public` schema, relations, sequences, routines, or extensions; no column ACLs; and no use, create, or ownership in another non-system schema. Neither the runtime role nor `PUBLIC` may have an explicit routine or parameter ACL, and the runtime role may not effectively execute any `SECURITY DEFINER` routine, including an extension-owned one. PostgreSQL's default `TEMP` privilege inherited from `PUBLIC` is an explicit first-release compatibility decision for this dedicated business database, not a direct runtime-role grant.
+- Catalog boundary: The business database must contain `vector` and may contain no extension other than `plpgsql` and `vector`; the runtime role owns neither. It contains no foreign data wrapper, foreign server, or user mapping. These checks remove catalog-level escape paths without forbidding the ordinary implicit execution of non-`SECURITY DEFINER` built-in routines.
+- Migration boundary: In the first release, the migrator and runtime URLs must name the same normalized PostgreSQL host, port, and database while using different roles. The migrator owns the application schema, establishes the exact allowlist above, and validates both required access and every prohibited escalation path before releasing the advisory lock. An exact Alembic head, RLS checks, and pgvector table/index/constraint validation remain mandatory; concurrent Jobs fail explicitly and are retried by orchestration.
+- Deployment boundary: PostgreSQL roles are cluster-wide, while the in-database audit proves only the contract of the target business database. SaaS production must therefore use a dedicated PostgreSQL cluster or endpoint that exposes only this business database to the runtime credential, or enforce and test an HBA/proxy policy proving that the credential cannot connect to any other database. Providing this external connectivity proof remains an open SaaS activation gate.
+- Endpoint evolution: A future deployment may use a direct endpoint for migrations and a pooler endpoint for runtime traffic. That topology may relax literal host/port equality only after both endpoints are proven to reach the same database through a database-internal, migrator-owned cluster identity that the runtime role can read but cannot create, alter, or spoof.
+- Legacy pgvector boundary: Revision 0013 records the exact `ENABLE` and `FORCE ROW LEVEL SECURITY` state of each RLS-protected source table, temporarily suspends those source policies as their table owner inside the migration transaction, and restores every table to its recorded state in `finally`. The migrator does not require superuser or `BYPASSRLS` for this data move.
+- Activation gate: The shared schema, pgvector adapter, and database-local runtime audit are implemented, but the external cluster/endpoint or HBA/proxy connectivity proof remains deployment work. Ordinary business writes also do not yet hold a generation-aware fence through commit; a generation-stamped outbox (or equivalent atomic publish fence) and stable durable-object references across generation cutover remain required before SaaS activation.
+- Reason: Sharing one database and pool keeps marginal Workspace cost low, while transaction-local context and RLS prevent that shared storage from becoming shared authority.
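The first-release migration boundary compares the migrator and runtime URLs. The sketch below shows one way to do that with `urllib.parse`. The normalization rules are assumptions: the host is lowercased, the port defaults to 5432, a driver suffix such as `+asyncpg` is ignored, and query-string `host`/`port`/`dbname` overrides are refused. `check_first_release_urls` is a hypothetical name.

```python
from urllib.parse import parse_qs, unquote, urlsplit

DEFAULT_PG_PORT = 5432
# Assumption: libpq-style query overrides could silently retarget a URL, so refuse them.
FORBIDDEN_QUERY_KEYS = {"host", "hostaddr", "port", "dbname"}


def _endpoint(url: str) -> tuple[str, tuple[str, int, str]]:
    """Return (role, (host, port, database)) for a SQLAlchemy-style PostgreSQL URL."""
    parts = urlsplit(url)
    if parts.scheme.split("+", 1)[0] not in ("postgresql", "postgres"):
        raise ValueError(f"not a PostgreSQL URL: {parts.scheme}")
    if FORBIDDEN_QUERY_KEYS & set(parse_qs(parts.query)):
        raise ValueError("query-string endpoint overrides are not allowed")
    role = unquote(parts.username or "")
    host = (parts.hostname or "").rstrip(".")  # urlsplit already lowercases hostname
    database = unquote(parts.path.lstrip("/"))
    if not (role and host and database):
        raise ValueError("URL must name a role, host, and database")
    return role, (host, parts.port or DEFAULT_PG_PORT, database)


def check_first_release_urls(migrator_url: str, runtime_url: str) -> None:
    """Same normalized host, port, and database; different roles."""
    migrator_role, migrator_target = _endpoint(migrator_url)
    runtime_role, runtime_target = _endpoint(runtime_url)
    if migrator_target != runtime_target:
        raise ValueError("migrator and runtime must target the same host, port, and database")
    if migrator_role == runtime_role:
        raise ValueError("migrator and runtime must use different roles")
```

A future pooler topology would replace this literal comparison with the database-internal cluster identity check described under endpoint evolution.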
+
+### stdio MCP has an independent deployment gate
+
+- Decision: `mcp.stdio.enabled` is independent of Box availability and entitlement. OSS defaults it on for compatibility; Cloud requires it off at bootstrap and enforces the same gate on create, update, test, startup loading, and final runtime execution.
+- Reason: Treating Box availability as stdio permission would silently create another persistent `mcp-shared` sandbox for each Workspace and bypass the one-sandbox subscription and cost boundary.
+
+## 2026-07-20
+
+### The tenant UoW owns its task, root transaction, bind, and scope
+
+- Decision: A tenant UoW creates one task-owned `TenantScopedAsyncSession` and one root transaction. Public commit, rollback, close, connection, bind, nested-transaction, synchronous-Session, live-streaming, raw SQL, public execution options, and public `set_config` paths fail closed and mark the transaction rollback-only. ORM objects cannot expose a usable synchronous Session, captured methods cannot run in child tasks, an explicit foreign bind is rejected, and a captured Session is permanently retired when its UoW exits rather than being reset for reuse. Tenant scope is installed only through a private UoW capability; pgvector index-plan `SET LOCAL`/`EXPLAIN` diagnostics use a test/operator connection rather than the business Session API.
+- SQL boundary: Public UoW calls accept only structured SQLAlchemy query and DML trees. `TextClause`, literal SQL columns, textual labels, prefixes/suffixes/hints, statement execution options, `VALUES` roots, `INSERT FROM SELECT`, `EXTRACT`, literal-execute parameters, unknown/custom AST nodes, forced-unquoted identifiers, named `ON CONFLICT` constraints, unknown dialect post-values clauses, and untrusted casts/types fail closed. PostgreSQL/SQLite `ON CONFLICT DO UPDATE` and batch-insert containers are traversed explicitly because SQLAlchemy's standard visitor omits their executable values. Function classes are exactly allowlisted as `count`, `coalesce`, `sum`, `now`, `length`, and `nullif`; the only custom operator/cast admitted is the validated pgvector cosine operator and `Vector` cast.
+- Legacy migration boundary: The local-only RAG backup restore uses explicit table and column objects, never raw SQL, but deliberately leaves legacy values untyped. This preserves SQLite's string-valued `DATETIME` rows and both the historical PostgreSQL `TEXT` and fresh-schema `JSON` settings columns while keeping every value bound rather than interpolated.
+- ORM boundary: SQLAlchemy `SessionEvents` are unsupported on a tenant-scoped Session. If a listener is registered before or during a UoW, the operation fails before the callback executes and cleanup proceeds against an empty dispatch surface. Public `get`, `get_one`, `refresh`, and `merge` reject caller-supplied loader, bind, lock, shard, and execution options. Flush, implicit autoflush, and commit reject a SQL expression assigned to a mapped attribute before it can reach the compiler. Tenant code uses the async Session directly; relationships use eager loading or explicit `await session.refresh(entity, [attribute])`. LangBot's persistence base does not expose `AsyncAttrs.awaitable_attrs` as a supported tenant API.
+- Compiler trust boundary: This guard prevents accidental scope/transaction escape by trusted LangBot Core code; it is not an in-process Python sandbox. Registered SQLAlchemy compilers and mapped schema metadata are trusted boot-time code. The fail-closed traversal of dialect containers necessarily covers SQLAlchemy private fields, so dependency upgrades require the regression suite and remain pinned until verified. Untrusted plugins cannot import or call this Session because they remain isolated in Plugin Runtime child processes.
+- Result boundary: A caught database or boundary failure rolls back the root transaction and cancels after-commit work. Buffered results contain only already-authorized rows and no live connection; live database results cannot escape the UoW operation.
+- Reason: `SET LOCAL` plus RLS protects a tenant only while every statement stays on the same owned connection and callers cannot end the transaction, replace the GUC, recover the synchronous proxy, or route a statement through another bind.
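The task-ownership and retirement rules in this decision can be illustrated with a toy object. The sketch is hypothetical: `TaskOwnedUnitOfWork` and `RollbackOnlyError` are not LangBot names, and no database is involved. It only shows the mechanics: the owning `asyncio` task is captured on entry, a call from any other task marks the transaction rollback-only, public commit is refused, and the object stays retired after exit.

```python
import asyncio


class RollbackOnlyError(RuntimeError):
    """Raised when a call would escape the unit of work's task or transaction."""


class TaskOwnedUnitOfWork:
    def __init__(self) -> None:
        self._owner = None
        self._retired = False
        self.rollback_only = False
        self.committed = False

    async def __aenter__(self) -> "TaskOwnedUnitOfWork":
        self._owner = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Only the UoW ends the root transaction: commit on a clean exit, else roll back.
        self.committed = exc_type is None and not self.rollback_only
        self._retired = True  # a captured reference is never reset for reuse
        return False

    def _guard(self, operation: str) -> None:
        if self._retired:
            raise RollbackOnlyError(f"{operation}: unit of work has exited")
        if asyncio.current_task() is not self._owner:
            self.rollback_only = True
            raise RollbackOnlyError(f"{operation}: called from a non-owner task")
        if self.rollback_only:
            raise RollbackOnlyError(f"{operation}: transaction is rollback-only")

    async def execute(self, statement: str) -> str:
        self._guard("execute")
        return f"executed {statement}"  # stands in for a structured SQLAlchemy statement

    async def commit(self) -> None:
        self._guard("commit")
        self.rollback_only = True  # public commit paths fail closed
        raise RollbackOnlyError("commit: owned by the unit of work")
```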
+
+### Parallel request work re-enters tenant scope explicitly
+
+- Decision: Child coroutines created by request-level `asyncio.gather` open their own explicit transaction-free tenant scope before calling persistence-backed Plugin, MCP, or Skill operations. They never inherit the parent's active database Session.
+- Reason: Python copies ContextVars into child tasks, but SQLAlchemy Sessions are not task-safe and task identity is part of the tenant UoW boundary.
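The reason for explicit re-entry is easy to demonstrate with the standard library: `asyncio.gather` wraps each coroutine in a task that runs in a copy of the caller's context, so a ContextVar holding the parent's Session is visible in every child. The sketch below is hypothetical (`active_session`, `tenant_scope`, and `workspace_scope` are illustrative names, and the Session is a plain string); it shows a child that opens its own scope and hides the inherited Session before doing persistence work.

```python
import asyncio
import contextvars
from contextlib import asynccontextmanager
from typing import Optional

# Hypothetical ContextVars; LangBot's real scope objects are not shown here.
active_session: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "active_session", default=None
)
tenant_scope: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "tenant_scope", default=None
)


@asynccontextmanager
async def workspace_scope(workspace_uuid: str):
    """Enter a transaction-free tenant scope and hide any inherited Session."""
    scope_token = tenant_scope.set(workspace_uuid)
    session_token = active_session.set(None)
    try:
        yield
    finally:
        active_session.reset(session_token)
        tenant_scope.reset(scope_token)


async def peek_inherited_session() -> Optional[str]:
    return active_session.get()  # what a child sees without re-entering scope


async def load_plugin_tools(workspace_uuid: str) -> tuple:
    async with workspace_scope(workspace_uuid):
        # A persistence helper would open its own short scoped transaction here.
        return tenant_scope.get(), active_session.get()


async def handle_request(workspace_uuid: str) -> tuple:
    active_session.set("parent-session")  # stands in for the request's active Session
    results = await asyncio.gather(peek_inherited_session(), load_plugin_tools(workspace_uuid))
    return tuple(results)
```

Running `handle_request("ws-a")` returns `("parent-session", ("ws-a", None))`: the first child silently inherited the parent's Session, while the second explicitly dropped it.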
+
+### Ordinary monitoring is readable; audit and export remain privileged
+
+- Decision: Workspace monitoring dashboards, Bot logs, sessions, messages, calls, errors, and feedback require `resource.view`. Monitoring export requires `data.export`; system/runtime audit logs keep `audit.view`. Frontend tabs and controls use the same split.
+- Reason: A Viewer needs useful read-only product observability, while bulk data extraction and privileged runtime/system logs are separate capabilities.
+
+### Invitation failures survive login without contradictory success state
+
+- Decision: Invitation terminal codes map to stable browser states. A login-mediated email mismatch preserves the fragment-captured secret only in session storage, returns to the acceptance page with the stable error code, and suppresses the generic login-success toast. A transient acceptance failure retains the authenticated session and offers the same one-time invitation for retry. Invalid Bearer tokens on acceptance map to the normal authentication error instead of an internal failure.
+- Reason: Invitation acceptance crosses signed-out and authenticated states; losing or masking the domain error makes recovery ambiguous and can present contradictory UI feedback.
+
+### Shared Plugin Runtime starts only from verified desired state
+
+- Decision: SDK shared mode waits for immutable runtime configuration before inspecting plugin state, never scans or launches legacy `data/plugins`, and rejects legacy install/restart/delete/upgrade control actions. Worker RPC files use installation-private directories with aggregate size enforcement, and resident nsjail workers explicitly disable the default 600-second wall-time limit.
+- Remaining gate: Completion-callback recovery, jittered per-installation backoff, globally bounded restart launch admission, Runtime-level circuit breaking, a single half-open probe, and a worker ready timeout are implemented, but production cross-tenant fault injection must still prove these restart-storm controls. A hard installation disk quota and a production egress policy also remain Cloud activation requirements. Linux nsjail/cgroup CPU, memory-plus-swap, PID, namespace, and cgroup-reaping behavior already has real-container evidence.
+- Reason: A shared supervisor reduces per-Workspace services only if legacy global paths, writable transfer state, and lifecycle defaults cannot bypass installation isolation.
+
+## 2026-07-24
+
+### Cloud v2 remains a modular monolith with one closed adapter
+
+- Decision: Workspace directory, plans, subscriptions, payment fulfillment, entitlements, usage, and signed runtime feeds are implemented as modules in the existing Space service and PostgreSQL database. The only separately packaged closed component is the thin Core bootstrap/control-plane adapter; it verifies signed data and owns no business state.
+- Compatibility: The legacy Pod card and fulfillment path remain visible only to accounts with old subscriptions. New purchases create Workspace subscriptions and never provision a per-user LangBot Pod.
+- Reason: A separate tenancy or billing service would add deployment, queue, network, and consistency cost without providing an isolation boundary. The signed adapter keeps SaaS logic closed while Core retains the ORM, RLS, authorization, and runtime enforcement boundary.
+
+### Registration creates data, not infrastructure
+
+- Decision: Account registration and personal Workspace creation share one transaction and include an active owner membership, Free subscription, entitlement snapshot, and outbox notifications. Repair is idempotent for older accounts. Empty Workspace creation starts no Plugin worker, Box sandbox, database, queue, bucket, or tenant service.
+- Activation: This automatic Cloud v2 transaction is enabled only when Space has an explicit valid `CLOUD_V2_INSTANCE_UUID`. A legacy marketplace-only Space deployment with no Cloud v2 instance configured keeps its existing Account registration path; Cloud v2 internal endpoints still fail closed instead of inventing an instance identity.
+- Billing boundary: Core and runtimes consume only generic capability and numeric-limit entitlements. They never branch on `free` or `pro`; plan names, prices, payment providers, and fulfillment stay in Space.
+- Reason: This keeps the marginal cost of a new user close to a few PostgreSQL rows while preserving an immediate, usable Workspace.
+
+### Directory bootstrap is full; steady-state projection is per Workspace
+
+- Decision: Space generates the initial signed directory snapshot and its outbox high-water mark in one read-only PostgreSQL `REPEATABLE READ` transaction. After bootstrap, each event page names affected Workspaces and Core fetches one signed `directory.delta` for that set. Missing requested Workspaces are tombstones; unrelated Workspaces are untouched.
+- Cursor boundary: A delta carries no event cursor. Each signed event page carries the transaction-consistent current high-water mark, while Core advances only to the final event it actually consumed, even when the authoritative delta already contains a later revision. Event payload Workspace and revision fields must exactly match the signed event envelope; readiness is renewed only after the replica-local cursor reaches the signed high-water and the shared projection is not ahead.
+- Replica boundary: Every Core replica owns a process-local consumer cursor because its verified entitlement cache is process-local. Replicas share the PostgreSQL projection high-water mark and inbox. The state separately records which cursor range was atomically subsumed by a full snapshot, so a lagging replica can add missing receipts within that coverage and refresh its local caches without repeating tenant mutations; a missing receipt beyond snapshot coverage fails closed.
+- Reason: A shared consumer cursor would let one replica starve another replica's local cache, while a full snapshot on every change would make steady-state cost grow with every registered Workspace.
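The cursor rules can be made concrete with a small state machine. This is a sketch under stated assumptions: signatures have already been verified, cursors are integers, the delta fetch is elided, and "the shared projection is not ahead" is read as "not ahead of the signed high-water mark". `SignedEvent`, `EventPage`, and `ReplicaConsumer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedEvent:
    cursor: int
    workspace_uuid: str  # from the signed envelope
    revision: int
    payload_workspace_uuid: str  # from the event payload
    payload_revision: int


@dataclass(frozen=True)
class EventPage:
    events: tuple
    high_water: int  # transaction-consistent current high-water mark


class ReplicaConsumer:
    """Process-local cursor for one Core replica."""

    def __init__(self, cursor: int = 0) -> None:
        self.cursor = cursor
        self.ready = False

    def consume(self, page: EventPage, projection_high_water: int) -> set:
        consumed = self.cursor
        affected = set()
        for event in page.events:
            if not consumed < event.cursor <= page.high_water:
                raise ValueError("event cursor out of order or beyond the signed high-water mark")
            envelope = (event.workspace_uuid, event.revision)
            if (event.payload_workspace_uuid, event.payload_revision) != envelope:
                raise ValueError("event payload does not match its signed envelope")
            affected.add(event.workspace_uuid)
            consumed = event.cursor
        # Core would fetch one signed directory.delta for `affected` here (elided).
        # Advance only to the last event actually consumed, never to page.high_water.
        self.cursor = consumed
        self.ready = self.cursor >= page.high_water and projection_high_water <= page.high_water
        return affected
```

Because the cursor is committed only after the whole page validates, a forged or mismatched event leaves the replica exactly where it was.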
+
+### Cloud OAuth can authenticate only an existing projected Account
+
+- Decision: In multi-Workspace Cloud mode, Space OAuth may refresh tokens only for an active `cloud_projection` Account whose Space subject UUID and normalized email exactly match. It cannot create an Account, relink by email, mutate directory identity, or choose a Workspace.
+- Redirect boundary: When Cloud v2 configures its Core public URL, Space issues authorization codes only to that exact callback origin or the explicitly retained legacy managed-Pod domain. Community installations remain dynamic OAuth clients because they cannot pre-register with the public Space service: the consent screen displays their hostname, and only HTTPS or loopback HTTP with the fixed callback path is accepted. Remote HTTP, userinfo, fragments, arbitrary callback paths, and unrecognized query parameters always fail closed.
+- Reason: Space is the SaaS identity and directory authority, but Core remains the authentication enforcement point. Projected-only matching avoids split-brain identities; redirect allowlisting prevents bearer authorization-code exfiltration.
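For community installations, the redirect rules above are mechanical enough to sketch with `urllib.parse` and `ipaddress`. The callback path constant and the empty query-key allowlist are placeholders, because this document does not state the fixed path; `validate_community_redirect` is a hypothetical name. The Cloud v2 branch (an exact match against the configured Core callback or the retained legacy Pod domain) would be a plain comparison and is not shown.

```python
import ipaddress
from urllib.parse import parse_qsl, urlsplit

CALLBACK_PATH = "/oauth/callback"  # hypothetical; the real fixed path is not stated here
ALLOWED_QUERY_KEYS: frozenset = frozenset()  # assumption: no query parameters are recognized


def _is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def validate_community_redirect(uri: str) -> str:
    """Return the hostname to show on the consent screen, or raise ValueError."""
    parts = urlsplit(uri)
    host = parts.hostname
    if not host:
        raise ValueError("redirect URI has no host")
    if "@" in parts.netloc:
        raise ValueError("userinfo is not allowed")
    if "#" in uri:
        raise ValueError("fragments are not allowed")
    _ = parts.port  # raises ValueError for a malformed port
    if parts.scheme not in ("https", "http"):
        raise ValueError(f"unsupported scheme: {parts.scheme}")
    if parts.scheme == "http" and not _is_loopback(host):
        raise ValueError("plain HTTP is allowed only for loopback hosts")
    if parts.path != CALLBACK_PATH:
        raise ValueError("unexpected callback path")
    for key, _value in parse_qsl(parts.query, keep_blank_values=True):
        if key not in ALLOWED_QUERY_KEYS:
            raise ValueError(f"unrecognized query parameter: {key}")
    return host
```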
+
+## 2026-07-29
+
+### Directory capacity is one instance-level admission contract
+
+- Decision: Space and Core share an explicitly configured operational ceiling for active
+  Workspace and snapshot membership cardinality. Space serializes new personal-Workspace
+  creation across replicas with a PostgreSQL transaction advisory lock and rejects the
+  registration transaction before the limit is crossed. Core independently counts the
+  projected active Workspace set while holding the per-instance directory projection row
+  lock. Exceeding any limit rolls back the complete projection and does not advance its
+  cursor; neither side truncates authoritative data.
+- Snapshot boundary: A full snapshot is current desired state and contains active
+  Workspaces only. Archived Workspace revisions are retained as bounded, targeted deltas
+  so Core can validate and apply monotonic tombstones without every bootstrap carrying
+  unbounded history.
+- Memory boundary: Space bounds database result cardinality before signing. The closed
+  adapter bounds decompressed response bytes before JSON/JWS parsing and validates
+  Workspace/membership cardinality before entitlement-cache fan-out. Core schema models
+  have absolute list ceilings, validate duplicates in one pass, and bulk-read Accounts in
+  bounded chunks rather than issuing two serial queries per Account. Manifest and
+  entitlement responses have smaller endpoint-specific byte ceilings; entitlement refresh
+  validates and releases batches of at most 16 raw responses instead of retaining the
+  entire directory fan-out.
+- Operations boundary: Core health exposes aggregate active/max directory cardinality and
+  PostgreSQL pool occupancy/timeouts. The default active Workspace ceiling is 1,000, with
+  a hard-coded ceiling of 5,000; this is a safety stop, not a production capacity claim.
+  Space and Core configuration must match, and the approved production value comes from
+  the real V-08 capacity curve plus the V-09 24-hour soak.
+- Reason: One LangBot instance should admit new tenants at the cheapest data-only boundary,
+  while still preventing legitimate registration growth or a malformed control-plane
+  response from causing an unbounded startup allocation, database connection storm, or
+  CPU spike.
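The memory boundary depends on checking sizes before allocation. A minimal sketch with the standard `zlib` module: decompression stops one byte past the ceiling, so an oversized gzip response is rejected before JSON parsing, and entitlement refresh walks the directory in batches of 16. The ceiling value and the helper names are hypothetical; only the batch size of 16 comes from the text above, and JWS verification is omitted.

```python
import json
import zlib
from typing import Iterator, Sequence

MAX_DECOMPRESSED_BYTES = 8 * 1024 * 1024  # hypothetical ceiling, not LangBot's value
ENTITLEMENT_BATCH_SIZE = 16  # "batches of at most 16 raw responses"


class ResponseTooLarge(ValueError):
    """The decompressed body would exceed the configured ceiling."""


def bounded_gunzip(body: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # expect gzip framing
    # Ask for at most limit + 1 bytes: receiving that many proves the body is too large
    # without ever materializing the rest of a decompression bomb.
    output = decompressor.decompress(body, limit + 1)
    if len(output) > limit:
        raise ResponseTooLarge(f"decompressed body exceeds {limit} bytes")
    if not decompressor.eof or decompressor.unused_data:
        raise ValueError("truncated gzip stream or trailing data")
    return output


def parse_bounded_json(body: bytes) -> object:
    return json.loads(bounded_gunzip(body))  # JWS verification would follow (omitted)


def entitlement_batches(workspace_uuids: Sequence[str]) -> Iterator[Sequence[str]]:
    """Yield slices so that only one batch of raw responses is held at a time."""
    for start in range(0, len(workspace_uuids), ENTITLEMENT_BATCH_SIZE):
        yield workspace_uuids[start:start + ENTITLEMENT_BATCH_SIZE]
```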
@@ -0,0 +1,525 @@
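+# Cloud v2 Multi-Tenant Architecture Decisions and Open Items
+
+Status: `DECIDED — Core isolation kernel implemented; SaaS activation gates remain`
+Created: 2026-07-19
+Last updated: 2026-07-24
+
+This document records the confirmed first-release decisions for the Cloud v2 multi-tenant architecture,
+the explicitly rejected options, and the extensions still to be decided in later phases.
+It also records implementation status. "Implemented" refers only to the isolation kernel and fail-closed gates
+in the open-source Core/SDK; it does not mean that the closed-source Control Plane, billing, or the Cloud v2
+deployment is ready to launch. Final implementation choices are mirrored in
+[implementation-decisions.md](./implementation-decisions.md); the remaining release gates are tracked in the
+implementation checklist and the verification report.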
+
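+## 0. Confirmed SaaS topology assumptions
+
+1. SaaS has exactly one logical LangBot instance; every Workspace is a tenant inside that instance.
+2. The product and domain model introduce no multiple CloudInstances inside a Cell, no Workspace Placement, and no routing from a Workspace to a CloudInstance.
+3. Distribution is not implemented yet, but the same logical instance may later run multiple Core, Plugin Runtime, and Box Runtime replicas,
+   and may add PostgreSQL shards; these are internal implementation details and never become new tenant or product entities.
+4. All replicas share a stable `instance_uuid`; `replica_id`, `worker_id`, and process addresses are short-lived runtime identities and must not appear in the permanent primary key of any business resource.
+5. `workspace_uuid` is always the tenant key for data, tasks, and runtimes, and is also the candidate key for future internal routing and sharding.
+6. The generation/epoch semantics cover execution ownership, failover, and task revocation; they do not represent Placement of a Workspace across multiple CloudInstances.
+7. Registering an Account automatically creates a Workspace, but only adds directory records and business rows; it creates no tenant-dedicated deployment, database, queue, or Runtime.
+8. OSS remains a single-tenant LangBot instance but allows multiple users within its Workspace; only SaaS enables multi-Workspace tenant mode.
+
+"A single LangBot instance" means a single logical service and security domain, not permanently one OS process or one Kubernetes replica.
+The current code field `placement_generation` remains for compatibility until the architecture migration is complete; its target semantics and candidate name are `execution_generation`.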
+
+### 0.1 First-release decisions confirmed in this round
+
+| ID | Decision | First-release status |
+| --- | --- | --- |
+| D-001 | One shared Plugin Runtime control plane; each running plugin installation exclusively owns one nsjail child process; only verified code artifacts with identical digests may be shared read-only | `IMPLEMENTED — egress/disk-quota pending; restart-storm fault injection pending` |
+| D-002 | One shared Box Runtime; Cloud always uses nsjail; each plan-eligible Workspace has at most one persistent `global` logical sandbox, while ordinary executions start nsjail processes on demand | `IMPLEMENTED FAIL-CLOSED — hard filesystem quota provider pending` |
+| D-003 | SaaS business data uses a PostgreSQL shared schema with two isolation layers, application-level scoping and RLS; pgvector runs in the same PostgreSQL as the SaaS default vector backend | `PARTIALLY IMPLEMENTED — transaction/outbox/deployment gates remain` |
+| D-004 | stdio MCP is decoupled from Box availability; Cloud v2 forces stdio MCP off in the first release to avoid an extra `mcp-shared` persistent sandbox per Workspace | `IMPLEMENTED` |
+| D-005 | Directory bootstrap uses a transaction-consistent full snapshot; at runtime, deltas are fetched for the Workspaces named in events; each Core replica consumes events independently while sharing the PostgreSQL projection and inbox | `IMPLEMENTED — production fault injection pending` |
+
+The concrete mechanisms for Workspace creation, release, data export, and single-Workspace restore are not decided in this round; this document only guarantees that these later capabilities will not change the stable
+`workspace_uuid` or require rebuilding a tenant-dedicated deployment.
+
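+## 1. Top-level goal of this refactor
+
+> Share the trusted control plane and infrastructure pools, and isolate untrusted execution units; reduce independently deployed, scaled, and operated components so that the static cost of adding an Account or Workspace approaches zero.
+
+Here "reducing components" means fewer independent Deployments, Services, databases, messaging systems, and tenant-dedicated resident control planes;
+it does not mean removing necessary isolation processes by merging security boundaries.
+
+Evaluation principles:
+
+1. Registering an Account and automatically creating an empty Workspace starts no Plugin worker or Box sandbox.
+2. Once plugins are enabled, each installation's resident cost comes from its own security boundary; a plan-eligible Workspace bears the cost of one persistent logical session only after it first uses the managed sandbox.
+3. The trusted supervisor, artifact cache, database connection pool, and Runtime capacity may be shared across tenants.
+4. One untrusted plugin process must not serve multiple installations; one sandbox/session must not serve multiple Workspaces.
+5. Shared Runtime is the default; dedicated exists only as a future resource tier for high isolation, large customers, or compliance, and introduces no second external protocol.
+6. Without clear capacity evidence, add no Kafka, Redis, Runtime-specific database, Box-specific database, or tenant-level scheduling service.
+7. Multi-tenant isolation must cover identity, routing, storage, caching, logging, quotas, revocation, and failure recovery; adding `workspace_uuid` to requests is not enough.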
+
+## 2. First-release deployment shape and future evolution
+
+```mermaid
+flowchart LR
+    Traffic["SaaS traffic"] --> Core["One logical LangBot instance<br/>1 Core replica in MVP"]
+    Core --> PluginRuntime["Shared Plugin Runtime<br/>trusted supervisor"]
+    Core --> BoxRuntime["Shared Box Runtime<br/>nsjail backend"]
+    PluginRuntime --> PA["Workspace A / installation 1<br/>isolated nsjail process"]
+    PluginRuntime --> PB["Workspace B / installation 2<br/>isolated nsjail process"]
+    BoxRuntime --> BA["Workspace A<br/>one persistent global logical session"]
+    BoxRuntime --> BB["Workspace B<br/>one persistent global logical session"]
+    Core --> PG["Shared PostgreSQL business schema<br/>RLS + pgvector"]
+```
+
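+| Tier | Internal deployment shape | Static cost of a new Workspace | Activation condition |
+| --- | --- | --- | --- |
+| M0. Single-replica MVP | One Core, one shared Plugin Runtime, one shared Box Runtime, and one PostgreSQL business database; plugins run according to their enabled state, and a managed sandbox is created on first use when the entitlement allows it | Only new Workspace business rows | Current confirmed target |
+| M1. Horizontal scaling within the same logical instance | Core, Plugin Runtime, and Box Runtime add replicas as capacity requires; runtime ownership is decided by internal leases and generation fences; PostgreSQL may add shared shards | No Workspace-dedicated deployment | After capacity or availability evidence appears |
+| M2. Dedicated resource tier | Specific workloads use a dedicated worker pool, sandbox class, or PostgreSQL shard, while keeping the same identities, protocols, schema, and control plane | Borne only by customers who purchase dedicated resources | Compliance, data residency, or very large workloads |
+
+M1 is a transparent scale-out of M0, and M2 is a resource tier within the same architecture; neither is a new LangBot instance, Cell, or CloudInstance.
+External APIs know only the stable `instance_uuid` and `workspace_uuid`; they know nothing about replicas, workers, pools, or shards.
+
+In M0, Plugin Runtime and Core run in separate containers with separate security contexts, so Core does not inherit the
+nsjail/cgroup privileges that Plugin Runtime needs. The current Runtime binds the first authenticated `runtime_id` for the lifetime of its process, so M0 must place Core and
+Plugin Runtime in the same rollout/restart unit and restart them together; until authenticated takeover or owner lease/fencing is implemented,
+Core must not be rolled out on its own and take over a Runtime that is still running. Box Runtime likewise uses its own process identity and security configuration,
+and is never merged with Plugin Runtime into one high-privilege process.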
+
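+## 3. D-001: Plugin Runtime multi-tenant control plane
+
+Status: `IMPLEMENTED — Cloud egress/disk-quota pending; restart-storm fault injection pending`
+
+### 3.1 Implemented foundation
+
+- The Plugin Runtime control connection binds only the stable instance identity; one logically shared control plane manages multiple Workspaces through the complete installation binding, and one Supervisor replica serves it in M0.
+- Each running installation uses its own nsjail worker; enabled-resident is the desired semantics. Code is read-only, home/tmp/data are private, and the shared profile does not read the artifact's `.env`.
+- The instance-level `PluginWorkerPolicy` is issued by Core from `data/config.yaml` and supports the native environment-variable overrides; a manifest cannot override it.
+- `installation_uuid`, `artifact_digest`, and `runtime_revision` are persisted and flow into desired state, registration, the Host API, and the generation/revision fence.
+- A verified `.lbpkg` is first stored in Workspace-scoped durable binary storage; if the Runtime's local cache is lost, Core can replay it.
+- Code with the same digest and the read-only dependency environment prepared by the Runtime may be shared, but workers, runtime writes, configuration, and data are never merged;
+  dependency preparation finishes before the worker starts, and a failure enters an explicit installation-failed state.
+- The Cloud shared profile enforces Linux nsjail; with `plugin.worker.require_hard_limits=true`, startup fails if cgroup v2 delegation is unavailable.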
+
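+### 3.2 Confirmed invariants
+
+1. Each running plugin installation exclusively owns one worker process tree and never shares it with another installation;
+   a disabled or deleted installation may have no process.
+2. A plugin process is bound to exactly one
+   `(instance_uuid, workspace_uuid, execution_generation, installation_uuid, runtime_revision, artifact_digest)` tuple and cannot be rebound while it runs.
+3. A plugin cannot select a Workspace through payloads, Host API parameters, environment variables, or reconnection.
+4. A plugin process's home, tmp, writable data, secrets, process view, and quotas must be isolated per installation.
+5. Only code files and dependency environments with the same `artifact_digest` and verified integrity may be shared read-only;
+   artifacts with the same name and version but a different digest must not be shared. Configuration, persistent data, and running processes are never shared.
+6. After a generation, installation revision, or capability is revoked, the old process must lose Host API access and side-effect privileges.
+7. The Supervisor never loads third-party plugin code into its own interpreter.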
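Invariants 2 and 3, together with the one-time registration capability from the execution model in section 3.3, can be sketched in a few lines of Python. This is a hypothetical illustration rather than the Supervisor's code: `InstallationBinding`, `RegistrationDesk`, and `scoped_host_request` are invented names, and a real capability would also be tied to the specific nsjail child it was issued for.

```python
import dataclasses
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class InstallationBinding:
    """The complete tuple from invariant 2; frozen, so it cannot be rebound."""
    instance_uuid: str
    workspace_uuid: str
    execution_generation: int
    installation_uuid: str
    runtime_revision: int
    artifact_digest: str


SCOPE_FIELDS = frozenset(field.name for field in dataclasses.fields(InstallationBinding))


class RegistrationDesk:
    """Hypothetical Supervisor-side registry of one-time registration capabilities."""

    def __init__(self) -> None:
        self._pending: dict = {}

    def issue(self, binding: InstallationBinding) -> str:
        token = secrets.token_urlsafe(32)  # unguessable; handed only to the new child
        self._pending[token] = binding
        return token

    def register(self, token: str) -> InstallationBinding:
        binding = self._pending.pop(token, None)  # pop: a capability works exactly once
        if binding is None:
            raise PermissionError("unknown or already used registration capability")
        return binding


def scoped_host_request(binding: InstallationBinding, payload: dict) -> dict:
    """Drop plugin-supplied scope fields and inject the trusted binding instead."""
    request = {key: value for key, value in payload.items() if key not in SCOPE_FIELDS}
    request.update(dataclasses.asdict(binding))
    return request
```

Deriving `SCOPE_FIELDS` from the dataclass keeps the stripped keys and the injected keys identical, so a new binding field cannot be spoofed through the payload.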
+
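+### 3.3 First-release execution model
+
+- The whole SaaS instance shares one trusted logical Plugin Runtime control plane, run as one Supervisor replica in M0; a new Workspace gets no dedicated Runtime, connection, volume, or process.
+- The Supervisor's control connection binds only the stable `instance_uuid` and a short-lived Runtime identity, never a particular Workspace.
+  Every installation desired-state command carries and validates the complete installation binding; each worker action context is permanently bound to that tuple after registration.
+- Once a plugin is installed and enabled, the Supervisor starts one nsjail child process directly inside its own Runtime container;
+  it no longer creates a nested container, Pod, sidecar, or tenant-level Runtime service per plugin.
+- The desired semantics keep enabled installations resident with no idle eviction; the worker is stopped on disable, deletion, a revision/generation change, or entitlement revocation, and rebuilt as needed.
+  The Supervisor recovers unexpectedly exited workers through a completion callback with bounded, jittered exponential backoff. All restart launches share instance-level concurrency slots;
+  once failures within the configured window reach the threshold, a Runtime-level circuit breaker opens, and after the cooldown only one half-open probe is allowed.
+  The probe must finish initialization and stay stable for a full window before other installations resume; a child that is not ready within 30 seconds is cancelled and reaped.
+  The production candidate environment must still run systematic cross-tenant fault injection to prove that circuit breaking, recovery, and alerting behave as expected.
+- A child registers with the Supervisor using a one-time registration capability; the capability is derived from trusted desired state and bound to the complete installation tuple.
+  The plugin does not open a Core Host connection directly, and the capability is never bound only to author/name/path. The Supervisor/Core injects tenant context from it and
+  discards any scope fields the plugin supplies in its payload.
+- The Supervisor's process table, nsjail root/tmp, and artifact cache are rebuildable runtime state; the installation desired state in PostgreSQL is the authoritative business state.
+- M0 adds no Runtime-specific database, Redis, Kafka, scheduler, or artifact service; after reconnecting, Core replays desired state to the Supervisor.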
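The restart controls above combine two standard techniques: jittered exponential backoff and a circuit breaker with a single half-open probe. The sketch below assumes "full jitter" (a uniform draw below the exponential ceiling) and an injectable clock for testing; `restart_delay` and `RestartCircuit` are hypothetical names. The 30-second ready timeout and the shared restart concurrency slots (for example an `asyncio.Semaphore`) are left out, and the window handling is one possible reading of the text.

```python
import random
from collections import deque
from typing import Callable, Deque, Optional


def restart_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a uniform draw below min(cap, base * 2**attempt)."""
    return min(cap, base * (2 ** attempt)) * rng()


class RestartCircuit:
    """Runtime-level breaker shared by every installation's restart path."""

    def __init__(self, threshold: int, window: float, open_seconds: float,
                 stable_seconds: float, clock: Callable[[], float]) -> None:
        self.threshold = threshold
        self.window = window
        self.open_seconds = open_seconds
        self.stable_seconds = stable_seconds
        self.clock = clock
        self.state = "closed"
        self._failures: Deque[float] = deque()
        self._opened_at = 0.0
        self._probe_ready_at: Optional[float] = None

    def _open(self, now: float) -> None:
        self.state = "open"
        self._opened_at = now
        self._probe_ready_at = None
        self._failures.clear()

    def record_failure(self) -> None:
        """Called from a worker's completion callback when it exits unexpectedly."""
        now = self.clock()
        if self.state == "half_open":
            self._open(now)  # the single probe failed: back to open
            return
        self._failures.append(now)
        while self._failures and now - self._failures[0] > self.window:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self._open(now)

    def record_probe_ready(self) -> None:
        if self.state == "half_open":
            self._probe_ready_at = self.clock()

    def allow_launch(self) -> bool:
        now = self.clock()
        if self.state == "closed":
            return True
        if self.state == "open":
            if now - self._opened_at < self.open_seconds:
                return False
            self.state = "half_open"
            return True  # exactly one half-open probe
        # half_open: everyone else waits until the probe has stayed stable for a full window
        if self._probe_ready_at is not None and now - self._probe_ready_at >= self.stable_seconds:
            self.state = "closed"
            return True
        return False
```

Keeping the breaker at Runtime level rather than per installation is what stops many failing tenants from turning into a synchronized restart storm.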
+
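+### 3.4 nsjail and filesystem boundaries
+
+First-release target directory model:
+
+```text
+data/plugin-runtime/
+├── artifacts/sha256/<artifact_digest>/code/   # shared read-only after digest verification
+├── environments/sha256/<environment_digest>/ # atomically published, shared read-only dependency environment
+└── installations/<installation_uuid>/
+    ├── home/                                  # private, writable
+    ├── tmp/                                   # private, writable, cleanable
+    └── data/                                  # private persistent data
+```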
+
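+- An artifact may be shared only when its content digest and integrity check match; a directory must never be reused based on author/name/version alone.
+  The signatures and provenance the cache accepts, along with revocation and GC, belong to later release rules and do not change this round's read-only sharing boundary based on verified digests.
+- The artifact and the shared dependency environment built per environment digest enter nsjail as read-only mounts; the installation's home/tmp/data use separate writable mounts.
+  The environment digest covers the artifact, requirements, Python ABI, Runtime version, and installer schema. Dependencies may be built only from the PEP 508 declarations of a verified artifact,
+  and index/trusted-host settings come only from instance configuration; the build runs in a temporary path inside a separate nsjail and is published atomically on success, so a failed or concurrent install cannot leave a visible half-built environment.
+- The plugin cwd can be a read-only `/plugin` inside its private mount namespace, so code need not be copied per installation;
+  what must be private is every writable path, such as home/tmp/data.
+- nsjail must enable the required namespaces, including mount, PID, IPC, UTS, and a private `/proc`; a plugin must not enumerate or signal other plugins or Runtime processes,
+  and must not read the Runtime filesystem, host paths, other installation directories, or platform metadata endpoints.
+- Public SaaS forbids automatically loading `.env` from plugin artifacts. Secrets may be injected only per installation by the trusted control plane and must never enter the shared artifact/cache.
+- A plugin that needs external network access uses controlled egress; it must not reach Core loopback, Box Runtime, the database, or other internal services through a shared host network.
+- Cloud deployments must provide working cgroup v2 delegation and the required namespace privileges; if hard CPU/memory/PID limits are unavailable,
+  Plugin Runtime readiness must fail instead of logging a warning and degrading to an ordinary child process.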
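The environment digest and atomic publication described above can be sketched with `hashlib` and `os.rename`. Assumptions: the inputs are hashed as canonical JSON in exactly the order given (no requirement normalization), the digest is lowercase hex so it can be used directly as `<environment_digest>`, and the staging and final directories sit on one filesystem so the rename is atomic. `environment_digest` and `publish_environment` are hypothetical names.

```python
import hashlib
import json
import os
import shutil
from pathlib import Path
from typing import Sequence


def environment_digest(artifact_digest: str, requirements: Sequence[str], python_abi: str,
                       runtime_version: str, installer_schema: int) -> str:
    """Hash every input listed above; changing any of them yields a new environment."""
    material = {
        "artifact_digest": artifact_digest,
        "requirements": list(requirements),  # PEP 508 lines from the verified artifact
        "python_abi": python_abi,  # for example sys.implementation.cache_tag
        "runtime_version": runtime_version,
        "installer_schema": installer_schema,
    }
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def publish_environment(staging: Path, final: Path) -> bool:
    """Expose a fully built environment atomically; return False if another build won."""
    final.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.rename(staging, final)  # atomic on one filesystem; no half-built final directory
    except OSError:
        if final.is_dir():
            shutil.rmtree(staging)  # a concurrent install already published this digest
            return False
        raise
    return True
```

On POSIX, renaming onto an existing non-empty directory fails, which is what lets the losing builder of a concurrent install discard its staging copy instead of overwriting a ready environment.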
+
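+### 3.5 Uniform resource limits and configuration
+
+In the first release, resource specifications are determined entirely by LangBot instance configuration; a manifest cannot declare, relax, or override resources. The values below are suggested defaults;
+the final values are still configured uniformly in the same instance's `data/config.yaml`: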
+
+```yaml
+plugin:
+  worker:
+    max_cpus: 1.0
+    max_memory_mb: 512
+    max_pids: 128
+    max_open_files: 256
+    max_file_size_mb: 512
+    require_hard_limits: true # Cloud; OSS defaults false
+```
+
+The configuration file path is `data/config.yaml`, with the existing native environment-variable overrides:
+
+- `PLUGIN__WORKER__MAX_CPUS`
+- `PLUGIN__WORKER__MAX_MEMORY_MB`
+- `PLUGIN__WORKER__MAX_PIDS`
+- `PLUGIN__WORKER__MAX_OPEN_FILES`
+- `PLUGIN__WORKER__MAX_FILE_SIZE_MB`
+- `PLUGIN__WORKER__MAX_CONCURRENT_RESTARTS`
+- `PLUGIN__WORKER__RESTART_FAILURE_THRESHOLD`
+- `PLUGIN__WORKER__RESTART_FAILURE_WINDOW_SECONDS`
+- `PLUGIN__WORKER__RESTART_CIRCUIT_OPEN_SECONDS`
+- `PLUGIN__WORKER__REQUIRE_HARD_LIMITS`
+
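+At startup, Core validates the configuration and issues an immutable `PluginWorkerPolicy` through the existing `SET_RUNTIME_CONFIG`.
+The Runtime does not read a separate environment-variable configuration, which avoids two inconsistent configuration sources. CPU, memory, and PID use cgroup hard limits;
+open files and file size use rlimits. The Cloud deployment profile always uses nsjail and cannot be downgraded to an ordinary process through a plugin manifest or SaaS environment variables.
+A hard quota on total installation data requires filesystem project quota or a dedicated quota volume; a directory scan must not masquerade as a hard limit.
+That field stays out of the first-release configuration until a storage mechanism that can atomically reject writes is chosen.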
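The configuration flow above (YAML values first, then `PLUGIN__WORKER__*` environment overrides, then one validated immutable policy) might look like this. `PluginWorkerPolicy` is the name this document uses, but its fields, the coercion rules, and `load_worker_policy` are assumptions. YAML parsing is left out (the loader takes an already parsed dict), and the restart settings are omitted for brevity.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PluginWorkerPolicy:
    max_cpus: float = 1.0
    max_memory_mb: int = 512
    max_pids: int = 128
    max_open_files: int = 256
    max_file_size_mb: int = 512
    require_hard_limits: bool = False  # OSS default; the Cloud profile sets true

    def __post_init__(self) -> None:
        for item in dataclasses.fields(self):
            value = getattr(self, item.name)
            if item.name == "require_hard_limits":
                if not isinstance(value, bool):
                    raise ValueError("require_hard_limits must be a boolean")
            elif isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
                raise ValueError(f"{item.name} must be a positive number")


def _coerce(raw: str, default: Any) -> Any:
    if isinstance(default, bool):  # check bool before int: bool is an int subclass
        lowered = raw.strip().lower()
        if lowered in ("true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    return type(default)(raw)


def load_worker_policy(config: Mapping[str, Any], environ: Mapping[str, str]) -> PluginWorkerPolicy:
    values = dict(config.get("plugin", {}).get("worker", {}))
    defaults = PluginWorkerPolicy()
    names = {item.name for item in dataclasses.fields(PluginWorkerPolicy)}
    unknown = set(values) - names
    if unknown:
        raise ValueError(f"unknown worker settings: {sorted(unknown)}")  # fail closed
    for name in names:
        env_key = f"PLUGIN__WORKER__{name.upper()}"
        if env_key in environ:
            values[name] = _coerce(environ[env_key], getattr(defaults, name))
    return PluginWorkerPolicy(**values)
```

Because Core sends the resulting frozen object over `SET_RUNTIME_CONFIG`, the Runtime never has to re-read environment variables of its own.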
+
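+### 3.6 Rejected and deferred options
+
+| Status | Option | Conclusion |
+| --- | --- | --- |
+| Rejected | One Plugin Runtime per Workspace | Deployments, connections, and fixed memory grow linearly with the number of Workspaces |
+| Rejected | One plugin process serving multiple Workspaces/installations | Global state, local files, and dependencies cannot form a trustworthy tenant boundary |
+| Rejected | Merging multiple plugins of one Workspace into one worker | Conflicts with "one independent process per installation" and widens the fault and privilege boundary |
+| Rejected | Manifest-declared CPU, memory, or higher limits | The first release enforces uniform instance-level maximums |
+| Not in MVP | Runtime-specific database, Redis, Kafka, or standalone scheduler | No capacity evidence yet; would add components and operational surface |
+| Later evolution | Multiple Supervisor replicas, owner lease, dedicated pool | Interfaces are kept; concrete storage and scheduling are decided once capacity or availability thresholds are reached |
+
+Architectural extension items include whether Core and Supervisor are co-located; signature, provenance, revocation, and GC rules for the artifact/venv cache; the installation data hard-quota provider;
+the compatibility period for v1 connections; and, once there are multiple replicas, the lease TTL, fencing token, and owner-transfer order.
+None of these changes the first-release boundary of "one isolated process per running installation".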
+
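+### 3.7 Acceptance criteria
+
+- When two Workspaces install a verified artifact with the same digest, the shared directory remains read-only. Processes, configuration, data, home, tmp, logs, and Host API remain fully isolated.
+  Artifacts with the same name and version but different digests never share a directory.
+- A plugin cannot read another installation's files, enumerate or signal other processes, or modify the shared code/dependency directories.
+- CPU, memory, PID, open-file, and single-file limits take effect in a real nsjail/cgroup environment. Exceeding a limit terminates or rejects only the offending installation.
+- Editing the manifest cannot change any resource limit.
+- The hard quota on total installation data atomically rejects excess writes at the write boundary, and tests demonstrate that directory scanning is not the production enforcement.
+- Callbacks, messages, side effects, and storage access from an old generation/revision all fail closed.
+- After a restart, the Runtime recovers from the business desired state. It does not treat the local process table as the authoritative source of truth.
+- When an enabled worker exits unexpectedly, its completion callback triggers automatic recovery with bounded backoff. Repeated failures affect only that installation and cannot become a cross-tenant restart storm.
+- When requirements include packages that are not preinstalled in the Runtime base image, the Supervisor still finishes preparing the shared dependency environment before it starts the worker.
+  A failed install leaves no half-started process in a restart loop and does not affect other installations whose environment for the same digest is already ready.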
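
The "bounded backoff" in the recovery criterion could use capped exponential backoff with full jitter. The sketch below uses a hypothetical `restart_delay` helper with illustrative `base` and `cap` values. This document does not state the real Supervisor parameters.

```python
import random
from typing import Optional


def restart_delay(attempt: int, *, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Capped exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    rng = rng if rng is not None else random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow float arithmetic.
    ceiling = min(cap, base * 2.0 ** min(attempt, 32))
    return rng.uniform(0.0, ceiling)
```

The cap bounds the worst-case wait. The jitter keeps installations that failed at the same moment from restarting in lockstep. A cross-installation circuit breaker is a separate mechanism and is not modeled here.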
+
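+## 4. D-002: Box multi-tenant control plane and plan boundaries
+
+Status: `IMPLEMENTED FAIL-CLOSED — production quota provider pending`
+
+### 4.1 What is already implemented
+
+- A shared Box control connection can serve multiple Workspaces. Every operation is bound to an instance, a Workspace, and a generation, and the Runtime namespace is derived from trusted context.
+- A short-lived `SandboxAdmissionGrant`, revision tombstones, and atomic session admission enforce at most one `global` persistent session per eligible Workspace. Managed processes are fixed at zero.
+- Core and Runtime verify that they use the same durable volume through an authenticated host-control challenge rather than by comparing path strings. On a mismatch, startup and reconnect fail.
+- Cloud skills pass only a logical name. The Runtime resolves the read-only package path from a Workspace-scoped store, and Python env/cache writes go to the tenant's own `/workspace/.skill-envs`.
+- ZIP installation limits the compressed input, the entry count, per-entry size, total uncompressed size, and compression ratio. Extraction is streamed and rejects links, non-regular files, duplicate entries, and path escapes.
+- Attachment host paths use a query UUID and dirfd/openat/O_NOFOLLOW. Cloud replica startup no longer globally cleans up other requests' directories, and traversal and deletion have an inode budget.
+- Grant-enforced readiness requires proof of cgroups, namespaces, mounts, the shared volume, the Workspace hard quota, the Skill hard quota, ephemeral storage, and the inode quota.
+  The plain nsjail backend explicitly returns false for the hard disk capabilities it does not implement yet. Cloud Box therefore refuses to start by design until a new deployment provides a real quota provider.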
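
The ZIP rules in the list above can be illustrated with Python's standard `zipfile` module. This is a sketch under assumptions. `safe_extract_members`, `ArchiveRejected`, and every numeric limit are hypothetical, and the sketch extracts into memory rather than onto disk. It counts the bytes that are actually decompressed instead of trusting header sizes. It treats an entry as a link or special file only when its Unix mode carries a file-type bit other than `S_IFREG`, because archives written by `zipfile.writestr` store `0o600` with no type bits.

```python
import io
import stat
import zipfile
from pathlib import PurePosixPath
from typing import Dict


class ArchiveRejected(ValueError):
    pass


def safe_extract_members(data: bytes, *, max_input: int = 8 << 20, max_entries: int = 1024,
                         max_entry_bytes: int = 4 << 20, max_total_bytes: int = 32 << 20,
                         max_ratio: float = 100.0) -> Dict[str, bytes]:
    """Validate and stream-decompress a ZIP into memory under hypothetical limits."""
    if len(data) > max_input:
        raise ArchiveRejected("compressed input too large")
    out: Dict[str, bytes] = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ArchiveRejected("too many entries")
        for info in infos:
            name = info.filename
            path = PurePosixPath(name)
            if path.is_absolute() or ".." in path.parts or "\\" in name:
                raise ArchiveRejected(f"path escape: {name!r}")
            if info.is_dir():
                continue
            file_type = stat.S_IFMT(info.external_attr >> 16)
            if file_type and file_type != stat.S_IFREG:
                raise ArchiveRejected(f"link or special file: {name!r}")
            key = str(path)
            if key in out:
                raise ArchiveRejected(f"duplicate entry: {name!r}")
            # Count real decompressed bytes instead of trusting the header's file_size.
            buf = bytearray()
            with zf.open(info) as src:
                while chunk := src.read(64 * 1024):
                    buf += chunk
                    total += len(chunk)
                    if len(buf) > max_entry_bytes or total > max_total_bytes:
                        raise ArchiveRejected("decompressed size limit exceeded")
                    if len(buf) > max_ratio * max(info.compress_size, 1):
                        raise ArchiveRejected("compression ratio too high")
            out[key] = bytes(buf)
    return out
```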
+
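+### 4.2 First-phase plans and entitlement model
+
+- The closed-source subscription management service (the Control Plane) maps plans to versioned entitlements. Core and Box Runtime never hard-code `plan == pro`.
+- The first phase reuses the Cloud Control Plane, optionally together with the existing Space subscription module, to host closed-source plans, billing, and the entitlement projection.
+  No separate billing/tenant microservice is split out. The open-source Core implements only generic capabilities and numeric limits.
+- The first-phase plan projection is `managed_sandbox_sessions = 1` for Pro and `0` for every other plan. Suggested capability shape: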
+
+```json
+{
+  "features": {
+    "managed_sandbox": true,
+    "external_sandbox": false,
+    "mcp_stdio": false
+  },
+  "limits": {
+    "managed_sandbox_sessions": 1
+  }
+}
+```
+
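+- `box.enabled` only indicates whether this LangBot instance has a Box Runtime deployed. It cannot replace the Workspace entitlement.
+- The tool discovery layer hides or disables the managed sandbox according to the entitlement. After validating the Control Plane entitlement, Core
+  sends a short-lived `SandboxAdmissionGrant` to the Box Runtime over the authenticated control connection. The grant is bound to
+  `instance_uuid + workspace_uuid + execution_generation + entitlement_revision + expires_at + max_sessions + max_managed_processes`.
+  The Runtime only verifies and enforces this internal grant. It does not interpret plan names such as Pro, and it does not trust a plan, session ID, or host path submitted by a business caller.
+- A missing, expired, or unverifiable entitlement fails closed. Concurrent creation must go through atomic admission so that a Workspace never has more than one managed session.
+- When an entitlement is revoked, managed processes are stopped and the logical session is closed. The retention and deletion policy for Workspace data will be decided together with the future Workspace release mechanism.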
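
Here is a minimal sketch of the atomic admission rule. It assumes an in-process `asyncio` Runtime with a single owner, which is the M0 shape. `AdmissionGrant` mirrors the grant fields listed above but omits signature verification, and all class names are hypothetical. The check-and-create runs under one lock, so concurrent first uses of the same Workspace produce exactly one `global` session. A grant whose `entitlement_revision` is older than one already seen is rejected, which models a revision tombstone.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AdmissionGrant:
    # Hypothetical in-memory view of SandboxAdmissionGrant (signature check omitted).
    instance_uuid: str
    workspace_uuid: str
    execution_generation: int
    entitlement_revision: int
    expires_at: float
    max_sessions: int
    max_managed_processes: int


class AdmissionDenied(Exception):
    pass


class SessionAdmission:
    GLOBAL_SESSION_ID = "global"

    def __init__(self, instance_uuid: str) -> None:
        self._instance_uuid = instance_uuid
        self._lock = asyncio.Lock()
        self._sessions: Dict[str, str] = {}   # workspace_uuid -> session id
        self._revisions: Dict[str, int] = {}  # newest entitlement revision seen
        self.created = 0

    async def admit(self, grant: AdmissionGrant, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if grant.instance_uuid != self._instance_uuid or grant.expires_at <= now:
            raise AdmissionDenied("grant expired or issued for another instance")
        if grant.max_sessions < 1:
            raise AdmissionDenied("workspace has no managed sandbox entitlement")
        async with self._lock:
            newest = self._revisions.get(grant.workspace_uuid, grant.entitlement_revision)
            if grant.entitlement_revision < newest:
                raise AdmissionDenied("stale entitlement revision")
            self._revisions[grant.workspace_uuid] = grant.entitlement_revision
            # Check-and-create under the lock: a second caller waits, then finds the session.
            if grant.workspace_uuid not in self._sessions:
                await self._create_session(grant.workspace_uuid)
                self._sessions[grant.workspace_uuid] = self.GLOBAL_SESSION_ID
            return self._sessions[grant.workspace_uuid]

    async def _create_session(self, workspace_uuid: str) -> None:
        await asyncio.sleep(0)  # stands in for starting the nsjail-backed session
        self.created += 1
```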
+
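+### 4.3 Cloud nsjail execution model
+
+- The whole logical SaaS instance shares one logical Box Runtime control plane, and M0 runs a single Runtime replica. No per-Workspace Box service, worker pool, PVC, bucket, scheduler, Redis, or Box database is created.
+- Cloud explicitly pins `box.backend: nsjail`. Sandboxes run directly as nsjail child processes inside the Box Runtime container.
+  There is no nested Docker container, separate Pod, microVM, or warm pool, and the host `docker.sock` is never mounted.
+- An entitled Workspace lazily creates one logical session on first use. Its internal ID is fixed as `global`, and `persistent=True` is enforced.
+  External callers cannot choose or override the session ID, persistence, host path, or backend.
+- Under the current mechanism, "the global sandbox is always alive" means exactly this: each eligible Workspace has at most one stable `global` logical session,
+  the TTL reaper never reclaims it, and its `/workspace` is persisted. Ordinary commands still start and exit nsjail processes on demand. There is no promise that an idle OS process stays resident forever.
+- After a Box Runtime restart, old processes, attach tokens, root/tmp/home, and in-memory session state are invalid. The next use lazily rebuilds the `global`
+  session in the same Workspace namespace. The persistent `/workspace` must survive, and permissions from the old generation must fail closed.
+- The shared Box Runtime uses a single-owner M0 implementation. Session owner leases and cross-replica routing arrive only if there are multiple replicas in the future,
+  and a session handle never contains a replica address.
+- Cloud enforces `network=off` in the first phase, and neither callers nor the WebUI can override it. Today `network=on` disables nsjail's separate network namespace,
+  so it cannot be used in a shared SaaS. If networking is needed later, per-session netns and controlled egress must be implemented first and then enabled separately.
+- Cloud forbids `START_MANAGED_PROCESS` in the first phase, and `SandboxAdmissionGrant.max_managed_processes` is fixed at `0`.
+  Ordinary exec calls run serially inside the Workspace's `global` session. Count limits and aggregate CPU/memory limits must be added before resident processes are allowed.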
+
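+### 4.4 File and resource boundaries
+
+- Files continue to use the current nsjail approach: the Workspace directory on the Box-owned durable volume is bind-mounted only into that tenant's `/workspace`.
+  The first phase adds no bidirectional object-storage sync service or file service.
+- `/workspace` is persistent because it lives on a separate durable host path, not because of `persistent=True`. That flag only stops TTL expiry and ordinary shutdown from reclaiming the logical session.
+  The Box Runtime container must mount a persistent volume, and Workspace data must not live only in the container's writable layer. root/tmp/home may be lost when the Runtime restarts.
+- The Cloud MVP requires Core and Box Runtime to mount the same persistent volume at the same path and to keep using direct file reads and writes. The existing exec/base64 fallback has a per-file limit
+  below the normal attachment limit, so it is not an equivalent Cloud file mechanism. If the path cannot be shared, the transfer protocol must be extended first. Otherwise deployment readiness fails.
+- The existing directory scan before and after exec is only a soft check and cannot stop a single command from filling the shared volume. In production Cloud, the total Workspace space limit must be
+  enforced atomically at the write point by a filesystem project quota or subvolume quota on the Box-owned volume. The Box Runtime sets and verifies it and must not skip it because Core cannot see the path.
+- Operator configuration sets the Cloud nsjail CPU, memory, PID, single-file, and total-space limits uniformly. The plan only determines the number of sessions,
+  and no Workspace may relax sandbox limits.
+- The Box Runtime passes readiness only after cgroup v2 hard limits, namespaces, and mount conditions are all satisfied. In a shared SaaS, it must not log a warning and then run degraded.
+- A Workspace's `global` session may reuse the persistent `/workspace`. Different Workspaces must be fully isolated in physical namespace, paths, processes, and capabilities, even when they use the same file names, process names, or logical session ID.
+  Cloud exposes no ports and no shared network namespace in the first phase.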
+
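+### 4.5 Non-Pro plans and future external E2B
+
+- Non-Pro Workspaces have no Cloud managed sandbox in the first phase. Direct calls to the internal API must also be rejected.
+- In the future, a Workspace may configure its own remote E2B endpoint/template/secret in the WebUI. This is a tenant-owned external sandbox.
+  It does not consume Cloud's `managed_sandbox_sessions` quota, and it cannot read other Workspaces' credentials.
+- The Box Runtime currently has only an instance-wide backend and E2B credential, and the WebUI has no Workspace-level configuration. BYOK E2B is therefore explicitly out of scope for the first phase.
+- When it is implemented, the shared Box Runtime will gain a registry that selects the backend/provider from the trusted Workspace context.
+  This still creates no tenant-specific Box control plane and no new protocol.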
+
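+### 4.6 Rejected and deferred options
+
+| Status           | Option                                               | Conclusion                                                                                   |
+| ---------------- | ---------------------------------------------------- | -------------------------------------------------------------------------------------------- |
+| Rejected         | One Box service per Workspace                        | Components and idle cost grow linearly with the number of Workspaces                         |
+| Rejected         | Several Workspaces sharing one live sandbox/session  | Cannot host untrusted code                                                                   |
+| Rejected for MVP | Docker, separate Pod, microVM, or warm pool          | The Cloud v2 first phase always uses nsjail inside the Runtime container                     |
+| Rejected for MVP | Cloud managed sandbox for non-Pro plans              | The first-phase numeric entitlement is 0                                                     |
+| Later            | Multiple Box Runtime replicas, dedicated pool, BYOK E2B | Provider/ownership interfaces are kept and implemented once real capacity or product demand exists |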
+
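+### 4.7 Acceptance criteria
+
+- A Pro entitlement lazily creates one persistent `global` session on first use. Neither repeated nor concurrent requests can produce a second session.
+- Direct API calls for non-Pro plans, with a missing or expired entitlement, or with a forged plan all fail closed.
+- The TTL does not reclaim the persistent session. After a Runtime restart, processes and temporary directories are invalid, but `/workspace` is kept and can be rebuilt safely on the next use.
+- The files, processes, sessions, attach tokens, and generations of two Workspaces are fully isolated. Network and managed-process requests fail closed in the first phase.
+- Core and Box Runtime prove that they share the same persistent volume with a random marker challenge. Configuring the same path string is not sufficient.
+- Byte and inode quotas on the Workspace, the Skill store, and ephemeral root/tmp/home take effect at the write point. The existing directory scan is not accepted as a hard quota.
+- If cgroups or any hard storage capability is unavailable, Cloud Box Runtime readiness fails. Plain nsjail is therefore never mistaken for a production-ready provider.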
+
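+## 5. D-003: SaaS PostgreSQL and pgvector
+
+Status: `PARTIALLY IMPLEMENTED — shared schema/pgvector complete; SaaS transaction and deployment gates remain`
+
+The current branch implements the PostgreSQL shared schema, transaction-local scope, FORCE RLS, a non-DDL Cloud runtime mode,
+pgvector in the same business database, explicit vector dimensions, and tenant-scoped vector primary keys. A one-shot migrator uses separate credentials and an advisory lock.
+It establishes and verifies least privilege for the runtime role and performs a full schema verification.
+Three things are not implemented yet: a generation-aware fence on ordinary business writes that holds through commit, an outbox committed in the same transaction as external side effects, and durable object references that stay stable across a generation cutover.
+These Core transaction primitives, together with the production Job, credential issuance, backup and rollback procedures, and proof that the runtime credential is isolated from other databases,
+are all SaaS activation gates for Cloud v2.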
+
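+### 5.1 Confirmed database boundaries
+
+- PostgreSQL is the SaaS business database. It is not extended into a general Runtime coordinator, a Box session directory, or a new control-plane database.
+- M0 uses one PostgreSQL business database with a shared schema for all Workspaces. Creating a Workspace creates no database, schema, role, or dedicated connection pool.
+- In the first version, the migrator URL and runtime URL must normalize to the same host, port, and database, but they must use different roles.
+  If the migrator later uses a direct endpoint while the runtime uses a pooler endpoint, the host/port equality rule may be relaxed only after both endpoints verify the same in-database cluster identity.
+  The migrator owns and pins that identity. The runtime role can only read it and cannot create, modify, or forge it.
+- The only business schema in the first version is `public`. Both migrator and runtime connections must satisfy
+  `current_schema() = 'public'` and `current_schemas(false) = ARRAY['public']`. `search_path` overrides are forbidden at both the runtime-role level and the business-database level.
+- The safe values for migrator and runtime sessions are fixed as `session_replication_role=origin`, `row_security=on`, and
+  `lo_compat_privileges=off`. Any persisted `pg_db_role_setting` entry scoped to the runtime role or the current business database causes a fail-closed result,
+  even if the setting currently appears to equal the safe value. Tenant context may be established only with an in-transaction `SET LOCAL`.
+- Business rows carry `workspace_uuid` explicitly. The application-level scope in Repository/Service is the first boundary, and PostgreSQL RLS is the second.
+- The runtime role's direct ACL is fixed as follows: `CONNECT` on the business database, `USAGE` on `public`, `SELECT/INSERT/UPDATE/DELETE` on all allowlisted business tables,
+  read-only `SELECT` on `alembic_version`, and `USAGE/SELECT` on sequences owned by business tables.
+  It is not granted database/schema `CREATE`, table `TRUNCATE/REFERENCES/TRIGGER`, sequence `UPDATE`, privileges on any other object, or any `WITH GRANT OPTION`.
+- The runtime role must be `LOGIN` and must not have the superuser, `BYPASSRLS`, `CREATEDB`, `CREATEROLE`, or replication attributes.
+  It must not appear in any role membership in any position (granted role, member, or grantor). It must not own any database, schema, table, view, sequence, routine, or extension,
+  must not hold column ACLs, and must not use, create, or own any other non-system schema.
+- The business database must have `vector` installed, and the extension catalog may contain only `plpgsql` and `vector`. No FDW, foreign server, or user mapping may exist.
+  Neither the runtime role nor `PUBLIC` may hold explicit routine ACLs or parameter `SET/ALTER SYSTEM` ACLs. The runtime role must not be able to effectively execute any
+  `SECURITY DEFINER` routine, including routines that belong to an allowlisted extension. This ban does not cover the implicit execute privilege on ordinary built-in functions that are not `SECURITY DEFINER`.
+- By default, PostgreSQL grants `TEMP` to `PUBLIC` on new databases. The first version explicitly accepts this on the dedicated business database as a compatibility decision.
+  It is not a direct grant from the migrator to the runtime role, and the first version must not cite it as a reason to share the business database with untrusted workloads.
+  The migrator establishes and verifies the exact allowlist above before releasing the advisory lock. The Cloud runtime must repeat the schema, identity, effective-privilege, and negative catalog checks on every startup and fail closed immediately on drift.
+- A PostgreSQL role is a cluster-wide identity, so a catalog audit inside the current database cannot prove that the same credential cannot connect to other databases in the cluster.
+  SaaS production must either use a dedicated PostgreSQL cluster/endpoint that exposes only the target business database to that credential,
+  or prove through a verified HBA/proxy policy that the credential can connect only to the target business database. This deployment isolation is still an open activation gate.
+- Critical tenant tables use `FORCE ROW LEVEL SECURITY`. Migration, repair, and audit use a separate, controlled migrator role.
+- Each tenant transaction sets the tenant context with `SET LOCAL`, and a single `TenantUnitOfWork` guarantees that the context and the business queries use the same transaction and connection.
+  Connection-level session variables and `search_path` are forbidden, so connection pools, PgBouncer, exception rollbacks, and background tasks cannot leak context from one tenant to another.
+- One `TenantUnitOfWork` may access only one Workspace. A business write and its business outbox entry commit in the same transaction.
+  The write may check a generation/fencing token passed in by the execution layer, but the business PostgreSQL does not hold Runtime owners, leases, or the Box session directory.
+- SaaS schema, extensions, and policies are created only by the release migration job. The application's startup role never runs `CREATE EXTENSION`, `create_all`, or automatic migrations.
+- OSS keeps SQLite as the default and retains the self-hosted PostgreSQL option. The Cloud RLS constraints do not force OSS to depend on PostgreSQL.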
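
The first-version URL rule can be checked before any connection is opened. This sketch uses only `urllib.parse.urlsplit`, and `check_migrator_runtime_urls` and `UnsafeDatabaseConfig` are hypothetical names. Here "normalization" means only a lower-cased host and a default port of 5432. The sketch deliberately does not implement the future cluster-identity relaxation. It also does not replace the in-database checks (`current_schema()`, ACL and catalog audits), which require a live connection.

```python
from typing import Tuple
from urllib.parse import urlsplit


class UnsafeDatabaseConfig(ValueError):
    pass


def _target(url: str) -> Tuple[str, int, str, str]:
    """Return (host, port, database, role) with a lower-cased host and default port."""
    parts = urlsplit(url)
    if not parts.hostname or not parts.username:
        raise UnsafeDatabaseConfig(f"URL must name a host and a role: {url!r}")
    database = parts.path.lstrip("/")
    if not database:
        raise UnsafeDatabaseConfig("URL must name a database")
    return parts.hostname, parts.port or 5432, database, parts.username


def check_migrator_runtime_urls(migrator_url: str, runtime_url: str) -> None:
    """First-version rule: same host, port, and database, but different roles."""
    m_host, m_port, m_db, m_role = _target(migrator_url)
    r_host, r_port, r_db, r_role = _target(runtime_url)
    if (m_host, m_port, m_db) != (r_host, r_port, r_db):
        raise UnsafeDatabaseConfig("migrator and runtime must target the same host, port, and database")
    if m_role == r_role:
        raise UnsafeDatabaseConfig("migrator and runtime must use different roles")
```

The check rejects a shared role even when everything else matches, because the runtime role must never hold the migrator's DDL privileges.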
+
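+### 5.2 First-phase pgvector plan
+
+- SaaS uses pgvector as the default vector database, in the same PostgreSQL cluster/database as the business tables.
+  The vector schema may use a separate adapter, a controlled role, and a bounded pool, but no Chroma, Milvus, or standalone vector database service is added.
+- OSS still defaults to SQLite + Chroma, and users can opt into pgvector explicitly. If the Cloud pgvector configuration fails, startup must fail rather than silently fall back to Chroma.
+- Vector tables must store `workspace_uuid` and `knowledge_base_uuid` explicitly. At a minimum,
+  `(workspace_uuid, knowledge_base_uuid, vector_id)` must be the unique/primary key and part of every query predicate. A server-generated collection name or hash is not a security boundary.
+- The pgvector adapter reuses the same tenant-context/RLS contract, and each vector operation runs `SET LOCAL` in its own transaction.
+  Whether it reuses the ordinary business UoW, role, or connection pool is an implementation choice, but the adapter must not drop tenant metadata.
+- `vector(1536)` can no longer be an unconditional hard-coded type. The first phase uses a `vector` column without a typmod plus an explicit `embedding_dimension` column,
+  enforced with `CHECK (vector_dims(embedding) = embedding_dimension)`. For each allowed dimension, the release migration creates an expression/partial ANN index with a dimension predicate.
+  Knowledge base and model metadata must select an enabled dimension. Writes and queries with a mismatched or disabled dimension fail closed. Vectors are never truncated or padded, and the query never degrades to an unbounded scan or switches to another backend.
+- The `vector` extension, tables, indexes, and RLS are created by the release migration. Application processes run no DDL at startup.
+- If migration 0013 needs to move legacy pgvector data, the migrator, as owner of the source tables, first records the `ENABLE/FORCE RLS` state of each protected source table.
+  It suspends RLS only temporarily inside the same migration transaction and restores each table's exact original state in `finally`.
+  The procedure does not rely on superuser or `BYPASSRLS`, and it never leaves RLS disabled outside the migration transaction.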
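
The save-and-restore discipline for the legacy pgvector move could look like the context manager below. The sketch rests on several assumptions. `rls_suspended` and `RlsState` are hypothetical. `execute` stands in for running SQL on the migration transaction's connection. The prior state is assumed to have been read from `pg_class.relrowsecurity` and `pg_class.relforcerowsecurity`. Table names are assumed to be trusted, allowlisted identifiers, because they are interpolated rather than quoted. Suspension happens inside `try`, so a failure partway through suspending still restores every table.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, List, Mapping


@dataclass(frozen=True)
class RlsState:
    enabled: bool  # pg_class.relrowsecurity
    forced: bool   # pg_class.relforcerowsecurity


def _restore_sql(table: str, state: RlsState) -> List[str]:
    return [
        f"ALTER TABLE {table} {'ENABLE' if state.enabled else 'DISABLE'} ROW LEVEL SECURITY",
        f"ALTER TABLE {table} {'FORCE' if state.forced else 'NO FORCE'} ROW LEVEL SECURITY",
    ]


@contextmanager
def rls_suspended(execute: Callable[[str], None],
                  original: Mapping[str, RlsState]) -> Iterator[None]:
    """Lift RLS on owned tables for the block, then restore each table's exact prior state."""
    try:
        for table in original:
            execute(f"ALTER TABLE {table} NO FORCE ROW LEVEL SECURITY")
            execute(f"ALTER TABLE {table} DISABLE ROW LEVEL SECURITY")
        yield
    finally:
        for table, state in original.items():
            for statement in _restore_sql(table, state):
                execute(statement)
```

`ALTER TABLE` is transactional in PostgreSQL, so an aborted migration transaction also rolls these changes back. The `finally` block covers paths where an error is handled and the transaction still commits.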
+
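+### 5.3 Candidate topologies and future evolution
+
+| Status          | Option                                    | Conclusion                                                                                       |
+| --------------- | ----------------------------------------- | ------------------------------------------------------------------------------------------------ |
+| First phase     | P0. Shared database/shared schema         | One pool and one set of migrations; application scope + RLS                                      |
+| Later           | P1. Multiple shared database shards       | Each shard still hosts many Workspaces on the same schema; designed once capacity/region evidence exists |
+| Later exception | P2. Dedicated shard                       | Only a resource tier for compliance, residency, or very large workloads; no second code path     |
+| Rejected        | Schema/database per Workspace             | Catalog, pool, migration, and backup costs grow linearly with the number of Workspaces           |
+| Rejected        | Database/schema per replica/Cell/Instance | Wrongly ties the business data topology to compute replicas or to product entities that were removed |
+
+M0 does not add a resolver that always returns `primary`, a shard router, or shard bindings ahead of need.
+The P1 resolver, mapping, online migration, connection-pool budget, shard-affine replicas, and dedicated-shard details will be designed when capacity, region, or compliance needs appear.
+Until then, separating the direct endpoint from the pooler endpoint may be enabled only through an in-database cluster identity that the runtime cannot forge. DNS names, database names, and configuration claims are not acceptable substitutes.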
+
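+### 5.4 Backup and lifecycle boundaries
+
+- PostgreSQL PITR is a database/cluster-level recovery tool and does not provide per-Workspace restore.
+- Workspace creation, release, export, deletion, single-Workspace restore, and online migration are deferred in this round and will be decided separately.
+  The first phase does not make an export capability that has not been designed yet an acceptance criterion for the database architecture.
+- Apart from relational business data and pgvector's vector/retrieval fields, large objects, plugin artifacts, and sandbox files remain in object storage or on the Runtime persistent volume.
+  PostgreSQL stores the corresponding business metadata and stable references.
+- Tenant-visible usage/billing business rows may go into PostgreSQL. Infrastructure logs, metrics, and traces do not go into the business database.
+  Retention, time partitioning, or analytical storage for fast-growing business tables will be decided once there is evidence about data volume.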
+
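+### 5.5 Acceptance criteria
+
+- When the application-level Workspace filter is deliberately omitted, RLS still blocks cross-tenant reads and writes.
+- Connection/transaction pool reuse, exception rollbacks, concurrent requests, and background tasks leave no tenant context behind. If PgBouncer is deployed, the tests must also cover transaction pooling.
+- Migration runs once against the shared schema and produces no per-Workspace schema drift. The application's startup role cannot run DDL.
+- The first version rejects migrator/runtime URLs that differ in host, port, or database, and it rejects identical roles. Both migrator and runtime resolve only to `public`.
+  After migration, the migrator completes the exact table/sequence/`alembic_version` ACL grants and the positive and negative role checks, and the runtime re-verifies them on every startup.
+- The runtime role has no role membership in either direction, no `WITH GRANT OPTION`, no `search_path` override, no object ownership, no access to other schemas, and no privileges on non-business objects.
+  It may inherit PostgreSQL's default `PUBLIC TEMP` on the dedicated business database, but it holds no direct `TEMP` ACL.
+- The migrator/runtime session GUCs stay at `session_replication_role=origin`, `row_security=on`, and `lo_compat_privileges=off`.
+  The runtime role and current database have no `pg_db_role_setting` entries. The only extensions are `plpgsql` and `vector`, and the runtime owns no extension.
+  The database has no FDW, server, or user mapping, no explicit routine/parameter ACL for the runtime role or `PUBLIC`, no runtime-owned routine, and no `SECURITY DEFINER` routine the runtime can execute.
+- The production deployment proves that the cluster-wide runtime credential can connect only to the target business database. SaaS must not be enabled until dedicated cluster/endpoint or HBA/proxy isolation has been verified.
+- The legacy pgvector move succeeds under a table-owner migrator that is neither a superuser nor `BYPASSRLS`. The success, exception, and retry paths all restore the exact RLS/FORCE state of every source table.
+- A business write and its business outbox entry have a provable commit order within the same transaction. If the external generation/fencing token check fails, nothing is written.
+- Real PostgreSQL integration tests cover pgvector in these cases: two Workspaces using the same `vector_id`, guessing another Workspace's ID,
+  deliberately omitting the scope, connection reuse, CRUD, and background tasks. None of them may cross tenants.
+- A mismatched embedding dimension or an unavailable pgvector extension fails closed and never falls back to another vector backend.
+- Creating a Workspace adds only directories and business rows. It creates no database, schema, role, or dedicated connection pool.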
+
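+## 6. D-004: Separate switch for stdio MCP
+
+Status: `IMPLEMENTED`
+
+### 6.1 The original problem (now fixed)
+
+- Before the fix, stdio MCP was enabled whenever the transport was `stdio` and Box was available. There was no separate feature gate.
+- Before the fix, every stdio MCP used the fixed `mcp-shared` logical session with `persistent=True` enforced.
+- Under that old logic, if multi-tenant Cloud exposed the capability only through `box.enabled`, every Workspace with a stdio MCP configured would keep an extra persistent sandbox.
+  That would bypass the cost and plan boundary of "at most one managed `global` sandbox per Workspace".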
+
+### 6.2 First-phase decision
+
+A new, separate instance setting:
+
+```yaml
+mcp:
+  stdio:
+    enabled: true
+```
+
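+- OSS defaults to `true` to stay compatible with current local deployments. Cloud v2 forces it off with `MCP__STDIO__ENABLED=false`.
+- The switch is independent of `box.enabled`, the `managed_sandbox` entitlement, and the number of sandbox sessions. It cannot be inferred from any of them.
+- If stdio MCP is later offered on specific plans, a Workspace capability can be layered on top of the instance switch. When the instance switch is `false`, no entitlement can bypass it.
+- HTTP, SSE, and other remote MCP transports are not affected by this switch.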
+
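+### 6.3 Mandatory checkpoints
+
+The switch must cover all of the following:
+
+1. MCP create;
+2. MCP update to stdio;
+3. transient connection test;
+4. loading existing stdio configurations at Core startup;
+5. the final execution gate in RuntimeMCPSession/loader;
+6. the WebUI transport selector and its error messages.
+
+Hiding the option in the WebUI is not enough. When the Cloud configuration disables the switch, existing stdio records are kept but not started automatically, and a clear feature-disabled error is returned.
+The final gate must sit before both the Box branch and the legacy host-stdio branch, so that the error is not misreported as `box_unavailable`.
+It must not create an `mcp-shared` session or a stdio child process.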
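
The ordering requirement for the final gate is easy to get wrong, so here is a sketch of it. `select_stdio_launch_path`, `FeatureDisabled`, `BoxUnavailable`, and `McpServerConfig` are hypothetical names, not LangBot's actual API. The switch is evaluated before the Box branch and the legacy host-stdio branch, so a disabled instance reports `feature_disabled` and never reaches code that could create a session or child process.

```python
from dataclasses import dataclass


class FeatureDisabled(Exception):
    code = "feature_disabled"


class BoxUnavailable(Exception):
    code = "box_unavailable"


@dataclass(frozen=True)
class McpServerConfig:
    name: str
    transport: str  # e.g. "stdio", "http", "sse"


def select_stdio_launch_path(cfg: McpServerConfig, *, stdio_enabled: bool,
                             box_available: bool, legacy_host_stdio: bool) -> str:
    """Final execution gate: check the instance switch before any launch branch."""
    if cfg.transport != "stdio":
        return "remote"  # remote transports are unaffected by the switch
    if not stdio_enabled:
        # Checked first, so a disabled instance never reaches the Box or host-stdio code.
        raise FeatureDisabled(f"stdio MCP is disabled on this instance: {cfg.name}")
    if box_available:
        return "box"
    if legacy_host_stdio:
        return "host"
    raise BoxUnavailable(cfg.name)
```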
+
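+### 6.4 Acceptance criteria
+
+- In Cloud, even with `box.enabled=true` and a Workspace that owns a managed sandbox, no stdio MCP can be created, updated, tested, or started.
+- Direct API calls, replayed old configurations, and startup bootstrap all fail closed. None of them produces an `mcp-shared` session, an nsjail process, or extra quota usage.
+- OSS default behavior stays compatible, and HTTP/SSE MCP works normally.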
+
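+## 7. How the five decisions relate
+
+All five decisions follow one principle:
+
+> Tenants share the trusted control plane, connection pools, read-only artifacts, and base capacity. Each tenant exclusively owns its untrusted execution processes, sandbox, secrets, writable files, and data scope.
+
+- Plugin Runtime uses a shared Supervisor with one isolated nsjail process per installation. This reduces the number of control planes while keeping process-level tenant isolation.
+- Box uses a shared Runtime, one persistent logical session per eligible Workspace, and one-shot nsjail exec. This avoids per-tenant services and idle containers.
+- stdio MCP has its own off switch, so Box availability cannot implicitly create a second set of persistent sandboxes.
+- PostgreSQL and pgvector share database components but use an explicit tenant key, application-level scope, and RLS, so shared storage never becomes shared privilege.
+- Subscription management keeps plan and billing rules only in the closed-source Control Plane and projects signed, versioned entitlements into the open-source Core.
+  Core and the Runtimes enforce generic capabilities and numeric limits and never duplicate plan names or billing logic.
+- Directory sync reads a full snapshot only at startup or recovery. Routine changes are aggregated per Workspace into signed deltas, so adding tenants does not turn every directory event into a full-instance reprojection.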
+
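+## 8. Capabilities preserved while the system is not yet distributed
+
+1. Every runtime protocol continues to carry the stable `instance_uuid`, `workspace_uuid`, and execution generation. Identity must never be expressed through process addresses.
+2. The local process tables of Core, Plugin Runtime, and Box Runtime must never be the only source of truth for durable desired state or revocation state.
+3. Creation, retries, callbacks, the outbox, and worker registration use stable idempotency keys. Duplicate delivery cannot produce a second owner or a duplicate side effect.
+4. Plugin installations and Box sessions use a stable owner abstraction. A lease with expiry, CAS, and fencing tokens will be implemented before a second replica is enabled.
+   The lease store will be chosen later. Reusing the business PostgreSQL is not assumed, and no Runtime/Box database will be added for this purpose.
+5. Repository/UoW does not allow unbounded cross-Workspace transactions. From day one, `workspace_uuid` is the internal routing key and the candidate sharding key.
+6. Schema migrations, task scanning, monitoring aggregation, and operational interfaces must not assume that there will always be exactly one Core process.
+7. External APIs expose no replica, worker, or shard identifiers. Future scaling does not change Workspace URLs, UUIDs, or the client protocol.
+8. Replicas and shards are added only when capacity, availability, region, or compliance needs appear. Reserving protocol fields does not mean deploying extra components now.
+9. Each Core replica keeps its own event-consumption cursor, because the entitlement cache is process-local state. The projection high-water mark,
+   snapshot coverage, and inbox in PostgreSQL remain shared by all replicas for idempotent projection and conflict detection. Event pages carry a signed high-water mark, and a replica must not renew
+   readiness until it has caught up. One replica's cursor must never cause other replicas to skip their local cache refresh.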
+
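+## 9. Explicitly out of scope for this round
+
+- Do not merge plugin installation processes, even for identical plugins and versions. Only digest-verified, read-only code and dependency files may be shared.
+- Do not let a plugin manifest declare or override CPU, memory, PID, file, or storage limits.
+- Do not create a Plugin Runtime, Box service, database, schema, role, bucket, PVC, or message queue per Workspace.
+- Do not use Docker sandboxes, microVMs, warm pools, or non-Pro managed sandboxes in the Cloud v2 first phase.
+- Do not implement Workspace-level BYOK E2B configuration in the WebUI in the first phase.
+- Do not support stdio MCP in the Cloud v2 first phase, and do not let Box availability enable it implicitly.
+- Do not use the business PostgreSQL as a general coordination database for undecided Runtime/Box needs, and do not introduce Redis, Kafka, or a new scheduler without capacity evidence.
+- Do not implement Workspace export, release, single-tenant restore, or online migration in this round. The lifecycle will be decided separately.
+- Do not implement multiple CloudInstances, Workspace Placement, or a Cell Router. Future distribution means only replicas and shards inside a single logical instance.
+- Do not change the old Space deployment model. Cloud v2 continues to be designed as a greenfield system.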
@@ -0,0 +1,348 @@
+# LangBot Cloud Runtime resource safety review
+
+Date: 2026-07-28 to 2026-07-29
+
+Branches reviewed:
+
+- LangBot: `feat/multi-tenants`, review starting point `32abbb636f4455e965141d8d209b359dbfbb5aae`
+- Plugin SDK: `feat/multi-tenants`, review starting point `0cddf3c2bea5939c67b71e488a719e9903c28d17`
+
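+## Conclusion
+
+This round covered the main long-lived objects, background tasks, queues, network clients, process lifecycles, and database connection pools of LangBot Core, Plugin Runtime, and Box Runtime. Every attacker-controllable or historically accumulating state found in this round now has a capacity, timeout, eviction, or deterministic-cleanup bound. After the fixes, the high-cardinality probes observed no active cache that keeps growing with past requests. Under this round's acceptance standard of "code review plus locally reproducible tests," the review is complete, and no unresolved severe memory leak or CPU-hogging path was found. This conclusion does not prove that resource problems cannot occur under arbitrary production load.
+
+The final Cloud topology and the production environment are not completion conditions for this round. The code-level review, the full cross-repository test suite, and the Linux/cgroup v2 probes built from the repository Dockerfile have passed. However, these results alone do not approve Cloud production activation. The full list of remaining environment-side items is in
+[Cloud v2 items pending verification](./cloud-v2-pending-verification.md). The items on that list that relate directly to this resource review and must be completed before launch are:
+
+1. Repeat the nsjail, namespace, and delegated cgroup v2 verification of CPU, memory, swap, PID, and file-handle limits under the final Cloud deployment privileges and cgroup topology. A one-off Linux container in this round showed that the code paths work, but neither an ordinary container nor a private cgroup namespace with only `--privileged` meets the requirements.
+2. Provide and verify a hard filesystem quota provider for Cloud Box. A plain nsjail bind mount cannot prove hard quotas on total bytes and inodes, so the current strict readiness check fails closed by design.
+3. Continue capacity testing with the final production configuration mix, and use the results to set the per-instance Workspace placement limit. This round's real PostgreSQL 16 + RLS startup test covered 1,000 Workspaces, each with a Provider, three kinds of Model, a Bot, a Pipeline, a KnowledgeBase, an MCP server, and Plugin settings, and startup load time and SQL count stayed linear. A synthetic three-generation replacement probe with 5,000 Workspaces also showed that old runtimes are released. The repository now includes a 24-hour gate tool that collects Core/Plugin/Box HTTP, process-tree, and cgroup v2 data together. A short self-test of that tool passed in a Linux container with hard CPU, memory, swap, and PID limits, but the 24-hour run on the final production candidate topology has not been executed yet. The fake adapters, requesters, and Plugin handlers used in the tests cannot replace capacity data from real platform SDKs, external connection pools, and plugin processes. Legitimately active tenants still consume memory linearly.
+
+The SDK has already been published at branch commit `1d65ed301a6afc52150a998043f73cd6032c8162`, and the LangBot
+`pyproject.toml` and `uv.lock` in this commit set pin exactly that commit. The final image must still record and check the actually installed version, as required by the pending-verification list.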
+
+## Scope
+
+### LangBot
+
+- Startup, shutdown, global task management, and runtime configuration.
+- PostgreSQL, Tenant UoW, RLS, migrations, and shared pgvector.
+- HTTP, MCP, WebSocket, upload/download, S3/local storage, maintenance jobs, and asynchronous user tasks.
+- QueryPool, Pipeline Controller, sessions/conversations, rate limiting, third-party Agent/LLM runners, and the synchronous SDK bridge.
+- PlatformManager and the DingTalk, QQ, Lark, WeCom, WeComCS, WeChatPad, LINE, Kook, Satori, OpenClaw Weixin, Telegram, Discord, Matrix, HTTP/WebSocket, and other adapters.
+- Plugin Runtime connector, plugin package validation, Marketplace downloads, pip install output, and desired-state reconcile.
+- Box connector, admission, session/process lifecycle, and RPC files.
+- RAG, vector backends, Skill, Storage, Telemetry, and log caches.
+
+### Plugin SDK
+
+- stdio/WebSocket transport, request waiters, action tasks, and file transfer.
+- Runtime control handler, Workspace/generation fence, and EventContext.
+- Plugin artifacts, Marketplace, pip dependency installation, shared dependency environments, installation desired state, Supervisor, and worker launcher.
+- nsjail arguments, cgroup/rlimit, process-registration capability, and Runtime shutdown.
+- Box admission, generation fence, session/process/reaper, RPC/relay WebSocket, and the nsjail backend.
+
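+## Main fixes
+
+### Deterministic lifecycle
+
+- Fixed `Application.shutdown()`, which used `contextlib.suppress` without importing `contextlib`. The resulting `NameError` fired inside the actual resource-closing branch and blocked the release of Plugin, Box, HTTP, and database resources that followed.
+- If any startup stage or initialization fails, `make_app()` now closes the half-built Application that has not yet been returned to `main()`. The Telemetry, Box, Tool, Platform, Vector, and HTTP managers are attached to the Application before they initialize, so a mid-initialization failure cannot hide connections, sessions, child processes, or background tasks that were already created from the cleanup code.
+- The MCP streamable-HTTP session manager now exits explicitly on Application shutdown.
+- The MCP loader manages host tasks and sessions by `(instance, workspace, generation)`. Completed tasks are removed from the registry, a generation bump cancels the old host task and closes the old session, and reload/shutdown first clears the existing runtimes. Completed tasks and old-generation connections therefore no longer stay resident forever.
+- Platform bot reload, remove, and shutdown are now serialized through one path. The old bot, proxy, adapter tasks, and processes are stopped before they are removed from the registry.
+- Model provider requesters gain an async close contract. Provider reload/remove, Workspace generation replacement, full reload, and Application shutdown all close the old requester deterministically, so third-party requesters can safely hold their own HTTP client or connection pool.
+- Plugin Runtime, Box Runtime, stdio transport, adapter connections, and the shared HTTP client now all close, cancel, and await their resources.
+- A bounded HTTPX stream now closes the underlying response stream immediately when the limit is exceeded or the consumer is cancelled. Previously, HTTPX closed the stream automatically only after a normal full read,
+  so a persistent client that repeatedly received oversized responses could accumulate unreleased connections.
+  In the over-limit branch, an already-consumed response is also closed with `aclose()` before the error propagates.
+- `Application.dispose()` allows only one trackable shutdown task. Repeated signals, window closes, or caller cleanups no longer start several parallel shutdown flows.
+- The credential-exchange background tasks of Lark, WeChat, DingTalk, WeCom, and QQ Official now all run through the Application TaskManager. They are subject to global and per-Workspace admission and are cancelled on application shutdown. When capacity is full, the unscheduled coroutine is closed and a 429 is returned, so no stray task is left behind.
+- `TaskCapacityError` moved into a standalone error module with no Application/controller dependency. Previously, under one particular cold-start import order, the HTTP overload path triggered a TaskManager/controller circular import and turned the intended 429 into a framework 500.
+- The S3 storage provider closes the botocore HTTP connection pool on initialization failure or Application shutdown. The Storage manager is attached to the Application before the provider initializes, so a failed bucket probe does not leave a client behind.
+- Fixed the Coze runner, which created an `aiohttp.ClientSession` per request and never closed it. The runner's `aclose()` now closes the underlying API client deterministically.
+- The LINE SDK client is now closed when the adapter stops. The Plugin Runtime shutdown callback creates only one trackable, awaitable background task, so repeated callbacks do not accumulate cleanup tasks.
+- SDK `lbp publish` now closes the uploaded plugin package file with a context manager. The old implementation leaked the file handle on the success, API-error, and HTTP-error return paths.
+- When the caller cancels, Plugin workers, the Box Docker/CLI backend, the Box nsjail backend, and nsjail dependency installation now terminate or kill the child process, drain its pipes, and reap it with `wait()`. The native process path for Windows workers follows the same `finally` cleanup contract, so cancelled installs, shutdowns, and timeouts leave no orphan processes.
+- The Box service entry point now uses one outer cleanup boundary that spans Runtime initialization, aiohttp app/runner setup, port binding, and the main loop. A non-`OSError` during site setup also closes the Runtime and the reaper. In WebSocket control mode, a port-bind failure now exits and leaves the restart to the orchestrator, instead of waiting forever with no usable RPC or health port. stdio mode still lets the control channel continue when only the relay bind fails.
+- The critical filesystem mutations for plugin artifact extraction, installation staging/activation/rollback/delete, shared dependency environments, and nsjail session directories all moved off the event loop. On cancellation, an atomic mutation that has already started waits for its thread to finish before rolling back the temporary directory, target directory, or old supervisor. A background thread therefore never keeps modifying a path that the caller already considers cleaned up.
+- The blocking executors in Core and the SDK gain a separate, bounded cleanup entry point. Ordinary work is still rejected immediately when capacity is exhausted. Close, unlink, rmtree, and process reaping that already own a resource wait for a limited worker slot, and the atomic operation finishes before cancellation propagates, so overload cannot cause cleanup to be rejected. Critical paths for SeekDB, Milvus, S3, LINE, WeChatPad, MCP staging, and TBox temporary files use this entry point.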
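
The single-shutdown behavior of `Application.dispose()` reduces to the pattern below: the first call creates the shutdown task, and every later caller receives the same task. `App` is a hypothetical stand-in, not LangBot's `Application` class, and the body of `_shutdown` is a placeholder.

```python
import asyncio
from typing import Optional


class App:
    """Hypothetical stand-in showing how dispose() coalesces onto one shutdown task."""

    def __init__(self) -> None:
        self._shutdown_task: Optional["asyncio.Task[None]"] = None
        self.shutdown_runs = 0

    async def _shutdown(self) -> None:
        self.shutdown_runs += 1
        await asyncio.sleep(0)  # placeholder for closing Plugin, Box, HTTP, and database

    def dispose(self) -> "asyncio.Task[None]":
        # Signals, window closes, and caller cleanups all get the same trackable task.
        if self._shutdown_task is None:
            self._shutdown_task = asyncio.get_running_loop().create_task(self._shutdown())
        return self._shutdown_task
```

Callers can await the returned task to know when shutdown has finished, and no extra shutdown flow starts in parallel.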
+
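+### Bounded queues, caches, and historical state
+
+- QueryPool limits queued and running queries both globally and per Workspace. Once a query is scheduled, overload no longer evicts it, and historical scope counters are capped.
+- Session, Conversation, WebSocket connection/proxy/message, rate-limit identity, task record/log, telemetry task, vector handle, and adapter-private queues all have capacity limits or LRU/TTL eviction.
+- SessionManager now maintains a per-Workspace secondary index and a revision-checked min-heap of expiry times. A new session scans only the bounded session set of the target Workspace, TTL reclamation consumes only the expired prefix of the heap, and global idle eviction uses a min-heap. Stale heap entries left by frequent hits are compacted once they exceed a bounded multiple of the active session count. The old implementation scanned and sorted every session in the instance for each new launcher id, and an attacker could fabricate launcher ids at will.
+- The SDK's EventContext and dependency-preparation locks use weak references. Generation, admission, installation, capability, and completed-process state are capped.
+- While the Plugin restart circuit is open, only `max_concurrent_restarts` supervisors
+  can hold a cooldown timer or state wait. Other installations sleep in FIFO order on the same semaphore.
+  Probe state changes still complete if the caller is cancelled, so the half-open state is never held forever.
+- Box nsjail scans `/proc` only once at startup and deletes leftover session directories as a stream. It no longer rescans all processes for each
+  leftover directory or loads every directory into memory first.
+- Box Skill discovery, directory listing, and listing bodies separately limit scanned entries, packages, returned entries,
+  and cumulative text bytes. The BFS uses a deque, which rules out O(N²) behavior and unbounded lists caused by inode flooding.
+- Core message aggregation now admits messages with an O(1) scope counter keyed by `(instance, workspace, generation)`
+  and no longer scans the whole instance buffer for each new launcher.
+- The idle execution fence for Cloud remote MCP no longer queries
+  Workspace/ExecutionState every 5 seconds for each session. After a signed directory projection transaction commits, invalidated scopes are merged into one bounded
+  cleanup worker. The strong database checks before and after actual tool and resource calls remain in place.
+- Empty Workspaces no longer preallocate a Model generation scope, a Plugin installation set, or a Box generation event. These objects are created only when a Workspace actually owns the corresponding runtime resources or waiting tasks.
+- Runtime RPC files limit both the bytes per file and the number of unconsumed files per connection. Temporary files owned by a connection are deleted when it closes.
+- Box Runtime maintains secondary indexes from instance and Workspace to active sessions. Create, delete, expire, revoke, and shutdown share one cleanup path, so a tenant RPC no longer scans every session in the instance.
+- Box Runtime also indexes only two subsets: expirable sessions and sessions that hold a managed process. Cloud's persistent `global` sandbox never enters the TTL index, and the process index is empty while managed processes are disabled. As a result, session creation, the periodic reaper, status, and `/healthz` no longer scan linearly over all persistent tenants. The regression test replaces the full session dict with a mapping that forbids iteration and still completes a second persistent-session creation, a reap, status, and health.
+- The Core MCP loader likewise maintains secondary indexes from Workspace/generation to sessions and host tasks. Requests, generation reclamation, and dynamic configuration no longer scan every tenant's MCP sessions in the instance, and the done callback of a completed task removes the task from all indexes synchronously.
+- Box admission expiry uses a min-heap with revision/generation checks and touches only expired records. Stale heap entries left by repeated renewals are ignored, and the heap is compacted when it grows past a bounded multiple of the active grants. An RPC therefore no longer scans every tenant's grants.
+- The old QQ message ID/object cache and the stdio MCP Workspace copy lock no longer grow without bound as requests accumulate.
+- A single LLM/Agent runner generation result is limited to 1 MiB by default. Streaming is limited to 1 MiB per event, 16 MiB cumulative per request, and 100,000 stream events, so an abnormal upstream response cannot consume unbounded memory or CPU.
+- Marketplace JSON is limited to 1 MiB and plugin packages to 64 MiB. At most 1 MiB each of pip stdout and stderr is retained, and the rest is drained without being kept in memory.
+- In addition to the 16 MiB message byte limit, the Plugin Runtime stdio/WebSocket protocol now caps inbound and outbound messages at 4,096 fragments each. UTF-8 encoding, fragmentation, reassembly, JSON encoding/decoding, and Pydantic validation of large messages all run in threads. WebSocket receive errors use the stable `ConnectionClosed` type. Previously the error path failed a second time, because the library does not export an `exceptions` attribute at the top level.
+- The HTTPX response hook and bounded aiohttp reads limit all third-party responses to 10 MiB. JSON parsing and diagnostic text conversion run in threads, and at most 4 KiB of an error body is retained, so large JSON payloads are never parsed in one go on the shared event loop.
+- Images, data URIs, and platform media are limited to 10 MiB by default, and Base64 length is checked before decoding. Plugin binary storage defaults to 10 MiB per value, with an absolute 64 MiB ceiling that a misconfiguration cannot bypass.
+- Skill text files are limited to 1 MiB each, Plugin UI files to 4 MiB, and host edit files to 1 MiB. Box Skill ZIPs, plugin artifacts, and GitHub Skill archives also limit the entry count, per-file size, total uncompressed size, and compression ratio.
+- SDK E2B file sync is limited to 2,048 directory entries, 1,024 files, 10 MiB per file, and 50 MiB in total. Synchronous file IO moved from the event loop to threads.
+- Dify pending forms, user Space OAuth state, and the Cloud launch JTI replay cache now use revision-checked expiry min-heaps, and Space credits use a time-ordered LRU/TTL queue. The old implementation scanned the whole historical cache on every request an attacker could trigger, and at capacity it searched linearly again for the oldest entry. Expiry reclamation is now amortized `O(log N)` or consumes only the expired prefix. Stale heap entries are ignored and compacted once they exceed a bounded multiple of live state.
+- The Cloud launch JTI cache fails closed once it holds 4,096 still-valid tokens. It no longer evicts still-valid replay records to admit new tokens. Otherwise, after the cache filled up, an attacker could replay legitimate signatures whose records had been evicted early.
+- The entitlement resolver now follows the authoritative set of active Workspaces from the Cloud directory. A full directory projection drops historical entitlement snapshots of fenced or removed Workspaces, and batched delta updates do not rescan for each change. If a directory revocation lands while a provider request is in flight, a second active-fence check before returning keeps the stale result out of the cache.
+- Cloud directory signed responses, Workspaces, memberships, and instance-active Workspaces
+  all gain configurable operational limits plus absolute limits. The closed-source adapter reads responses as a stream and, before JSON/JWS parsing,
+  rejects decompressed bodies larger than the 32 MiB default. The Manifest, entitlement, and event endpoints are further
+  limited to 256 KiB, 256 KiB, and 2 MiB respectively. Entitlement refresh fetches and holds at most 16
+  raw responses concurrently and validates them batch by batch into small snapshots before releasing them. Inside a transaction protected by the directory-projection row lock, Core checks the final
+  active count. If the count exceeds the limit, it rolls back the Workspace, Account, membership, inbox, and cursor changes, so it never
+  truncates authoritative data or lets concurrent replicas each claim the last capacity slot.
+- The full Space directory projects only active Workspaces. Historical archived Workspaces are returned as tombstones only in signed deltas, each with at most 100
+  targets. Before registration creates a new personal Workspace, a
+  PostgreSQL transaction-level advisory lock serializes the global active-count admission. At the limit,
+  the request returns 503, so multiple Space replicas cannot all take the last free slot at the same time.
+- Monitoring pagination, offsets, CSV export, and session/message detail all enforce the instance-configured limit and an absolute limit that configuration cannot raise, at the service
+  boundary. The full statistics for detail views are now computed with a SQL aggregate,
+  only a bounded set of tool/LLM/error details is materialized, and `detail_truncated` is returned explicitly. The defaults are 1,000 for pagination,
+  10,000 for export, and 2,000 for detail, with absolute limits of 5,000, 50,000, and 10,000 respectively.
+- Token statistics time series no longer pull every LLM call in the filtered range back into Python for bucketing.
+  PostgreSQL aggregates with `date_trunc` and SQLite with `strftime` inside the database, and only
+  the most recent 1,000 time buckets are returned (absolute limit 10,000). Model grouping reuses the pagination limit and sorts by token count and applies the limit in SQL.
+  Both result types return an explicit `*_truncated` flag.
+- Monitoring expiry deletes at most 4 batches per table per round by default, with an absolute maximum of 100 batches. For local and S3 storage,
+  expired upload candidates and deletions per round default to at most 1,000, with an absolute maximum of 10,000. A single Workspace
+  with an unusually large history can no longer make one maintenance cycle materialize unbounded candidates or keep draining the entire backlog.
+- The number of webhooks per Workspace is limited to 16 by default and 64 absolutely. Both management queries and runtime fan-out
+  materialize only bounded results. Concurrent webhook requests per instance are limited to 16 by default and 128 absolutely.
+  When the limit is reached, destinations that are not admitted are skipped immediately instead of spawning a batch of tasks that wait on a semaphore. If the call is cancelled, every
+  request task already created is cancelled and awaited, which returns its instance slot.
+- Local and S3 Storage object reads read only `limit + 1` bytes in the actual IO, and the S3 body is closed on the success,
+  over-limit, and exception branches. The default limit is 10 MiB per object, with an absolute maximum of 64 MiB. All scoped loads
+  and WebSocket attachments pass through the same boundary, and writes cannot create objects that the current instance cannot read safely.
+- Valkey Search bulk deletion now runs as a stream: it searches in fixed-size pages, deletes, and accumulates a count, instead of keeping all matching keys
+  in a Python list. After each delete it resumes from offset 0, so a shrinking result set cannot cause skipped items, and the loop has
+  an absolute stop condition of 1,000 rounds.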
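
Several fixes above, including SessionManager, Box admission, Dify forms, OAuth state, and the launch JTI cache, rely on the same idea. They use a min-heap of expiry times with lazy deletion, where each heap entry carries a revision and is ignored if the live record has since been renewed. The sketch below is a hypothetical `ExpiringCache` that assumes single-threaded (event-loop) use. The compaction factor is illustrative, and the fail-closed behavior on a full cache is modeled on the JTI rule.

```python
import heapq
from collections.abc import Hashable
from typing import Dict, Generic, List, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ExpiringCache(Generic[K, V]):
    """TTL cache with a lazy-deletion min-heap; stale heap entries are skipped by revision."""

    def __init__(self, capacity: int, compact_factor: int = 4) -> None:
        self._capacity = capacity
        self._compact_factor = compact_factor
        self._live: Dict[K, Tuple[float, int, V]] = {}  # key -> (expires_at, revision, value)
        self._heap: List[Tuple[float, int, K]] = []      # revisions are unique, so keys are never compared
        self._revision = 0

    def _reap(self, now: float) -> None:
        # Pop only the expired prefix; an entry whose revision no longer matches is stale.
        while self._heap and self._heap[0][0] <= now:
            _, revision, key = heapq.heappop(self._heap)
            live = self._live.get(key)
            if live is not None and live[1] == revision:
                del self._live[key]

    def _compact(self) -> None:
        # Rebuild from live records once stale entries dominate the heap.
        if len(self._heap) > self._compact_factor * max(len(self._live), 1):
            self._heap = [(exp, rev, key) for key, (exp, rev, _) in self._live.items()]
            heapq.heapify(self._heap)

    def put(self, key: K, value: V, expires_at: float, now: float) -> None:
        self._reap(now)
        if key not in self._live and len(self._live) >= self._capacity:
            # Fail closed instead of evicting entries that are still valid.
            raise OverflowError("cache full of unexpired entries")
        self._revision += 1
        self._live[key] = (expires_at, self._revision, value)
        heapq.heappush(self._heap, (expires_at, self._revision, key))
        self._compact()

    def get(self, key: K, now: float) -> Optional[V]:
        self._reap(now)
        live = self._live.get(key)
        return None if live is None else live[2]
```

Each `put` costs `O(log N)`, and reaping pops only entries that have already expired, so neither operation scans the whole cache.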
+
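+### CPU and event-loop protection
+
+- Fixed the infinite loop in the old QQ `repeat_seed('')` when the input is empty.
+- ZIP validation and repacking, PIL, Base64, AES, JSON parsing, fsync probes, plugin artifact and dependency files, Skill, S3, local storage, and maintenance directory scans moved from the event loop to threads.
+- Public callback bodies for Slack, QQ Official, HTTP Bot, WeChat Official Account, and WeCom/WeComCS are explicitly limited to 1 MiB, and JSON/XML decoding moved off the shared event loop. Gateway frames for QQ, DingTalk, Satori, WeCom AI, and WeChatPad likewise have a 1 MiB limit or are rejected before decoding when oversized. KOOK zlib data has a hard 10 MiB limit on decompressed size, so a small compressed payload cannot trigger a large in-memory decompression.
+- HTTP Bot idempotency keys and outbound sessions now both pass hard-capacity admission before each write. When the store is full, only a fixed budget of the 64 oldest records is checked, so the store can no longer exceed its limit first and then be cleared wholesale, and replies no longer scan and sort every session. Existing sessions keep O(1) access, and a new session fails closed when no idle record can be safely reclaimed.
+- Protocol JSON encoding and decoding for Dashboard, Embed, and Plugin Runtime run in threads. After the receiving side commits a terminal error, Dashboard and Embed keep a bounded drain window for the sending side and wake it with an internal sentinel. The rule that cancels both directions when either one ends therefore cannot close the connection before the revocation error frame is sent.
+- Tenant-configured sensitive-word, content-ignore, and group-response regexes all use the `regex` engine, which is now a declared direct dependency. Limits are 64 patterns, 1,024 characters per pattern, 1 MiB of input, and a total matching CPU budget of 50 ms per call, and matching runs in a thread. Timeouts, invalid patterns, and replacement amplification all fail closed. The catastrophic `(a+)+$` regression case is interrupted within a 1 ms test budget.
+- The native `read/write/edit/glob/grep` file tools moved off the event loop and inherit the Workspace blocking budget. Directory listing, recursive walks, grep file count and total characters, line length, pattern size, result count, and regex CPU all have hard limits. glob keeps only the newest 100 entries in a fixed-size min-heap instead of first holding every matching path in memory. The glob/grep scripts that run inside Box likewise limit the match set, scan volume, and regex time.
+- Clients for Dify, DingTalk, QQ, WeCom, and other services reuse connections and close them when their lifecycle ends. Response bodies and downloads have byte limits.
+- Calls into synchronous third-party SDKs such as DashScope and TBox, and iteration over their generators, now run in threads. A single synchronous generator yields at most 100,000 events.
+- When either the send task or the receive task of a Dashboard or Embed WebSocket ends, the other direction is now cancelled and awaited, so the receive task no longer blocks forever after the sender exits. Both direction tasks inherit the trusted Workspace blocking budget derived from the authentication result or the RuntimeBot.
+- The Plugin installation lifecycle is globally serialized, so dependency preparation (pip/nsjail) for different tenants does not compete for CPU during install peaks.
+- When a Plugin installation exits unexpectedly, recovery goes through per-installation jittered exponential backoff, a Runtime-wide concurrency slot for restart launches, and a failure-window circuit breaker.
+  After the circuit cooldown, only one half-open probe is allowed, and other installations resume only after the probe finishes initialization and stays stable. A worker that is not ready within 30 seconds
+  is cancelled and reaped. Health metrics report active launches, failures in the current window, circuit state, and the cumulative open count. The 24-hour gate fails if the circuit opens or if the launch slots do not return to zero at the end of the run.
+- The synchronous S3 SDK runs in threads, and an instance-level semaphore bounds its concurrency. The default is `storage.s3.max_concurrency=16`, which instance configuration and environment variables can override.
+- Box child-process stderr is read in 64 KiB chunks. Logging emits at most 4 excerpts per second plus a summary of how many were suppressed, so output without newlines or log spam cannot cause unbounded buffering or log amplification.
+- Plugin worker logs keep at most 64 KiB per line. The Box managed-process stdout relay reads fixed 64 KiB chunks and no longer depends on newlines, so very long output without newlines cannot trip the `StreamReader` limit or block the child process.
+- Box generation fence updates now touch only the target Workspace's event and active-task secondary indexes. The old implementation iterated over every Workspace's fence and task records on each update. With 10,000 Workspaces, the second-phase update degraded to O(N²) and had still not finished after 40 seconds. After the fix, the current full two-phase probe takes `11.270s`, including the other high-cardinality SDK loads and this round's protocol offload.
+- Box session enumeration, old-generation reclamation, and admission counting all go through the Workspace index, and admission expiry goes through a min-heap. An RPC therefore no longer triggers a scan proportional to all sessions or grants in the instance.
+- The Model, Pipeline, RAG, and Platform managers all maintain secondary indexes from Workspace to runtime keys. A Workspace generation update cleans only that Workspace's caches and runtimes, instead of scanning every tenant's provider/model, pipeline, knowledge runtime, or bot in the instance. Regression tests verify this boundary with mappings that forbid global iteration.
+- The Cloud heartbeat reads counts directly from the already-loaded, capacity-bounded Pipeline, MCP, KnowledgeBase, and Bot registries. It no longer opens a Tenant UoW for each active Workspace in turn and runs four COUNT queries, which removes a daily serial SQL/CPU spike that grew with tenant count. OSS mode keeps the database-count semantics.
+- The three periodic cleanup tasks for invitations, Monitoring, and Storage are merged into one
+  `resource-maintenance` scheduler. The scheduler waits one full interval before its first run so it does not compete with startup loading.
+  Each due cycle runs active Workspace discovery only once and then runs the bounded jobs one Workspace at a time,
+  and a failure in one Workspace does not skip the others. With the shared default one-hour period, this reduces
+  three tenant-wide discoveries and three tasks waking at the same time to one discovery and one task.
+- At startup, Cloud first builds one Workspace binding snapshot that the deployment adapter and the directory projection have validated. Model, Platform, Pipeline, RAG, and Plugin initialization share this snapshot, which is released as soon as initialization completes. This avoids repeating full tenant discovery and projection validation for each manager during startup.
+- When Platform, Pipeline, and RAG load resources from the validated startup snapshot, they no longer re-query the same execution binding for each Bot, Pipeline, or KnowledgeBase. Regular request paths and dynamic update paths still keep the database generation fence.
+- An instance-level semaphore and batching bound the initial MCP host burst and the shutdown burst. The default `mcp.lifecycle_concurrency=16` can be overridden with `MCP__LIFECYCLE_CONCURRENCY` and is hard-capped at 128. Initial loading no longer creates one semaphore-waiting task per server. Instead, one cancellable dispatcher materializes at most `lifecycle_concurrency` child tasks per batch. The duplicate temporary lists of ORM servers and configs were also removed, so startup with many tenants does not concentrate CPU, memory, socket, and file-handle usage.
+- The default `asyncio.to_thread()` executor in Core, Plugin Runtime, Box Runtime, and standalone Plugin workers is now a thread pool with hard bounds. By default, at most 8 blocking calls run at once and at most 128 wait in the queue. At capacity, an admission error is raised immediately. Python's default `ThreadPoolExecutor` would instead keep an unbounded work queue of request objects, Futures, and closures. Each trusted Workspace is additionally limited to 4 running plus queued calls by default, and the configured value may not exceed half the worker count, so one tenant cannot submit a batch of synchronous work that fills every worker and the FIFO queue. Core uses `system.blocking_executor.max_workers/max_pending/max_inflight_per_scope` and supports the matching `SYSTEM__BLOCKING_EXECUTOR__*` overrides. SDK processes use `LANGBOT_BLOCKING_EXECUTOR_MAX_*` and cap the global maximums at 64 workers and 4096 pending calls.
+- Core, Plugin Runtime, and Box Runtime each run an event-loop lag monitor with a fixed 1-second interval that keeps only the latest 120 samples. The health snapshot reports the current lag, recent maximum, recent p95, process-lifetime maximum, and cumulative sample count. Both WebSocket ports of the Plugin Runtime now return the same aggregate JSON from the unauthenticated `/healthz`, with no credentials or tenant identifiers, and Box `/readyz` also includes the resource snapshot. By default, the 24-hour gate fails on a missing or stopped monitor, a recent maximum above 1 second, a tail recent p95 above 250 ms, or a sample counter that goes backward.
+- The Workspace blocking budget is derived from one of these trusted sources: the server-authenticated `RequestContext`, the RuntimeBot of a public bot, the Workspace in a public object key that the binding fence has verified, the ExecutionContext of Platform/TaskManager, or the SDK's inbound ActionContext. Tenant headers supplied by callers are never trusted. Coverage includes public webhooks, public object downloads, Dashboard/Embed WebSockets, ordinary HTTP handlers, Platform adapters, and detached tenant tasks. On Core HTTP paths, a capacity rejection returns a stable 429, and health/debug counters report global and per-scope rejections separately.
+- Argon2 password hashing and verification allow only one in-flight operation per instance. Extra concurrent calls return a capacity error immediately instead of piling up on an asyncio semaphore. This CPU- and memory-intensive work also runs in a separate `system:authentication` blocking scope. Cloud still disables local password login.
+- The WeCom extension API client previously had no timeout and now uses 120 seconds. Webhook AES, media Base64, and synchronous SDK calls for these platforms all moved off the shared event loop.
+- Long-text-to-image conversion is limited to 100,000 characters, 256 lines, 8 million RGBA pixels, and 10 MiB of output.
+  Beyond these limits it falls back to a forward message. The digit-boundary search now uses a single linear scan instead of repeated `count/find/sort` calls,
+  PIL images are closed explicitly, and the compression loop terminates even when the step size is zero.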
+- Core 在每次 quota-enforced Box exec 前后遍历 Workspace 时使用非递归 DFS，并在
+  超过字节 quota 或默认 100,000/绝对 1,000,000 个目录项后立即停止；目录项洪泛
+  失败关闭，不再重复完整扫描 inode bomb。远程 outbox fallback 同时限制扫描项、
+  文件数、单文件和总字节，Python project manifest 使用分块 hash 并限制单文件 10 MiB。
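The bounded blocking executor described above rejects work instead of queueing it without limit. Below is a minimal sketch of its admission rule. It assumes an explicit `scope` argument that stands in for the trusted Workspace, and it does not show the real `asyncio.to_thread()` integration. `BoundedBlockingExecutor` and `CapacityError` are hypothetical names, not LangBot APIs.

```python
import asyncio
import concurrent.futures
import threading
from typing import Any, Callable, Dict


class CapacityError(RuntimeError):
    """Raised instead of queueing once a bound is reached (hypothetical name)."""


class BoundedBlockingExecutor:
    """Fixed thread pool with hard caps on running + queued calls, globally and per scope."""

    def __init__(self, max_workers: int = 8, max_pending: int = 128,
                 max_inflight_per_scope: int = 4) -> None:
        if max_inflight_per_scope > max_workers // 2:
            raise ValueError("per-scope limit must not exceed half of max_workers")
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        self._capacity = max_workers + max_pending  # running + queued slots
        self._per_scope = max_inflight_per_scope
        self._lock = threading.Lock()  # _release runs on worker threads
        self._total = 0
        self._by_scope: Dict[str, int] = {}
        self.global_rejected_total = 0
        self.scope_rejected_total = 0

    def _admit(self, scope: str) -> None:
        with self._lock:
            if self._total >= self._capacity:
                self.global_rejected_total += 1
                raise CapacityError("blocking executor is full")
            if self._by_scope.get(scope, 0) >= self._per_scope:
                self.scope_rejected_total += 1
                raise CapacityError(f"scope {scope!r} reached its in-flight limit")
            self._total += 1
            self._by_scope[scope] = self._by_scope.get(scope, 0) + 1

    def _release(self, scope: str) -> None:
        with self._lock:
            self._total -= 1
            remaining = self._by_scope[scope] - 1
            if remaining:
                self._by_scope[scope] = remaining
            else:
                del self._by_scope[scope]  # idle scopes keep no state

    async def run(self, scope: str, fn: Callable[..., Any], *args: Any) -> Any:
        self._admit(scope)  # fail fast: no slot means no queue entry
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._release(scope)
            raise
        # Free the slot when the thread finishes, not when the awaiting
        # coroutine is cancelled, so abandoned work still counts.
        future.add_done_callback(lambda _f: self._release(scope))
        return await asyncio.wrap_future(future)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A caller would `await executor.run(workspace_uuid, blocking_fn, arg)` and translate `CapacityError` into the 429 response described above. The slot is released from the thread's completion callback rather than from the awaiting coroutine. That way, work that was cancelled but is still running keeps counting against both bounds.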
+
+### Plugin and Box resource isolation
+
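+- The number of Plugin workers is capped at the minimum of `max_workers`, `max_total_cpus / max_cpus`, and `max_total_memory_mb / max_memory_mb`.
+- The shared profile requires Linux and nsjail. Cloud enforces `plugin.worker.require_hard_limits=true`, and startup is refused when cgroup v2 delegation is unavailable.
+- Each worker receives CPU, memory, swap, and PID cgroup limits, plus process, open-file, and file-size rlimits. Plugin manifests cannot raise these limits.
+- The cgroup v2 path for Box nsjail now sets both `memory.max` and `memory.swap.max=0`. Before the fix, a 48 MiB sandbox could swap out 128 MiB of forcibly committed pages and exit normally, competing for host swap. After the fix, the cgroup kills the same probe with exit 137.
+- The repository's Docker Compose/Kubernetes examples explicitly set blocking executor limits for Core, Plugin Runtime, and Box Runtime. The Kubernetes Box readiness probe moved from `/healthz`, which only reports that the process is alive, to `/readyz`. A Pod whose backend or managed-mode isolation checks fail is therefore not added to ready traffic.
+- Verified code and dependency environments with the same digest may be shared read-only. Each installation gets its own home/tmp/data and its own processes.
+- Before publishing a shared dependency environment, the SDK validates at most 100,000 directory entries and at most 2 GiB of total regular-file size, as recorded in file metadata.
+  A staging tree over these limits is removed atomically and never reaches a worker. `requirements.txt` and the plugin
+  `manifest.yaml` are both read with a bounded `limit + 1` read, and the manifest is additionally limited to 1 MiB.
+- Box sessions, managed processes, completed processes, admission records, and RPC files all have instance-level caps. Cloud entitlement still limits each eligible Workspace to one `global` session and zero managed processes.
+- Box Runtime adds hard caps on top of those instance-level settings, and configuration cannot raise them:
+  - 5,000 sessions;
+  - 1,024 managed processes;
+  - 10,000 completed processes;
+  - 250,000 admission records;
+  - 100 MiB per RPC file;
+  - 86,400 seconds of completed retention.
+
+  Initialization and remote INIT fail closed on wrong types, negative values, and values above a cap. An invalid dynamic update never leaves a partially applied limit.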
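The `limit + 1` read mentioned above can tell an oversized file apart without reading it in full: it reads at most one byte past the limit. Below is a minimal sketch for a binary file object. `read_bounded` and `TooLargeError` are illustrative names, not the SDK's actual helpers.

```python
import io
from typing import BinaryIO


class TooLargeError(ValueError):
    """Hypothetical error type for an input above its byte limit."""


def read_bounded(stream: BinaryIO, limit: int) -> bytes:
    """Read at most `limit` bytes, failing closed if the source is larger.

    Requesting `limit + 1` bytes is enough to distinguish "exactly at the
    limit" from "over the limit" without reading the rest of the file.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    chunks = []
    remaining = limit + 1
    while remaining > 0:
        chunk = stream.read(remaining)  # may return fewer bytes than requested
        if not chunk:
            break  # EOF
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    if len(data) > limit:
        raise TooLargeError(f"input exceeds {limit} bytes")
    return data


if __name__ == "__main__":
    # e.g. a manifest capped at 1 MiB
    print(read_bounded(io.BytesIO(b"name: demo\n"), 1 << 20))
```

A real caller would open `manifest.yaml` in binary mode and pass the 1 MiB cap. A multi-gigabyte file then costs at most 1 MiB + 1 byte of reading before it is rejected.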
+
+### PostgreSQL
+
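+- Cloud requires a PostgreSQL business database, shared pgvector, and one of the allowed fixed vector dimensions.
+- In Cloud mode, pgvector reuses the business database's AsyncEngine and does not create a second connection pool.
+- `database.postgresql` adds and validates `pool_size`, `max_overflow`, `pool_timeout_seconds`, and `pool_recycle_seconds`. The default maximum connection count is `10 + 10`.
+- `pool_size + max_overflow` has an absolute cap of 100, and timeout/recycle also have absolute caps.
+  By default, Cloud runtime asyncpg connections set a 60-second statement timeout, a 5-second lock
+  timeout, and a 60-second idle-in-transaction timeout, capped at 300, 60, and 300 seconds respectively.
+  The one-off release migration does not inherit these short runtime timeouts.
+- `/healthz` reports configured pool capacity, checked-in/checked-out connections, overflow, the cumulative count of pool admission timeouts,
+  and the SQL timeout configuration. The directory also reports active/max counts and the latest batch's
+  Workspace/membership counts, so production soak runs and alerting can cross-check them.
+- Application shutdown explicitly disposes the business engine. Standalone pgvector closes only the engine it owns.
+- PersistenceManager provides a unified async shutdown. The database engine is released on every exit path: startup failure and normal shutdown of the Cloud long-running process, and both success and failure of the one-off release migration. The real PG catalog test also covers a second lifecycle, in which the test reuses the manager after the entry point has closed it, so the pool is reopened. In strict resource-warning mode, no asyncpg socket or transport is left behind.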
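Below is a sketch of how the pool and timeout caps above could be validated when configuration is loaded. It assumes integer seconds. `PoolSettings` is a hypothetical class; its defaults and caps come from the list above. `server_settings()` produces the PostgreSQL session parameters `statement_timeout`, `lock_timeout`, and `idle_in_transaction_session_timeout` in milliseconds, in a form that could be passed to asyncpg's `server_settings` connection argument. The list does not state the absolute caps for `pool_timeout_seconds` and `pool_recycle_seconds`, so the sketch leaves them out.

```python
from dataclasses import dataclass
from typing import Dict

# Absolute caps quoted in the list above; the class itself is a hypothetical sketch.
MAX_TOTAL_CONNECTIONS = 100
TIMEOUT_CAPS_S = {
    "statement_timeout_s": 300,
    "lock_timeout_s": 60,
    "idle_in_transaction_timeout_s": 300,
}


@dataclass(frozen=True)
class PoolSettings:
    pool_size: int = 10
    max_overflow: int = 10
    statement_timeout_s: int = 60
    lock_timeout_s: int = 5
    idle_in_transaction_timeout_s: int = 60

    def __post_init__(self) -> None:
        for name in ("pool_size", "max_overflow", *TIMEOUT_CAPS_S):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{name} must be an integer")
        if self.pool_size < 1 or self.max_overflow < 0:
            raise ValueError("pool_size must be >= 1 and max_overflow >= 0")
        if self.pool_size + self.max_overflow > MAX_TOTAL_CONNECTIONS:
            raise ValueError(f"pool_size + max_overflow exceeds {MAX_TOTAL_CONNECTIONS}")
        for name, cap in TIMEOUT_CAPS_S.items():
            # Assumption: 0, which PostgreSQL treats as "disabled", is rejected.
            if not 1 <= getattr(self, name) <= cap:
                raise ValueError(f"{name} must be between 1 and {cap} seconds")

    def server_settings(self) -> Dict[str, str]:
        """Session parameters in milliseconds, e.g. for asyncpg's `server_settings`."""
        return {
            "statement_timeout": str(self.statement_timeout_s * 1000),
            "lock_timeout": str(self.lock_timeout_s * 1000),
            "idle_in_transaction_session_timeout": str(self.idle_in_transaction_timeout_s * 1000),
        }
```

With the defaults, the generated values match the `60000ms / 5000ms / 60000ms` read back from `pg_settings` in the verification table.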
+
+## Default decisions adopted in this round
+
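+- Prefer failing closed or evicting the oldest idle cache entry. Attacker-controlled historical keys must never stay resident indefinitely.
+- Plugin dependency preparation is serialized instance-wide to keep CPU/disk peaks stable; the cost is slower bulk installation.
+- PostgreSQL uses one explicitly bounded shared connection pool; pgvector does not get a separate pool.
+- For a single-instance directory, the default caps are 1,000 Workspaces for both active and full-snapshot, 20,000
+  memberships, and 32 MiB for a signed response. The absolute caps are 5,000, 100,000, and
+  64 MiB respectively. Space and Core must be configured with the same operating caps. Production values may only be lowered according to the V-08 capacity curve,
+  or raised after every gate has been rerun.
+- Third-party runners share uniform instance-level safety caps: 1 MiB per result, 16 MiB total per stream, and 100,000 sync/async events. Requests over these caps fail closed.
+- S3 allows 16 concurrent blocking calls by default, and the maximum configurable value is 128. With no separate worker service, this bounds thread-pool queueing and upstream connection pressure.
+- MCP lifecycle concurrency defaults to 16 with a maximum of 128. The same limit governs the session-host peak at instance startup and the shutdown batches, and tenant configuration cannot raise it on its own.
+- In every Core and SDK process, the general blocking executor defaults to 8 workers, 128 pending slots, and 4 in-flight slots per Workspace. It provides shared backpressure at the instance/process level. Neither a Workspace nor a plugin manifest can raise it, and the per-Workspace setting can never exceed half the worker count. Calibrate production values against container CPU and upstream blocking latency. Pending slots are not a throughput setting and must not be raised without bound.
+- Plugin package downloads are capped at 64 MiB, and the retained copies of pip stdout and stderr are capped at 1 MiB each. This does not limit what the installer actually prints, only the diagnostic copy held in parent-process memory.
+- Generic remote responses and media default to a 10 MiB cap, and error diagnostic bodies keep only 4 KiB. Plugin binary storage defaults to 10 MiB, with an absolute cap of 64 MiB. Skill text, Plugin UI, and host edits are limited to 1 MiB, 4 MiB, and 1 MiB respectively.
+- Storage scoped objects default to a 10 MiB read/write cap, with an absolute cap of 64 MiB in code. Webhooks default to 16 concurrent outbound requests per
+  Workspace and 16 per instance, with absolute caps in code of 64 and 128 respectively. Box
+  Workspace quota scans visit at most 100,000 directory entries by default and never more than 1,000,000.
+- Before publication, an SDK shared dependency environment accepts at most 100,000 entries and 2 GiB of total regular-file size, as recorded in file metadata.
+  The artifact manifest and requirements are each limited to 1 MiB. These are Runtime control-plane guards applied before
+  workers start. They do not replace hard byte/inode quotas on the final filesystem.
+- Monitoring query caps are set by `monitoring.query_limits` and can be overridden through native environment variables, but
+  absolute caps in code always bound them. Per-table cleanup batch counts and per-round Storage file counts likewise combine instance configuration with
+  absolute caps. Time series default to 1,000 database aggregation buckets, with an absolute cap of 10,000. Model grouping reuses the pagination
+  cap. Any increase to these values must be accounted for in the V-08/V-09 database CPU and Core RSS capacity curves.
+- The managed-process relay preserves the original newlines in stdout and splits the stream into 64 KiB WebSocket frames. It no longer promises one frame per line. This protocol change is what gives output without newlines a deterministic memory bound.
+- This round does not convert legitimate tenant resources such as Pipeline, Model, and KnowledgeBase into lazy runtimes. That change would alter startup and request semantics, so it is deferred until it can be designed together with the Workspace placement/release mechanism.
+- This round does not claim pseudo disk quotas for plain nsjail. Strict Cloud readiness stays fail-closed.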
+
+## Verification results
+
+| Check | Result |
+| --- | --- |
+| LangBot Ruff + `git diff --check` | Passed |
+| Plugin SDK Ruff + `git diff --check` | Passed |
+| LangBot full test suite (unit/integration/Box/E2E) | `2855 passed, 33 skipped` |
+| Plugin SDK full test suite | `1328 passed` |
+| Space Go full test suite and closed Cloud Adapter tests | Go `go test ./...` passed; Adapter `40 passed` |
+| Space PostgreSQL 16 Cloud v2 directory and concurrent capacity admission | Passed; two concurrent registrations racing for the last slot produced `1 success / 1 capacity rejection / 1 active Workspace` |
+| Core PostgreSQL 16 Cloud runtime server timeouts | A real connection read back statement/lock/idle-in-transaction timeouts of `60000ms / 5000ms / 60000ms` from `pg_settings` and was explicitly disposed |
+| Real PostgreSQL 16 + pgvector migration/RLS/release tests (strict resource warnings) | `22 passed` |
+| Real PostgreSQL 16 + RLS populated Cloud startup capacity | 500 Workspaces: `6.178s / CPU 3.026s`; current 1,000-Workspace rerun: `12.109s / CPU 5.967s` |
+| Earlier Core Dockerfile Linux image build and `regex` import | Passed, image SHA `8893a14053df`; this image uses an old SDK pin and is stale, so the final candidate must be rebuilt |
+| Full `ResourceWarning` + `PytestUnraisableExceptionWarning` gate | Passed in both Core and SDK, and now enforced by the pytest configuration |
+| Plugin SDK Box tests (including global-scan regression guards) | `669 passed` |
+| Docker Compose rendering, Compose/Kubernetes YAML parsing, and diff check | Passed |
+| Cloud soak gate parsing/sampling/verdict unit tests | `27 passed` |
+| Core/Plugin SDK event-loop monitor tests | `7 passed` in each repository, including a real 50 ms scheduler stall |
+| Cloud soak Linux hard-limit short self-check | Passed; CPU `0.5`, memory+swap `256 MiB`, and PID `128` were all read back from cgroup v2; cooldown-tail verdict `pass` |
+| Core two-phase historical churn resource probe (current local SDK branch) | Audit passed, `12.559s` |
+| Core three-generation capacity probe, 5,000 populated Workspaces (current local SDK branch) | Current rerun passed; maximum replacement time ratio `1.405` |
+| Plugin SDK two-phase resource probe | Audit passed, `11.270s` |
+
+Both repositories add repeatable historical churn probes. Core also adds a three-generation replacement probe for populated Workspaces:
+
+```bash
+# LangBot Core
+PYTHONPATH=../langbot-plugin-sdk/src uv run python scripts/runtime_resource_probe.py --scale audit --json
+
+# LangBot Core: 5,000 Workspaces with representative resources
+PYTHONPATH=../langbot-plugin-sdk/src uv run python scripts/workspace_runtime_capacity_probe.py --scale audit --json
+
+# LangBot Core: real PostgreSQL 16 + RLS populated Workspace startup
+TEST_POSTGRES_URL=postgresql+asyncpg://... \
+LANGBOT_PG_CAPACITY_WORKSPACES=1000 \
+uv run pytest \
+  tests/integration/persistence/test_migrations_postgres.py::TestPostgreSQLTenantRuntime::test_populated_cloud_startup_is_linear_and_task_bounded \
+  -q -W error::ResourceWarning --log-cli-level=INFO
+
+# langbot-plugin-sdk
+uv run python scripts/runtime_resource_probe.py --scale audit --json
+```
+
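+Each Core audit phase runs the real Model/Plugin manager loading and reconcile for 10,000 empty Workspaces, plus 25,000 Queries, 2,500 session churns, 10,000 rate-limit identities, 5,000 tasks, and 2,500 WebSocket churns. The retained state after the first phase and after the second phase is identical:
+
+- 20,000 historical empty Workspaces: Model scope/provider/LLM and Plugin Workspace set/installation all retain `0`.
+- 50,000 historical Queries: active query cache `0`, historical scope counter `100`.
+- 5,000 session identities: session cache `200`.
+- 20,000 rate-limit identities: rate-limit container `10,000`.
+- 10,000 historical tasks: task record `200`.
+- 5,000 WebSocket churns: conversation and stream index both `200`.
+- Event-loop tasks, threads, and file descriptors stay at `1 / 1 / 6`. In the rerun with the current local SDK branch, the second phase grew RSS by `2,228,224 bytes` and tracemalloc current by `344,669 bytes` relative to the first phase. Total time was `12.559s`. Session eviction now uses a Workspace index and a min-heap, and the same audit workload took `16.150s` before that change.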
+
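+The populated Workspace audit loads one Provider, LLM, Embedding, Rerank, Pipeline, Bot, KnowledgeBase, and MCP session for each of 5,000 Workspaces. It then advances all of them by two generations:
+
+- Across all three phases, the active provider/model, pipeline, bot, knowledge, and MCP registries stay at exactly `5,000`; none of them grows with historical generations.
+- By the third phase, all `10,000` requesters, `10,000` Bot adapters, and `10,000` MCP sessions from the first two generations have been deterministically closed. Weak-reference assertions confirm that old-generation objects can be garbage-collected.
+- Event-loop tasks, threads, and file descriptors stay at `1 / 1 / 6`. In the current rerun, with the SDK pinned exactly to the remote commit, the third phase grew RSS by `1,245,184 bytes` relative to the second phase, and tracemalloc current by only `2,061 bytes`.
+- The initial load, first replacement, and second replacement took `1.893s / 2.549s / 2.659s`. The maximum replacement time ratio is `1.405`, with no CPU regression as historical generations accumulate.
+- The macOS RSS sample grew from an initial `154,648,576` bytes to `368,181,248` in the first phase, `389,087,232` in the second, and `390,332,416 bytes` in the third. The second replacement added only about 1.19 MiB more than the first. Even so, the "linear capacity of legitimate active tenant resources" must still be treated as an input to placement capacity. This probe uses lightweight fake adapters/requesters, so the roughly 204 MiB increase in the first phase must not be extrapolated into a per-tenant production cost.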
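You can recompute the derived figures above from the raw numbers. The sketch assumes that the maximum replacement time ratio means the slowest replacement divided by the initial load, which is the reading that reproduces `1.405`:

```python
# Phase timings reported above, in seconds.
initial_load = 1.893
replacements = [2.549, 2.659]

# Assumed definition: slowest replacement divided by the initial load.
max_ratio = max(replacements) / initial_load
print(f"{max_ratio:.3f}")  # 1.405

# RSS samples reported above, in bytes.
rss_initial, rss_phase1, rss_phase2, rss_phase3 = (
    154_648_576, 368_181_248, 389_087_232, 390_332_416)

first_phase_mib = (rss_phase1 - rss_initial) / 2**20
third_vs_second_mib = (rss_phase3 - rss_phase2) / 2**20
print(f"{first_phase_mib:.0f}")      # 204
print(f"{third_vs_second_mib:.2f}")  # 1.19
```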
+
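+Each Plugin SDK audit phase runs 25,000 loopback RPCs, 5,000 installation binding activations/revocations, 10,000 Workspace generation updates, and 2,500 Box session creations/deletions with Workspace context. The retained state after the first phase and after the second phase is identical:
+
+- RPC waiters, stream queues, action tasks, and active installation bindings are all `0`.
+- The installation watermark is bounded at `5,000`. Workspace generation records are bounded at `10,000`, and generation events are `0` when nothing is waiting.
+- Generation active tasks/indexes, Box sessions, the Box Workspace session index, creating/closing/background tasks, and session locks are all `0`.
+- Event-loop tasks and file descriptors stay at `1 / 7`. In the current rerun, the second phase grew RSS peak by `2,637,824 bytes` and tracemalloc current by `289,746 bytes` relative to the first phase. Total time was `11.270s`. The extra time comes from this round's move of large-message protocol work (JSON/Pydantic handling, UTF-8 encoding, chunking, and concatenation) into the bounded thread pool. Structural state and the second-phase tracemalloc increase remain flat.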
+
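+A second review worked in reverse from the code and enumerated every call site:
+
+- Core: 50 explicit task-creation sites, and 204 thread, blocking-call, and subprocess sites.
+- SDK: 28 explicit task-creation sites, and 62 thread, blocking-call, and subprocess sites.
+
+A third, independent review traced back from high-cardinality timers, directory traversal, full-table admission scans, and cancellation races. It closed five more issues:
+
+- the thundering herd of Plugin restarts waking after cooldown;
+- MCP idle database polling;
+- the O(session × process) startup scan for nsjail orphans;
+- O(buffer) admission in message aggregation;
+- missing bounds on Skill inodes and text lists.
+
+Every explicit task has an owner, a completion callback, or a `finally` reclamation path. Every production entry point installs the bounded default executor before its first `asyncio.to_thread()` call. The public `/healthz` of Core, Plugin Runtime, and Box, and also Box `/readyz`, outputs that process's aggregate runtime/resource counters and event-loop lag. A soak run can compare active volume, pending work, cumulative capacity rejections, and scheduling latency from them. These endpoints expose no debug keys, control tokens, tenant identities, or plugin identities. Plugin Runtime's authorized debug info reuses the same resource snapshot, so public and private metric semantics cannot drift apart.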
+
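+The real PostgreSQL populated startup gate first creates the latest schema through the release migration. It then starts with a temporary Cloud Runtime role that lacks `BYPASSRLS`. Each Workspace contains nine kinds of representative resources. The test exercises the real paths for instance discovery, the tenant UoW, the startup binding snapshot, and Model/Platform/Pipeline/RAG/MCP/Plugin loading:
+
+- 500 Workspaces: startup load `6.178s`, process CPU `3.026s`.
+- Current 1,000-Workspace rerun: startup load `12.109s`, process CPU `5.967s`. The wall-clock ratio against the earlier 500-Workspace run is `1.960`.
+- The SELECT count on each of the nine tables `model_providers`, `llm_models`, `embedding_models`, `rerank_models`, `bots`, `legacy_pipelines`, `knowledge_bases`, `mcp_servers`, and `plugin_settings` exactly equals the Workspace count. There is no repeated tenant-wide discovery and no superlinear resource scan.
+- The MCP host dispatcher, the host tasks, and the temporary Runtime role and its asyncpg connections are all cleaned up when the test ends. Strict `ResourceWarning` mode passes.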
+
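+The probe requires the second phase's structural state to match the first phase exactly, and it fails if second-phase RSS or tracemalloc growth exceeds a threshold. On macOS the RSS source is the `getrusage` peak, so this checks a bound on peak growth, not that "current RSS falls back". The final Linux 24-hour soak still needs to collect current RSS/PSS and cgroup `memory.current`.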
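The pass/fail rule in the previous paragraph can be sketched as follows. `PhaseSnapshot`, `audit_verdict`, and the thresholds used in the example are hypothetical, and the real probe scripts may structure their checks differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhaseSnapshot:
    structural: Dict[str, int]  # retained counts, e.g. cache or registry sizes
    rss_peak_bytes: int
    tracemalloc_current_bytes: int


def audit_verdict(first: PhaseSnapshot, second: PhaseSnapshot,
                  max_rss_growth: int, max_tracemalloc_growth: int) -> List[str]:
    """Return failure reasons; an empty list means the second phase passed."""
    failures = []
    keys = sorted(first.structural.keys() | second.structural.keys())
    drifted = [k for k in keys if first.structural.get(k) != second.structural.get(k)]
    if drifted:
        failures.append("structural drift: " + ", ".join(drifted))
    rss_growth = second.rss_peak_bytes - first.rss_peak_bytes
    if rss_growth > max_rss_growth:
        failures.append(f"RSS peak grew by {rss_growth} bytes")
    heap_growth = second.tracemalloc_current_bytes - first.tracemalloc_current_bytes
    if heap_growth > max_tracemalloc_growth:
        failures.append(f"tracemalloc current grew by {heap_growth} bytes")
    return failures
```

Exact equality on structural counters catches slow leaks that a byte threshold alone could hide, for example one retained session per historical Workspace.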
+
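+The full LangBot suite has 33 skips:
+
+- 22 integration cases that skip because the default full run does not provide PostgreSQL/pgvector;
+- 10 that skip because Valkey is not provided;
+- 1 collection skip for an optional environment.
+
+The separate runs in the table above cover the real PostgreSQL paths. The Plugin SDK's 26 warnings are existing Pydantic v2 deprecations and aiohttp AppKey recommendations. There are no failures, no unclosed resources, and no degraded resource limits. The current full Core run produces 194 pre-existing third-party/compatibility warnings. The pytest configuration still promotes `ResourceWarning` and `PytestUnraisableExceptionWarning` to errors, and this round produced no such leak warnings.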
+
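+The Linux Runtime probes use the image above and mount the latest local SDK source read-only:
+
+- Plain container: the nsjail binary is executable, but the namespace, mount, network, and cgroup v2 checks are all `false`. Strict readiness fails closed as expected.
+- `--privileged` with a private cgroup namespace: namespace, mount, and network pass, but cgroup v2 delegation is `false`. The Runtime therefore still correctly cannot become Cloud ready.
+- A one-off container with a writable delegated cgroup subtree created inside it: the Plugin and Box cgroup probes are both `true`, and the nsjail namespace, mount, network, and cgroup v2 checks all pass. Hard filesystem and inode quotas still report `false`.
+- With `cpus=0.1`, a busy loop needing 1.0 second of process CPU took `9.13s` of wall time. With `memory_mb=48`, committing 128 MiB page by page ended with exit `137`. With `pids_limit=8`, bulk fork returned `EAGAIN`. These results show that the kernel actually enforces the CPU, memory+swap, and PID limits.
+- After `scripts/cloud_runtime_soak.py` was added, it was run in a separate container from the same Linux image, started with `--cpus 0.5 --memory 256m --memory-swap 256m --pids-limit 128`. The tool read back quota `50000/100000 usec`, memory `268435456 bytes`, swap `0 bytes`, and PID `128` from the target cgroup. In the final rerun, a 32 MiB child workload exited, followed by a 4-second cooldown tail. During that tail, the robust growth and the slope of `memory.current` were both `0` and average CPU was `0.00132 cores`. The OOM, memory pressure, PID max, and throttle deltas were all `0`, and the final verdict was `pass`. This only self-checks the collector and the judge; it does not replace the final 24-hour production-candidate run.
+- Plugin Runtime was started locally. The public `/healthz` on both the control port and the debug port returned the same aggregate JSON, the event-loop monitor was running, and the body contained no debug key. The collector explicitly bypassed the process-level HTTP proxy and ran a 6-second endpoint gate against the control port. There were no failures, the observed recent max and p95 were both `2.233 ms`, and the verdict was `pass`.
+- Box Runtime was started locally without creating a sandbox session. Both `/healthz` and `/readyz` returned aggregate event-loop, blocking executor, and session/process/task snapshots. The monitor was running and its sample count kept growing; both endpoints observed a recent max of `2.265 ms`. After SIGINT, aiohttp, the Runtime, the reaper, and the monitor went through the unified cleanup path and exited normally.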
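The soak tool reads limits back from cgroup v2 interface files. Below is a minimal sketch of that parsing. It assumes the standard cgroup v2 file formats: `cpu.max` holds a quota and a period, or `max` in place of the quota; `memory.max`, `memory.swap.max`, and `pids.max` hold an integer or `max`. The function names are illustrative and do not come from `scripts/cloud_runtime_soak.py`.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """cgroup v2 `cpu.max` holds "$QUOTA $PERIOD"; return CPUs, or None if unlimited."""
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def parse_limit(text: str) -> Optional[int]:
    """`memory.max`, `memory.swap.max` and `pids.max` hold an integer or "max"."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_hard_limits(cgroup_dir: Path) -> Dict[str, Optional[float]]:
    """Read back the limits a soak gate would compare with the requested ones."""
    return {
        "cpus": parse_cpu_max((cgroup_dir / "cpu.max").read_text()),
        "memory_bytes": parse_limit((cgroup_dir / "memory.max").read_text()),
        "swap_bytes": parse_limit((cgroup_dir / "memory.swap.max").read_text()),
        "pids": parse_limit((cgroup_dir / "pids.max").read_text()),
    }
```

In this format, the read-back values reported above (`50000 100000`, `268435456`, `0`, `128`) parse to 0.5 CPUs, 256 MiB, no swap, and 128 PIDs. Returning `None` for `max` lets a gate that requires hard limits treat "unlimited" as a failure.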
+
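+## Rollout configuration and monitoring gates
+
+The final 24-hour command, where to run it, threshold semantics, the load matrix, and artifact requirements are in [LangBot Cloud 24-hour resource soak gate](./cloud-runtime-soak-gate.md). By default the tool fails on any of the following: a health failure, OOM or memory pressure, hitting the PID limit, CPU throttling above its threshold, a blocking executor rejection, sustained memory growth in the cooldown tail, or excessive idle CPU. Production runs must use `--require-hard-limits`.
+
+At a minimum, monitor and alert on:
+
+- RSS, CPU throttling, OOM, PID count, and event-loop lag for Core, Plugin Runtime, and Box Runtime.
+- Each process's blocking executor running, pending, inflight, active scopes, `global_rejected_total`, and `scope_rejected_total`. Alert when pending stays above zero or when rejections increase.
+- Current volume, capacity rejections, and eviction counts for QueryPool, WebSockets, sessions, tasks, plugin workers, and Box sessions.
+- Plugin crash/restart frequency, dependency prepare duration, and dependency prepare failure rate.
+- PostgreSQL pool checked-out/overflow/wait timeouts, transaction duration, and connection errors.
+- Bytes and inodes used by temporary files, artifacts, dependency environments, and Box Workspace volumes.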
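The executor alert rules above (rejection counters increasing, pending not draining) can be evaluated over periodic health snapshots. The sample fields below mirror the counter names listed above. `ExecutorSample`, `executor_alerts`, and the three-sample window are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ExecutorSample:
    pending: int
    global_rejected_total: int
    scope_rejected_total: int


def executor_alerts(samples: Sequence[ExecutorSample], pending_window: int = 3) -> List[str]:
    """Alert when rejection counters grow or pending never drains within the window."""
    alerts: List[str] = []
    if len(samples) >= 2:
        first, last = samples[0], samples[-1]
        if last.global_rejected_total > first.global_rejected_total:
            alerts.append("global_rejected_total increased")
        if last.scope_rejected_total > first.scope_rejected_total:
            alerts.append("scope_rejected_total increased")
    tail = samples[-pending_window:]
    if len(tail) == pending_window and all(s.pending > 0 for s in tail):
        alerts.append(f"pending stayed above zero for {pending_window} samples")
    return alerts
```

A short burst that drains to zero between samples does not fire. A queue that stays non-empty across the whole window does fire, which matches the "pending stays above zero" wording.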
+
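+The production soak should cover these scenarios:
+
+- bursts of tenant logins;
+- bulk plugin reconcile;
+- plugin crashes and restarts;
+- WebSocket disconnects;
+- concurrent Box execution;
+- PG pool saturation;
+- application SIGTERM.
+
+It must run for at least 24 hours. After load stops, it must confirm that RSS and the counts of tasks, sockets, files, and child processes return to a stable baseline.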
@@ -0,0 +1,271 @@
+# Cloud v2 multi-tenant verification report
+
+Date: 2026-07-24
+
+Status: `FOUNDATION AND CONTROL PLANE VERIFIED — PRODUCTION ACTIVATION REMAINS GATED`
+
+This report records the implementation and verification evidence for one
+logical LangBot instance serving multiple Workspace tenants. It covers the
+open-source Core, the shared Plugin/Box runtimes, the closed Space adapter, and
+the Space Cloud v2 modular-monolith control plane. It does not claim that the
+production Cloud deployment may enable `CLOUD_V2_ENABLED` yet.
+
+## Repository refs and scope
+
+- LangBot Core: branch `feat/multi-tenants`, commit
+  `e8a09b7537ef285a967f24add05fdb9bb557b97e`
+- langbot-plugin-sdk: branch `feat/multi-tenants`, head
+  `ca545d079ca1657a5d4efb4e31bfeafe1a374a46`
+- langbot-space: branch `feat/cloud-v2-control-plane`, head
+  `ce41ff370e94a405f70e2fbb2f99b0946e0e0387`
+- Closed Core adapter: `langbot-space/cloud-adapter`
+- SDK protocol/package version: `0.4.18`
+
+Core pins SDK commit `e7d946af4a6b1494fbe74627c1815ace19ac8991`;
+the SDK branch head adds only a CI follow-up on top of it. Cloud v2 is a greenfield
+multi-Workspace deployment and does not provision one Pod, database, queue, or
+Runtime per tenant. Legacy Space Pods remain a compatibility surface only.
+
+## Implemented product and security boundaries
+
+### Workspace and identity
+
+- SaaS has one stable `instance_uuid`; `workspace_uuid` is the tenant key.
+- Registering an Account creates its personal Workspace and owner Membership in
+  the same PostgreSQL transaction.
+- A Workspace may contain multiple users with owner, admin, and member roles.
+  Invitation acceptance is token-hash based, email-bound, single-use,
+  concurrency-safe, and member-limit checked while locked.
+- Community remains exactly one local Workspace while supporting multiple
+  Accounts and fixed RBAC roles. Installing the closed adapter is required to
+  inject the SaaS policy; configuration alone cannot activate it.
+- Disabled or deleted Accounts cannot use password/code/OAuth login, sessions,
+  refresh/access tokens, or personal access tokens.
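Below is a sketch of invitation acceptance that is token-hash based, email-bound, and single-use. It is an in-memory illustration with hypothetical names. It leaves out the PostgreSQL row lock and the member-limit check that make the real acceptance concurrency-safe. Normalizing emails by trimming and lowercasing is an assumption.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Tuple


@dataclass
class InvitationRecord:
    token_sha256: str  # only the hash is stored, never the token itself
    email: str
    accepted: bool = False


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_invitation(email: str) -> Tuple[str, InvitationRecord]:
    """Return the token to send and the record to store."""
    token = secrets.token_urlsafe(32)
    return token, InvitationRecord(_hash(token), email.strip().lower())


def accept_invitation(record: InvitationRecord, token: str, account_email: str) -> bool:
    """Accept once, only for the invited email. A real service would run this
    check-and-set inside a transaction that holds a row lock."""
    if not hmac.compare_digest(_hash(token), record.token_sha256):
        return False
    if account_email.strip().lower() != record.email:
        return False
    if record.accepted:
        return False
    record.accepted = True
    return True
```

If the stored hashes leak, the invitations cannot be used from them. A constant-time comparison avoids leaking, through timing, how much of a guessed token matched.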
+
+### Closed Space control plane
+
+- `CLOUD_V2_ENABLED` defaults to false. Invalid or incomplete configuration
+  fails startup, and disabled endpoints return 503.
+- Space owns Workspace/Membership/Invitation, versioned plans, subscriptions,
+  entitlements, usage events, directory outbox, and signed instance manifests.
+- Free and Pro are Workspace plans. Pro projects one managed global sandbox;
+  Free projects none. Both force stdio MCP off in the first release.
+- Subscription changes use Workspace advisory locking, expected entitlement
+  revision, one-live-order uniqueness, idempotent usage ingestion, period-end
+  downgrade, renewal, and lazy expiry settlement.
+- Account provisioning and quota projection into New API use PostgreSQL
+  transactional outboxes, replica-safe claims, retry/backoff, revision fencing,
+  and database-enforced identity/user/token ownership. No additional service is
+  introduced.
+- EPay and Stripe callbacks are bound to the locked order's provider identity,
+  amount, currency, channel/session, expiry, and provider transaction ID.
+  Replays are idempotent, EPay is CNY-only, credential rotation is supported by
+  encrypted per-order snapshots, and permanent fulfillment conflicts are
+  retained for reconciliation.
+- Directory snapshot and per-Workspace delta reads use repeatable-read,
+  read-only transactions with a high-water cursor. Core keeps a replica-local
+  consumer cursor and stores projection state, snapshot coverage, and inbox
+  rows in shared PostgreSQL.
+- The `/cloud` page selects a Workspace and shows its independent subscription,
+  entitlement, limits, and usage. When Cloud v2 is disabled or the backend does
+  not expose the feature field, the complete legacy Welcome/Pod UI is used.
+
+### PostgreSQL and pgvector
+
+- SaaS business data uses one PostgreSQL shared schema with application scope
+  plus forced RLS. Cloud directory writes are separated from local tenant
+  writes.
+- Projected Account and Membership revisions are monotonic. Tombstones remove
+  memberships, and stale revisions cannot resurrect them.
+- pgvector shares the business PostgreSQL database and remains
+  Workspace-scoped.
+- Space runs versioned SQL migrations before application seeds. A fresh
+  database, a partial pre-migration database, and an existing-baseline path
+  converge without startup-time `AutoMigrate`.
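Below is an in-memory sketch of the monotonic-revision rule for projected memberships. It includes tombstones, which stale revisions cannot resurrect. `MembershipProjection` and `MembershipEvent` are hypothetical names; the real projection lives in PostgreSQL.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (workspace_uuid, account_uuid)


@dataclass(frozen=True)
class MembershipEvent:
    workspace_uuid: str
    account_uuid: str
    revision: int
    role: Optional[str]  # None marks a tombstone


class MembershipProjection:
    def __init__(self) -> None:
        # Latest accepted event per key, tombstones included.
        self._rows: Dict[Key, MembershipEvent] = {}

    def apply(self, event: MembershipEvent) -> bool:
        """Apply an event only if its revision is newer than the stored one."""
        key = (event.workspace_uuid, event.account_uuid)
        current = self._rows.get(key)
        if current is not None and event.revision <= current.revision:
            return False  # stale or duplicate delivery: ignored
        # Tombstones are kept, not deleted, so their revision keeps fencing.
        self._rows[key] = event
        return True

    def role(self, workspace_uuid: str, account_uuid: str) -> Optional[str]:
        row = self._rows.get((workspace_uuid, account_uuid))
        return None if row is None else row.role
```

Deleting the row on a tombstone would lose its revision. A delayed older event could then reinsert the membership, which is exactly the resurrection the rule forbids.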
+
+### Shared Plugin Runtime
+
+- One instance-scoped trusted Supervisor serves multiple Workspaces.
+- Every enabled installation has its own nsjail process bound to
+  `(instance, workspace, execution generation, installation, runtime revision,
+  artifact digest)`.
+- Verified same-digest code and dependency files are mounted read-only and may
+  be shared; home, tmp, data, process namespace, registration capability, and
+  cgroup are private.
+- Instance configuration owns CPU, memory, PID, open-file, and file-size
+  limits. Plugin manifests cannot increase them.
+- The memory limit also covers swap: nsjail receives `memory.max` and `memory.swap.max=0`.
+- Unexpected worker exit is recovered by a completion callback with bounded
+  per-installation exponential backoff. Remove, reconcile, Runtime shutdown,
+  and container SIGTERM perform a graceful-to-SIGKILL bounded reap.
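Below is a sketch of bounded per-installation exponential backoff. Full jitter is optional: it picks a random delay between zero and the capped exponential ceiling. The 1-second base and 60-second cap are assumed values, not LangBot defaults. `InstallationBackoff` is a hypothetical name.

```python
import random
from typing import Dict, Optional


class InstallationBackoff:
    """Per-installation capped exponential backoff with optional full jitter."""

    def __init__(self, base_s: float = 1.0, cap_s: float = 60.0,
                 jitter: bool = True, rng: Optional[random.Random] = None) -> None:
        self._base = base_s
        self._cap = cap_s
        self._jitter = jitter
        self._rng = rng or random.Random()
        self._attempts: Dict[str, int] = {}

    def ceiling(self, attempt: int) -> float:
        # Clamp the exponent so very large attempt counts cannot overflow.
        return min(self._cap, self._base * 2 ** min(attempt, 32))

    def next_delay(self, installation_id: str) -> float:
        """Delay before the next restart of this installation."""
        attempt = self._attempts.get(installation_id, 0)
        self._attempts[installation_id] = attempt + 1
        ceiling = self.ceiling(attempt)
        return self._rng.uniform(0.0, ceiling) if self._jitter else ceiling

    def reset(self, installation_id: str) -> None:
        """Call once the restarted worker has stayed healthy."""
        self._attempts.pop(installation_id, None)
```

Each installation keeps its own attempt counter, so one crash-looping plugin does not delay restarts of the others. Jitter spreads out restarts that would otherwise fire in lockstep after a shared failure.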
+
+### Shared Box Runtime and MCP
+
+- One shared Box control plane serves Workspace-bound logical sessions.
+- Cloud grants allow at most one persistent `global` session for an entitled
+  Workspace and no managed processes in the first release.
+- Cloud is fixed to nsjail and network-off. Core and Box prove the shared
+  durable Workspace mount with an authenticated marker challenge.
+- stdio MCP is independently gated and forced off for Cloud v2.
+- The current nsjail Box backend does not provide hard byte/inode quotas, so
+  Cloud readiness correctly fails closed instead of silently using a soft
+  directory scan.
+
+## Automated verification
+
+### LangBot Core
+
+```text
+uv run --no-sync pytest -q
+  2590 passed, 32 skipped, 177 warnings
+
+real PostgreSQL migration, pgvector, and release-migrator suites
+  21 passed, 11 warnings
+
+uv run --no-sync ruff check .
+uv lock --check
+git diff --check
+  passed
+```
+
+The full suite ran without the closed adapter installed, proving the open-source
+single-Workspace/multi-user path remains standalone. Focused closed-adapter,
+directory projection, runtime connector, Box cleanup, and configuration suites
+also passed with the adapter installed.
+
+### Plugin SDK and real Linux runtime
+
+```text
+SDK full suite
+  1226 passed, 22 existing warnings
+
+Ruff check and format check
+git diff --check
+  passed
+```
+
+A privileged Linux test container using the host cgroup namespace ran one shared
+Runtime and two Workspace installations:
+
+- both workers referenced the same artifact inode;
+- home, tmp, and data inodes were distinct;
+- each plugin saw only PID 1 in its private PID namespace;
+- a tampered binding and an unknown installation were rejected;
+- the control token was absent from worker environments;
+- cgroups were distinct with `memory.max=134217728`, `memory.swap.max=0`,
+  `pids.max=32`, and `cpu.max=500000 1000000`;
+- touching 256 MiB killed the worker with exit code 137 and no swap growth;
+- the 32nd fork failed with `EAGAIN`;
+- reconcile and container SIGTERM removed the worker cgroups.
+
+The same run started the Runtime from a non-root working directory, covering
+absolute nsjail mount-source normalization.
+
+### Space backend, adapter, and frontend
+
+```text
+MIGRATIONS_TEST_DSN=... MIGRATIONS_TEST_DSN_FRESH=... \
+  go test -count=1 ./...
+go vet ./...
+  passed against PostgreSQL 16
+
+fresh PostgreSQL app startup, partial-baseline migration,
+Cloud v2 migration rerun, and control-plane integration
+  passed
+
+closed adapter pytest and Ruff
+  passed
+
+pnpm exec tsc --noEmit
+pnpm check:i18n
+pnpm check:cloud-checkout-currency
+  passed; 7 checkout/currency cases
+```
+
+The PostgreSQL checks started from an empty database and verified all 34
+registered migrations in order, Cloud v2 Free/Pro seeds, legacy plan seeds,
+Cloud columns/indexes, payment callback constraints, New API outbox/ownership
+constraints, and repeatable reruns.
+
+## Cross-service and browser E2E
+
+### Signed Space-to-Core directory projection
+
+Using an isolated Space PostgreSQL database and a migrated Core PostgreSQL
+database:
+
+1. Space issued a signed manifest for the fixed LangBot instance.
+2. Space returned a directory snapshot at cursor 5 containing two Workspaces,
+   two projected Accounts, and owner/admin memberships.
+3. Core verified the signature, instance, release, validity window, and
+   capability before injecting the Cloud Workspace policy.
+4. Core stored both Workspaces, all active Memberships, both projected
+   Accounts, snapshot coverage, inbox entries, and cursor 5.
+5. Account-field-only projection revisions and Workspace directory revisions
+   remained independent, and cross-Workspace Account conflicts failed closed.
+
+### Real browser Cloud v2 flow
+
+A real local browser operated the Space frontend and backend:
+
+1. An existing legacy-Pod owner logged in and saw the automatically created
+   personal Workspace on Free.
+2. The page showed Free and Pro Workspace plans while retaining the legacy Pro
+   instance card, its Online state, URL, version, billing period, and actions.
+3. Annual Pro checkout through the configured EPay/Alipay rail was re-quoted
+   from the displayed USD plan price to `¥490.00 CNY`.
+4. The browser reached the EPay gateway with `money=490.00`; no USD amount was
+   sent through the CNY-only rail.
+5. A valid signed `TRADE_SUCCESS` callback returned `success`. An exact replay
+   also returned `success`, leaving one successful order and one Pro Workspace
+   subscription.
+6. Refreshing payment state cleared the pending indicator. The page then showed
+   the Pro annual period and managed-sandbox entitlement while the legacy Pod
+   remained Online.
+
+The browser run used the real Space UI and HTTP handlers. Its disposable local
+development harness added a same-origin Next.js rewrite only in the temporary
+worktree; that harness change was removed after the run.
+
+The legacy feature-flag branch was then covered by API/static checks and the
+production build: false or missing `cloud_v2_enabled` renders the original
+Welcome/Pod client; a failed web-config request renders an explicit retry
+instead of guessing a deployment mode.
+
+### Deliberate Core startup failure
+
+Core completed signed manifest verification and directory projection, then
+stopped at the Box readiness gate because the current nsjail backend cannot
+prove hard Workspace byte/inode quota enforcement. Connector shutdown and
+reconnect tasks were cleanly reaped; no event-loop or never-awaited coroutine
+warning remained.
+
+This is a successful fail-closed acceptance result, not a passing production
+Cloud boot.
+
+## Remaining production activation gates
+
+Cloud v2 must remain disabled until these gates are closed:
+
+- Box provides and proves hard byte and inode quotas for Workspace, Skill,
+  root, home, and tmp storage.
+- Plugin installation writable data receives an operator-owned hard total
+  disk quota.
+- Plugin and future networked Box workloads have tenant-safe egress/SSRF
+  policy.
+- Plugin Runtime adds jitter, global restart concurrency limiting, and a
+  Runtime-level circuit breaker, then passes systemic-failure injection.
+- M0 rolls Core and Plugin Runtime together until authenticated Runtime takeover
+  or an owner lease/fencing protocol exists.
+- Payment operations add scheduled reconciliation and alerting for stale
+  `processing` orders and persisted permanent fulfillment conflicts.
+- Cloud v2 subscription service periods are stored immutably and included in
+  recognized-revenue reporting.
+- Production migration Job, backup/rollback, PostgreSQL credential/network
+  boundaries, horizontal-replica fault injection, Workspace release/export,
+  deletion, and restore semantics are completed.
+
+These gates intentionally add no tenant-specific service. They are implemented
+inside the existing Space, Core, Plugin Runtime, Box Runtime, and PostgreSQL
+components to preserve the architecture goal: near-zero static cost for a new
+Workspace.
@@ -0,0 +1,941 @@
+# LangBot Workspace multi-user and SaaS multi-tenant architecture
+
+Status: `ARCHITECTURE BASELINE — isolation kernel implemented; SaaS activation gates remain`
+
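+This document describes the target architecture and security boundaries of Cloud v2. The authoritative source for the detailed Runtime, Box, PostgreSQL, pgvector, and stdio MCP decisions is
+[pending-architecture-decisions.md](./pending-architecture-decisions.md). Implementation choices that have already landed are recorded in
+[implementation-decisions.md](./implementation-decisions.md).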
+
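+"Isolation kernel implemented" only means that the open-source Core/SDK has the foundational capabilities that multi-tenant data and runtime isolation require.
+It does not mean that the closed control plane, billing, the production deployment, or Cloud v2 is ready to launch.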
+
+## 1. Architecture decision summary
+
+Cloud v2 adopts the following model:
+
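+> Externally, the SaaS is a single logical LangBot instance, and every Workspace is a tenant inside that instance.
+> The open-source Core provides the complete isolation kernel, and the closed Cloud Control Plane manages the SaaS directory, subscriptions, entitlements, and billing.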
+
+The core decisions are:
+
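+1. `Workspace` is the tenant boundary for data, members, permissions, usage, and untrusted execution. It is not a Pod, a namespace, a database, or a standalone LangBot deployment.
+2. Registering an Account in the SaaS automatically creates a personal Workspace. This only adds directory and business records; it creates no tenant-specific service, database, queue, or Runtime.
+3. Each OSS LangBot instance can have only one Workspace, but that Workspace can have multiple Accounts, invitations, and fixed roles.
+4. Only the SaaS lets an Account own or join multiple Workspaces and switch the current Workspace in the WebUI.
+5. The MVP may run one Core, one Plugin Runtime, and one Box Runtime process. Adding replicas or PostgreSQL shards later is still internal scaling of the same logical instance and changes neither the product model nor the external API.
+6. One shared Plugin Runtime control plane manages all Workspaces, but each running plugin installation gets its own nsjail process. Enabled-resident is the desired semantics. Read-only code and dependencies may be shared when their digest has been verified.
+7. One shared Box Runtime manages all Workspaces. In the first release, an entitled Workspace has at most one persistent `global` logical sandbox, and actual commands still run as nsjail child processes.
+8. SaaS business data uses a PostgreSQL shared schema, isolated twice: by application-level scope and by RLS. pgvector lives in the same business database and is the SaaS default vector backend.
+9. stdio MCP has its own instance switch and is forced off in the first Cloud v2 release. Box availability or plan capabilities cannot enable it implicitly.
+10. The closed Control Plane may be a modular monolith that reuses the existing account, payment, and operations capabilities. The legacy Cloud per-tenant deployment model does not carry over into the new architecture.
+11. The concrete flows for Workspace creation, release, export, single-Workspace restore, and online migration still await later decisions.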
+
+The top-level goal of this refactor is:
+
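+> Share the trusted control plane and infrastructure pools, isolate untrusted execution units, and reduce standalone deployments and resident components, so that the static cost of a new Account or an empty Workspace approaches zero.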
+
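+Fewer components does not mean merged security boundaries. Plugin processes, sandboxes, secrets, writable files, and tenant data must still be strictly isolated.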
+
+## 2. Scope and non-goals
+
+### 2.1 In scope
+
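+- OSS single-Workspace multi-user support, invitations, and fixed RBAC.
+- The SaaS model for multi-Workspace accounts, memberships, and Workspace switching.
+- Trusted Workspace context for HTTP, WebSocket, API Key, Bot, Webhook, background tasks, and internal calls.
+- Tenant isolation for Bot, Pipeline, Provider, Knowledge, Plugin, MCP, RAG, Session, Storage, and Monitoring.
+- The shared control planes and process-level isolation of Plugin Runtime and Box Runtime.
+- SaaS PostgreSQL shared schema, RLS, and pgvector boundaries.
+- Responsibilities, protocols, and failure boundaries between the open-source Core and the closed Control Plane.
+- Compatibility constraints between today's single-replica deployment and future horizontal scaling within the same logical instance.
+- Phased implementation, activation gates, and acceptance strategy.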
+
+### 2.2 Out of scope
+
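+- Compatibility with, or in-place upgrade of, the legacy Cloud per-tenant deployment scheme.
+- Creating a dedicated service, database, schema, role, bucket, PVC, queue, or Runtime for each Workspace.
+- Implementing multi-replica scheduling, cross-region active-active, or online PostgreSQL shard migration at this stage.
+- Custom roles, SAML, SCIM, or enterprise offline licensing in the first version.
+- Workspace-level BYOK E2B WebUI configuration in the first version.
+- stdio MCP in the first Cloud v2 release.
+- Concrete product flows for Workspace export, release, single-tenant restore, and online migration.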
+
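+Any migration of historical customer data, accounts, and financial records should be a separate project. The old deployment topology is not a design constraint for this architecture.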
+
+## 3. Terminology and invariants
+
+### 3.1 Terminology
+
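+| Term | Definition |
+| --- | --- |
+| Account | Login principal. In OSS it is an instance-local account; in SaaS it is a global account |
+| Workspace | A tenant inside the logical LangBot instance; the primary boundary for resources, members, permissions, usage, and untrusted execution |
+| Membership | The relationship between an Account and a Workspace, including a fixed role, a status, and a permission version |
+| Invitation | A single-use credential that invites an Account or an email address to join a Workspace |
+| Logical Instance | The single externally visible LangBot service and security domain, with a stable `instance_uuid`; not the same thing as any particular process or Pod |
+| Replica | A short-lived internal running copy of Core, Plugin Runtime, or Box Runtime; not a product entity |
+| Execution Generation | A monotonic counter for Workspace execution ownership and revocation, used to fence off stale tasks and connections and to support failover |
+| Billing Account | The SaaS paying entity, which can pay for one or more Workspaces |
+| Entitlement | A snapshot of feature flags and numeric quotas, issued by the Control Plane and enforced locally by Core and the Runtimes |
+| Cloud Control Plane | The closed SaaS control plane that manages global identity, the Workspace directory, subscriptions, entitlements, billing, and lifecycle |
+| LangBot Core | The open-source data plane that runs Bot, Pipeline, Plugin, MCP, RAG, and other business logic and enforces final authorization and isolation |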
+
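+The `placement_generation` field in the current code is kept for compatibility until the migration completes. Its architectural meaning and target name are both
+`execution_generation`; it does not mean that a Workspace belongs to a product-level deployment unit.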
+
+### 3.2 Invariants that must always hold
+
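+1. The SaaS has exactly one stable `instance_uuid`, and all replicas share that identity.
+2. `replica_id`, `worker_id`, Pod names, process addresses, and database connection addresses are short-lived runtime information. They must not appear in the permanent primary keys of business resources or in external URLs.
+3. `workspace_uuid` is the stable key for isolating tenant data, tasks, caches, files, logs, usage, and runtimes. It is also the candidate key for future internal routing and sharding.
+4. An OSS instance has at most one Workspace; only the SaaS can activate multiple Workspaces.
+5. A Workspace can have multiple Accounts, and a SaaS Account can join multiple Workspaces.
+6. Every tenant business resource has a non-null `workspace_uuid` and is addressed by `(workspace_uuid, resource_uuid)`.
+7. The Workspace selector is only a routing input, not an authorization credential. The server must re-verify the Account, the Membership, resource ownership, and permissions.
+8. API Keys, Bots, Webhooks, background tasks, and Plugin and Box calls derive the Workspace from trusted ownership or bindings. They must never trust a scope that the caller declares.
+9. Without a valid Workspace context, the SaaS must fail closed. It must never fall back to the first Workspace, the most recent one, or the OSS default Workspace.
+10. Core is the last line of defense for resource access, runtime authorization, and entitlement enforcement. The Control Plane does not synchronously proxy each message or ordinary resource request.
+11. An untrusted plugin process belongs to exactly one installation, and a sandbox/session belongs to exactly one Workspace.
+12. Once an execution generation is invalidated, its old tasks, connections, callbacks, and side effects must be rejected.
+13. Local process tables, caches, and temporary directories are all rebuildable. They must never be the sole source of truth for desired state, revocation state, or business data.
+14. Creating an empty Workspace starts no plugin workers, no sandboxes, and no tenant-specific resident components.
+15. Future horizontal scaling must not change Workspace UUIDs, external APIs, the permission model, or isolation semantics.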
+
+## 4. 产品与部署模型
+
+### 4.1 SaaS 逻辑拓扑
+
+```mermaid
+flowchart LR
+    User["Browser / API / Bot traffic"] --> Edge["SaaS Edge"]
+    User --> CP["Closed Cloud Control Plane<br/>directory + subscription + billing"]
+    Edge --> Core["One logical LangBot instance<br/>Core replica pool; MVP = 1"]
+    CP -->|"signed manifest, directory projection,<br/>entitlement and desired state"| Core
+    Core -->|"usage outbox and observed state"| CP
+    Core --> PG["Shared PostgreSQL business database<br/>RLS + pgvector"]
+    Core --> PluginRT["Shared Plugin Runtime<br/>trusted supervisor"]
+    Core --> BoxRT["Shared Box Runtime<br/>trusted supervisor"]
+    PluginRT --> PluginA["Workspace A installation<br/>isolated nsjail process"]
+    PluginRT --> PluginB["Workspace B installation<br/>isolated nsjail process"]
+    BoxRT --> SandboxA["Workspace A<br/>persistent global logical sandbox"]
+    BoxRT --> SandboxB["Workspace B<br/>persistent global logical sandbox"]
+    Core --> ObjectStore["Shared durable object storage<br/>Workspace-scoped keys"]
+```
+
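+Here "one logical instance" means one service, one security domain, and one stable identity, not "always exactly one OS process".
+The MVP does not implement distribution, but from day one it preserves the identity, idempotency, generation, and owner abstractions needed for internal scaling.
+
+### 4.2 Capacity Evolution
+
+| Stage | Internal deployment shape | Static cost of a new Workspace | Trigger |
+| --- | --- | --- | --- |
+| M0 single-replica MVP | One Core, one shared Plugin Runtime, one shared Box Runtime, one PostgreSQL business database | Only new directory and business rows | Current target |
+| M1 horizontal scaling within the same logical instance | Add Core/Runtime replicas as capacity requires; use owner leases, fencing, and generations; PostgreSQL may add shared shards | No Workspace-dedicated deployment is created | Once there is evidence of capacity or availability needs |
+| M2 dedicated resource tier | Specific workloads use dedicated worker pools, sandbox classes, or database shards, but keep the same identity, protocols, and schema | Borne only by customers who buy this tier | Compliance, data residency, or very large workload requirements |
+
+M1 is a transparent scale-out of M0, and M2 is a resource tier within the same architecture. External APIs know only the stable
+`instance_uuid` and `workspace_uuid`; they never see replicas, workers, pools, or shards.
+
+### 4.3 Capabilities to Reserve Before Going Distributed
+
+1. Runtime protocols carry a stable `instance_uuid`, `workspace_uuid`, and `execution_generation`, and never rely on process addresses to express identity.
+2. Plugin installations and Box sessions use a stable owner abstraction; leases with expiry, CAS, and fencing tokens are implemented before a second replica is enabled.
+3. Creation, retries, callbacks, worker registration, and the outbox use stable idempotency keys, so duplicate delivery cannot produce a second owner or a second side effect.
+4. Repository/UoW does not allow unbounded cross-Workspace transactions; `workspace_uuid` can serve directly as a future shard key.
+5. Schema migrations, background task scans, monitoring aggregation, and operational interfaces must not assume there is always exactly one Core process.
+6. Runtime restarts recover through durable desired-state reconciliation and do not depend on the original process or a local cache.
+7. Replicas, a lease store, or a shard router are added only once there is evidence of capacity, availability, regional, or compliance need; reserving the protocol does not mean deploying components early.
+
+### 4.4 Component Boundaries
+
+- Core, Plugin Runtime, and Box Runtime must keep separate process identities, containers, and security contexts. In M0, Core and Plugin Runtime
+  must belong to the same rollout/restart unit; until authenticated takeover or owner lease/fencing is implemented, Core must not restart on its own and then take over a Runtime that is still alive.
+- Core must not inherit the elevated privileges required for nsjail, cgroups, or mount namespaces.
+- Plugin Runtime and Box Runtime are not merged into a single privileged process.
+- The MVP adds no Runtime-specific database, Box-specific database, Kafka, Redis, per-tenant scheduler, or artifact service.
+- Trusted supervisors, database connection pools, read-only artifact caches, and baseline capacity may be shared across tenants.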
+
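+## 5. OSS and SaaS Product Behavior
+
+### 5.1 Capability Matrix
+
+| Capability | OSS | SaaS |
+| --- | --- | --- |
+| Number of Workspaces | Exactly one per instance | An Account can own or join several, subject to ProductPolicy |
+| Workspace members | Multiple users | Multiple users, subject to entitlement |
+| Member invitations | Supported | Supported |
+| Fixed RBAC | Supported | Supported |
+| Custom roles | Not supported | Future commercial capability |
+| Workspace creation | The sole Workspace is created at first-time setup | A personal Workspace is created automatically at sign-up; further creation is subject to ProductPolicy |
+| Workspace switching | Not shown | Supported |
+| Subscription and billing | No remote dependency | Managed by the closed-source Control Plane |
+| Tenant isolation | Fully implemented | Fully implemented |
+
+The OSS edition policy should be expressed as: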
+
+```text
+workspace_limit = 1
+members_enabled = true
+invitations_enabled = true
+fixed_rbac_enabled = true
+multi_workspace_enabled = false
+```
+
+Do not implement the single-tenant limit through `member_limit = 1`, disabled invitations, or removed RBAC.
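+
+A minimal sketch of how that policy could gate Workspace creation. `EditionPolicy`, `EditionLimitError`, and `check_can_create_workspace` are hypothetical names, not existing LangBot classes. The limit applies to the number of Workspaces, and members, invitations, and RBAC stay enabled:
+
+```python
+from dataclasses import dataclass
+
+# Hypothetical names for illustration; not LangBot's actual classes.
+
+
+class EditionLimitError(Exception):
+    """Stable domain error raised when an edition limit blocks an action."""
+
+    code = 'edition_workspace_limit'
+
+
+@dataclass(frozen=True)
+class EditionPolicy:
+    workspace_limit: int | None  # None: no edition-level Workspace limit
+    members_enabled: bool
+    invitations_enabled: bool
+    fixed_rbac_enabled: bool
+    multi_workspace_enabled: bool
+
+
+OSS_POLICY = EditionPolicy(
+    workspace_limit=1,
+    members_enabled=True,
+    invitations_enabled=True,
+    fixed_rbac_enabled=True,
+    multi_workspace_enabled=False,
+)
+
+
+def check_can_create_workspace(policy: EditionPolicy, existing_workspaces: int) -> None:
+    """Limit the number of Workspaces; members, invitations, and RBAC are untouched."""
+    if policy.workspace_limit is not None and existing_workspaces >= policy.workspace_limit:
+        raise EditionLimitError(f'workspace_limit={policy.workspace_limit} reached')
+```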
+
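+### 5.2 OSS Initialization and Invitations
+
+First-time initialization completes in a single transaction:
+
+1. Create the local Account.
+2. Create the instance's sole Workspace.
+3. Create the owner Membership.
+4. Create initial Workspace resources such as the default Pipeline and metadata.
+5. Mark the instance as initialized.
+
+After initialization, public sign-up is disabled by default. Additional users receive a single-use Invitation from an owner/admin, then sign up or log in, accept it, and join the sole Workspace.
+Later sign-ups in OSS never create a second Workspace. When SMTP is not configured, the system returns an invitation link, shown only once, which the administrator sends over a trusted channel.
+
+### 5.3 SaaS Sign-up and Invitations
+
+Standard sign-up is completed by the Control Plane through an idempotent workflow:
+
+1. Create or confirm the global Account and AuthIdentity.
+2. Create the personal Workspace and the owner Membership.
+3. Create the initial Subscription/Entitlement projection.
+4. Complete email verification, rate limiting, and baseline risk controls.
+5. Project the Account, Workspace, and Membership into Core.
+6. Return an accessible route once Core reaches the required directory revision.
+
+Sign-up creates only logical records; it does not start a Runtime or any tenant-dedicated infrastructure.
+
+A new user who signs up through an invitation also gets a personal Workspace and joins the invited Workspace; an existing user who accepts an invitation gains only the target Membership.
+The billing relationship between personal and team Workspaces must be defined explicitly by ProductPolicy; code must not infer it from names or from the creation path.
+
+### 5.4 Invitation Security Rules
+
+- Tokens use at least 256 bits of cryptographically secure randomness; the database stores only the hash.
+- Tokens carry `expires_at`, `accepted_at`, and `revoked_at`, and can be used only once.
+- Membership creation and token consumption commit in the same transaction.
+- An Invitation cannot grant owner; ownership transfer uses a separate flow.
+- When an invitation is accepted in SaaS, the target email must be verified; a matching OAuth email does not bypass the token and explicit confirmation.
+- A Workspace must always have at least one active owner; an admin cannot remove or demote an owner.
+- Browser invitation links carry the secret in the URL fragment; the page clears the fragment immediately after reading it and keeps the secret only briefly in `sessionStorage`.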
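+
+The token rules above can be sketched with the Python standard library: `secrets.token_urlsafe(32)` draws 256 bits of randomness, only a SHA-256 digest is stored, and acceptance is one-shot. `InvitationStore` and `hash_token` are hypothetical in-memory stand-ins. In Core, token consumption and Membership creation would commit in one database transaction, which this sketch does not model:
+
+```python
+import hashlib
+import secrets
+from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
+
+# Hypothetical in-memory model; Core would persist this and commit token
+# consumption together with Membership creation in one transaction.
+
+
+def hash_token(raw_token: str) -> str:
+    return hashlib.sha256(raw_token.encode('utf-8')).hexdigest()
+
+
+@dataclass
+class Invitation:
+    workspace_uuid: str
+    email_normalized: str
+    role: str
+    token_hash: str
+    expires_at: datetime
+    accepted_at: datetime | None = None
+    revoked_at: datetime | None = None
+
+
+class InvitationStore:
+    def __init__(self) -> None:
+        self._by_hash: dict[str, Invitation] = {}  # token hash is globally unique
+
+    def create(self, workspace_uuid: str, email: str, role: str, ttl: timedelta) -> str:
+        if role == 'owner':
+            raise ValueError('invitations cannot grant owner')
+        raw = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
+        invitation = Invitation(
+            workspace_uuid=workspace_uuid,
+            email_normalized=email.strip().lower(),
+            role=role,
+            token_hash=hash_token(raw),
+            expires_at=datetime.now(timezone.utc) + ttl,
+        )
+        self._by_hash[invitation.token_hash] = invitation
+        return raw  # returned once; only the hash is stored
+
+    def accept(self, raw_token: str, verified_email: str) -> Invitation:
+        invitation = self._by_hash.get(hash_token(raw_token))
+        now = datetime.now(timezone.utc)
+        if (
+            invitation is None
+            or invitation.accepted_at is not None
+            or invitation.revoked_at is not None
+            or invitation.expires_at <= now
+        ):
+            raise PermissionError('invitation_invalid')
+        if invitation.email_normalized != verified_email.strip().lower():
+            raise PermissionError('invitation_email_mismatch')
+        invitation.accepted_at = now  # one-shot: any later accept fails above
+        return invitation
+```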
+
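+### 5.5 Fixed RBAC
+
+Core is the authority that defines the fixed roles `owner`, `admin`, `developer`, `operator`, and `viewer`.
+Permissions are divided by capability, for example viewing resources, managing resources, runtime operations, member management, provider secret management, viewing audit logs, and exporting data.
+
+Rules:
+
+- Visibility of an ordinary resource does not automatically grant visibility of its secrets.
+- Guessing a resource UUID from another Workspace returns 404 and does not reveal whether it exists.
+- If the resource exists in the same Workspace but the permission is missing, return 403.
+- The last owner cannot be removed or demoted.
+- Hiding or disabling unauthorized entries in the frontend only improves the experience; the backend must still perform every authorization check.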
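+
+A minimal sketch of the 404/403 rule. It assumes a hypothetical in-memory resource map keyed by `(workspace_uuid, resource_uuid)`, and permission names such as `resource.view` are illustrative. A lookup outside the caller's Workspace cannot be told apart from a missing resource, so only a resource inside the caller's Workspace can produce 403:
+
+```python
+# Hypothetical helper; permission names such as 'resource.view' are illustrative.
+
+
+class HttpError(Exception):
+    def __init__(self, status: int, code: str) -> None:
+        super().__init__(f'{status} {code}')
+        self.status = status
+        self.code = code
+
+
+def authorize_resource(
+    resources: dict[tuple[str, str], dict],
+    workspace_uuid: str,
+    resource_uuid: str,
+    permissions: frozenset[str],
+    required: str,
+) -> dict:
+    # Always look up by (workspace_uuid, resource_uuid), never by resource_uuid alone.
+    resource = resources.get((workspace_uuid, resource_uuid))
+    if resource is None:
+        # "Missing" and "belongs to another Workspace" look identical to the caller.
+        raise HttpError(404, 'not_found')
+    if required not in permissions:
+        # Only a resource that exists in the caller's Workspace can yield 403.
+        raise HttpError(403, 'forbidden')
+    return resource
+```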
+
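+## 6. Open-Source and Closed-Source Responsibilities
+
+### 6.1 LangBot Core OSS
+
+Core is responsible for:
+
+- Local Accounts, Workspaces, Memberships, and OSS Invitations.
+- Fixed RBAC and the single-Workspace edition policy.
+- Business resources and their Workspace scope.
+- Request contexts for HTTP, WebSocket, background tasks, and runtimes.
+- Isolation for Plugin, MCP, RAG, Box, Session, Storage, and Monitoring.
+- Versioned execution projections of SaaS Accounts, Workspaces, and Memberships.
+- Verification of the InstanceManifest, EntitlementSnapshot, and the Runtime control channel.
+- Generic capability and numeric quota enforcement.
+- The UsageEvent/business outbox and baseline security auditing.
+
+Core is the authoritative source for Bot, Pipeline, Model, Knowledge, Plugin installation, MCP configuration, and Monitoring data,
+and the final authorization boundary for every business and runtime request.
+
+### 6.2 Closed Cloud Control Plane
+
+The Control Plane is responsible for:
+
+- SaaS global Accounts, AuthIdentity, Sessions, OIDC, and later SSO.
+- The authoritative directory of SaaS Workspaces, Memberships, and Invitations.
+- Workspace creation, suspension, archival, and deletion workflows.
+- BillingAccount, Product, PlanVersion, Price, Subscription, Invoice, Refund, and provider events.
+- Entitlement computation, signing, and versioning.
+- The usage ledger, aggregation, quotas, and overdue-payment policy.
+- Instance manifests, releases, capacity, internal desired state, and observed state.
+- The SaaS operations console, platform roles, and advanced auditing.
+
+The first phase does not split these responsibilities into separate tenant, billing, and scheduling microservices. The recommended host is a closed-source modular monolith, independent of Core,
+that reuses the existing account, OAuth, payment, email, and operations capabilities behind module boundaries. The legacy Cloud code for tenant-dedicated deployments is not reused.
+
+The Control Plane does not store business content such as Bots, Pipelines, Models, or Knowledge, and does not proxy ordinary message execution.
+
+### 6.3 SaaS Adapter
+
+Core keeps only a thin protocol adapter layer, which:
+
+- Verifies the InstanceManifest, Account tokens, and JWKS.
+- Consumes DirectoryEvents and writes them to the local projection.
+- Caches and verifies EntitlementSnapshots.
+- Writes UsageEvents to a durable outbox.
+- Receives execution desired state and reports observed state.
+
+The adapter layer must not monkey-patch the ORM, bypass Core permission checks, or call the Control Plane synchronously during ordinary resource requests.
+
+### 6.4 Source of Truth
+
+| Data | OSS | SaaS |
+| --- | --- | --- |
+| Account, Workspace, Membership | Core local database | Control Plane is authoritative; Core keeps a versioned projection |
+| Invitation | Core local database | Control Plane is authoritative; pending secrets are not projected into Core |
+| Bot, Pipeline, Model, KB, Plugin, MCP | Core | Core |
+| Subscription, Payment, Invoice, Usage ledger | No remote dependency | Control Plane |
+| Features and quotas | Local edition policy | Issued by the Control Plane; verified and enforced by Core/Runtime |
+| Execution generation | Fixed local value in OSS | Control Plane desired state; enforced by Core |
+| Runtime authorization | Core | Core, based on the local projection and entitlement |
+
+SaaS does not maintain two writable directories. The Control Plane is the authoritative write model for the directory; Core keeps only revisioned execution projections.
+
+## 7. Control Plane Protocols
+
+### 7.1 InstanceManifest
+
+Setting `system.edition=cloud`, an environment variable, or a frontend feature flag on its own must not enable SaaS multi-Workspace mode.
+Cloud bootstrap must verify an InstanceManifest signed by a pre-provisioned root of trust and install the closed-source Workspace policy from it.
+
+At a minimum, the manifest binds: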
+
+```text
+iss, aud, sub, jti, iat, nbf, exp
+instance_uuid
+release
+capabilities
+tenant_isolation_version
+execution_generation
+delegated issuers and keyset revision
+```
+
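+If the signature is invalid, the audience does not match, the manifest has expired, the generation rolls back, or the trust chain is missing, bootstrap must fail closed and must not degrade to the OSS default Workspace.
+
+### 7.2 DirectoryEvent and Directory Freshness
+
+The Control Plane publishes versioned events for Accounts, Workspaces, and Memberships through a transactional outbox.
+Core uses an inbox to deduplicate by `event_id`, rejects stale writes by aggregate revision, and tracks a contiguous applied watermark. At startup it reads a signed full snapshot generated inside a single PostgreSQL
+`REPEATABLE READ` transaction; at runtime it first consumes a signed event page that carries the current high-water mark, then requests signed deltas only for the Workspaces touched by that page.
+Delta responses carry no new event cursor, so even when their content already includes later revisions committed concurrently, Core must not skip events it has not yet consumed.
+
+Requirements:
+
+- Events and batches are strongly authenticated and signed, and bound to the instance.
+- Duplicates, reordering, delays, stream interruptions, and full replay are all safe.
+- Deletions use tombstones.
+- A new instance first imports a snapshot with a high watermark, then consumes deltas.
+- In steady state, the cost of a directory update scales with the number of Workspaces changed in the current page; a `directory.changed` event must not trigger re-reading and re-projecting every Workspace.
+- Each Core replica keeps its own in-process consumption cursor so that every replica's entitlement cache sees the events; shared PostgreSQL stores the projection
+  high-water mark, full snapshot coverage, and the inbox. When several replicas consume the same event, the second replica verifies the existing receipt;
+  a receipt missing inside snapshot coverage may be backfilled, while one missing outside coverage fails closed. Readiness is renewed only after the local cursor catches up to the signed high-water mark.
+- When the projection is not ready or lags behind what an authorization lease requires, interactive and automated requests fail closed according to policy.
+- SaaS pending Invitations, emails, and token hashes never enter the Core projection.
+
+The MVP may use one shared, atomic, and recoverable Control Plane store; once there are multiple replicas, in-process state can no longer hold single-use tokens or directory watermarks.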
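+
+The inbox rules above (deduplicate by `event_id`, reject stale aggregate revisions, use tombstones for deletion) can be sketched as follows. This is a minimal sketch: signature verification, snapshots, and the contiguous watermark are omitted, and `DirectoryEvent` fields and `DirectoryProjection` are hypothetical shapes:
+
+```python
+from dataclasses import dataclass
+
+# Hypothetical shapes; signatures, snapshots, and watermarks are omitted.
+
+
+@dataclass(frozen=True)
+class DirectoryEvent:
+    event_id: str
+    aggregate_id: str      # e.g. a workspace_uuid
+    revision: int          # monotonically increasing per aggregate
+    payload: dict | None   # None is a tombstone (deletion)
+
+
+class DirectoryProjection:
+    def __init__(self) -> None:
+        self.seen_event_ids: set[str] = set()   # the inbox
+        self.revisions: dict[str, int] = {}
+        self.rows: dict[str, dict] = {}
+
+    def apply(self, event: DirectoryEvent) -> bool:
+        """Return True only if the event changed the projection."""
+        if event.event_id in self.seen_event_ids:
+            return False  # duplicate delivery
+        self.seen_event_ids.add(event.event_id)
+        if event.revision <= self.revisions.get(event.aggregate_id, 0):
+            return False  # stale or reordered write
+        self.revisions[event.aggregate_id] = event.revision
+        if event.payload is None:
+            self.rows.pop(event.aggregate_id, None)  # tombstone keeps the revision
+        else:
+            self.rows[event.aggregate_id] = event.payload
+        return True
+```
+
+Keeping the revision after a tombstone means a delayed pre-deletion event cannot resurrect the row.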
+
+### 7.3 EntitlementSnapshot
+
+Entitlements use versioned, signed snapshots that bind at least:
+
+```text
+instance_uuid
+workspace_uuid
+plan_revision
+entitlement_revision
+status
+features
+limits
+nbf, exp, grace_until
+```
+
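+Core verifies the issuer, audience, subject, instance, revision, timestamps, and signature; an older revision never overwrites a newer snapshot.
+Plan names and pricing rules exist only in the closed-source Control Plane; Core and the Runtimes understand only generic capabilities and numeric limits.
+
+When the Control Plane is unavailable, a cached snapshot that is still valid continues to be enforced; after it expires, Core may only enter an explicit, bounded grace mode or fail closed.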
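+
+A minimal sketch of the local acceptance and enforcement rules. It assumes the signature has already been verified elsewhere and uses plain epoch seconds for `nbf`, `exp`, and `grace_until`. `EntitlementCache` and its `'valid'`/`'grace'`/`'denied'` states are hypothetical:
+
+```python
+from dataclasses import dataclass
+
+# Hypothetical sketch; signature verification is assumed to happen before accept().
+
+
+@dataclass(frozen=True)
+class EntitlementSnapshot:
+    instance_uuid: str
+    workspace_uuid: str
+    entitlement_revision: int
+    features: dict[str, bool]
+    limits: dict[str, int]
+    nbf: float
+    exp: float
+    grace_until: float | None = None
+
+
+class EntitlementCache:
+    def __init__(self, instance_uuid: str) -> None:
+        self.instance_uuid = instance_uuid
+        self._by_workspace: dict[str, EntitlementSnapshot] = {}
+
+    def accept(self, snap: EntitlementSnapshot) -> bool:
+        if snap.instance_uuid != self.instance_uuid:
+            raise PermissionError('entitlement_instance_mismatch')
+        current = self._by_workspace.get(snap.workspace_uuid)
+        if current is not None and snap.entitlement_revision <= current.entitlement_revision:
+            return False  # an older revision never overwrites a newer one
+        self._by_workspace[snap.workspace_uuid] = snap
+        return True
+
+    def state(self, workspace_uuid: str, now: float) -> str:
+        snap = self._by_workspace.get(workspace_uuid)
+        if snap is None or now < snap.nbf:
+            return 'denied'
+        if now < snap.exp:
+            return 'valid'
+        if snap.grace_until is not None and now < snap.grace_until:
+            return 'grace'  # explicit, bounded grace window
+        return 'denied'  # fail closed
+```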
+
+### 7.4 UsageEvent and Outbox
+
+Usage events are append-only and delivered at least once; the Control Plane deduplicates them by `event_id`. Each event contains at least:
+
+```text
+event_id
+instance_uuid
+workspace_uuid
+execution_generation
+meter
+quantity_integer
+unit
+source
+occurred_at
+entitlement_revision
+schema_version
+```
+
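+Core does not compute billing amounts and does not charge synchronously during ordinary requests. A business write and its business outbox entry must commit in the same transaction;
+the generation-aware write fence and outbox atomicity are still SaaS activation gates.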
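+
+The same-transaction rule can be illustrated with an in-memory SQLite database standing in for the business database: the business row and its outbox row commit together or not at all. The `messages`/`usage_outbox` tables and the `record_usage` helper are hypothetical, and the generation-aware fence is not modeled:
+
+```python
+import json
+import sqlite3
+import uuid
+
+# Hypothetical schema; SQLite stands in for the business database here.
+
+
+def open_db() -> sqlite3.Connection:
+    conn = sqlite3.connect(':memory:')
+    conn.executescript(
+        """
+        CREATE TABLE messages (workspace_uuid TEXT NOT NULL, body TEXT NOT NULL);
+        CREATE TABLE usage_outbox (event_id TEXT PRIMARY KEY,
+                                   workspace_uuid TEXT NOT NULL,
+                                   payload TEXT NOT NULL);
+        """
+    )
+    return conn
+
+
+def record_usage(conn: sqlite3.Connection, workspace_uuid: str, body: str, fail: bool = False) -> str:
+    event_id = str(uuid.uuid4())  # stable key the Control Plane deduplicates on
+    # The connection context manager commits on success and rolls back on error.
+    with conn:
+        conn.execute('INSERT INTO messages VALUES (?, ?)', (workspace_uuid, body))
+        conn.execute(
+            'INSERT INTO usage_outbox VALUES (?, ?, ?)',
+            (event_id, workspace_uuid, json.dumps({'meter': 'message', 'quantity_integer': 1})),
+        )
+        if fail:
+            raise RuntimeError('simulated failure before commit')
+    return event_id
+```
+
+A separate publisher would later read `usage_outbox` and deliver it at least once; it never sees events from rolled-back transactions.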
+
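+### 7.5 Desired State and Observed State
+
+The closed-source control plane publishes versioned release, capacity, and execution desired state; Core/Runtime reconcile it idempotently and report observed state.
+Desired state describes only execution ownership and capacity inside the same logical instance; it never creates new product-level instances or tenant entities.
+
+Workspace security status is determined by the directory revision, subscription status by the entitlement revision, and execution revocation by the
+execution generation. The strictest effective state of the three applies, but no channel may modify another channel's authoritative fields.
+
+## 8. Identity, Authentication, and Request Context
+
+### 8.1 Context Model
+
+Every tenant business entry point resolves the same immutable `RequestContext`: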
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class RequestContext:
+    instance_uuid: str
+    workspace_uuid: str
+    execution_generation: int
+    principal_type: str
+    principal_uuid: str
+    permissions: frozenset[str]
+    auth_method: str
+    entitlement_revision: int | None
+    request_id: str
+```
+
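+Where each entry point takes its Workspace from:
+
+| Entry point | Workspace source |
+| --- | --- |
+| Browser Account token | `X-Workspace-Id` is only a candidate; the server verifies the Membership |
+| API Key | The Workspace bound to the key record; the caller's selector is ignored |
+| Public Bot / Webhook | Trusted ownership of the Bot or webhook route |
+| Background job | The full scope in the durable payload; the generation is re-verified before execution |
+| Plugin Host API | The authenticated control connection and the immutable action context |
+| Box operation | Verified entitlement, admission grant, and Runtime namespace |
+| System operation | An explicit, least-capability SystemContext; implicit global context is forbidden |
+
+Scope must never be inferred from module globals, a process-default Workspace, the request payload, or "the first Workspace".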
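+
+For the browser entry point, resolution could look like the sketch below. The `X-Workspace-Id` value is treated only as a candidate, OSS singleton mode may pick the sole Workspace, and every other case fails closed. `resolve_browser_workspace`, the membership map shape, and `WorkspaceResolutionError` are hypothetical:
+
+```python
+# Hypothetical helper; a real implementation would also check Workspace status,
+# Membership revision, and execution generation.
+
+
+class WorkspaceResolutionError(Exception):
+    pass
+
+
+def resolve_browser_workspace(
+    account_uuid: str,
+    selector: str | None,  # value of the X-Workspace-Id header, if any
+    active_memberships: dict[tuple[str, str], str],  # (workspace_uuid, account_uuid) -> role
+    multi_workspace: bool,
+    singleton_workspace: str | None = None,
+) -> tuple[str, str]:
+    """Return (workspace_uuid, role) or raise; never fall back to another Workspace."""
+    if selector is None:
+        if multi_workspace or singleton_workspace is None:
+            raise WorkspaceResolutionError('workspace_selector_required')
+        selector = singleton_workspace  # OSS singleton policy only
+    role = active_memberships.get((selector, account_uuid))
+    if role is None:
+        # Same outcome for "no such Workspace" and "not a member".
+        raise WorkspaceResolutionError('workspace_not_found')
+    return selector, role
+```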
+
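+### 8.2 Account Tokens and Workspace Discovery
+
+- New JWTs use the stable Account UUID as `sub` and bind the issuer, the current `instance_uuid` as audience, and an expiry.
+- Account-level Workspace discovery is a narrow bootstrap capability; it lists only the Account's active Memberships and cannot perform tenant business operations.
+- In multi-Workspace mode, a tenant route without a selector must be rejected; in OSS singleton mode, the policy may select the sole Workspace.
+- An Account token alone does not prove any Workspace permission; the Membership must be resolved on the server, and its status and revision verified.
+
+### 8.3 API Keys, WebSockets, and Long-Running Tasks
+
+- API Keys persist only the hash, and the raw secret is returned only once; the record binds the Workspace, fixed scopes, status, expiry, and creator.
+- Dashboard WebSockets authenticate after the upgrade and re-verify the Account, Membership, permissions, resource ownership, and generation before every inbound message.
+- Long-running LLM, MCP, Plugin, or Box calls re-check the execution generation before producing side effects or accepting results.
+- Temporary credential exchanges are bound to the initiator, Workspace, instance, and generation; lookups from any other scope return the same 404 as a nonexistent resource.
+
+### 8.4 Error Semantics
+
+| Scenario | Semantics |
+| --- | --- |
+| Unauthenticated or invalid token | 401 |
+| Resource exists in the same Workspace but permission is insufficient | 403 |
+| Resource does not exist or belongs to another Workspace | 404 |
+| Blocked by edition / entitlement / quota | Stable domain error code, never disguised as 500 |
+| Execution generation expired | Fail closed and stop the stale runtime state |
+| Unhandled exception | Stable `internal_error` plus request ID; details go only to server logs |
+
+## 9. Core Data Model
+
+### 9.1 Account, Workspace, and Membership
+
+The core entities include at least: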
+
+```text
+Account
+  uuid
+  email_normalized
+  display_name
+  status
+  auth bindings
+
+Workspace
+  uuid
+  name
+  status
+  source: local | cloud_projection
+  directory_revision
+
+WorkspaceExecutionState
+  workspace_uuid
+  instance_uuid
+  execution_generation
+  status
+  write_fenced_at
+  revision
+
+WorkspaceMembership
+  workspace_uuid
+  account_uuid
+  role
+  status
+  directory_revision
+```
+
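+Constraints:
+
+- Membership is unique on `(workspace_uuid, account_uuid)`.
+- A Workspace's source must never be upgraded from local to cloud projection through mutable local configuration.
+- A cloud projection is routable only when the manifest, instance binding, directory revision, and execution state are all valid.
+- OSS bootstrap only creates or repairs the local singleton Workspace.
+
+### 9.2 Invitation
+
+OSS Invitations live in the Core local database; SaaS Invitations exist only in the closed-source directory.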
+
+```text
+WorkspaceInvitation
+  uuid
+  workspace_uuid
+  email_normalized
+  role
+  token_hash
+  expires_at
+  accepted_at
+  revoked_at
+  created_by
+```
+
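+Database constraints must guarantee at most one valid invitation per Workspace and email, and globally unique token hashes.
+
+### 9.3 Business Resources
+
+Every tenant resource explicitly includes `workspace_uuid`, including but not limited to:
+
+- Bots, Pipelines, Providers, Models, Knowledge Bases, and vector records.
+- Plugin installations, MCP configurations, API Keys, and webhook bindings.
+- Queries, Messages, Sessions, Monitoring, Usage, and AuditEvents.
+- Uploads, ObjectRefs, Skills, Runtime desired state, and temporary credential sessions.
+
+Unique keys, indexes, cache keys, object keys, log dimensions, and idempotency keys must all include the Workspace scope.
+The service layer must not expose plain `get(id)`, `list()`, or `delete(id)` methods that can bypass the Workspace condition.
+
+### 9.4 Defensive Constraints
+
+- `workspace_uuid` on tenant tables is non-null and has a foreign key.
+- Key SaaS PostgreSQL tables enable and force RLS.
+- Opaque tokens that must be globally unique use a unique index on the hash instead of relying on per-Workspace uniqueness.
+- Rules such as keeping an owner, Membership revisions, and one-shot invitations are enforced both by the service layer and by database transactions.
+- Any cross-Workspace operational action must go through an explicit, audited system capability and must not reuse ordinary repositories.
+
+## 10. PostgreSQL, pgvector, and Storage
+
+### 10.1 Database Boundaries
+
+- OSS keeps SQLite as the default and can explicitly opt into self-hosted PostgreSQL.
+- SaaS uses one PostgreSQL business database, one shared `public` schema, and a shared connection pool.
+- Creating a Workspace does not create a database, schema, role, or dedicated connection pool.
+- Each tenant transaction establishes its scope with `SET LOCAL`, and a unified TenantUnitOfWork guarantees that the context and the SQL use the same transaction and connection.
+- Application-level Workspace scoping is the first boundary; `ENABLE` + `FORCE ROW LEVEL SECURITY` is the second.
+- The runtime role must be a non-owner, least-privilege role with no superuser, no `BYPASSRLS`, no role memberships, and no cross-schema privileges.
+- Schemas, extensions, policies, and ACLs are created and verified only by a separate release migrator; the Cloud runtime never executes DDL.
+- PostgreSQL holds only business data and pgvector; it does not become a general-purpose coordination database for Plugin/Box, a process directory, or a new control plane database.
+
+In the first phase, the migrator and runtime URLs must point to the same host, port, and database, but use different roles.
+Production deployments must also prove that the runtime credential cannot connect to any other database in the PostgreSQL cluster; a dedicated endpoint or verified HBA/proxy isolation remains an activation gate.
+
+### 10.2 Transactions and Background Tasks
+
+- A TenantUnitOfWork binds exactly one Workspace, one execution generation, and one transaction-owning task.
+- Child tasks must not inherit the parent task's tenant session and then commit, roll back, or close it.
+- Long LLM or network waits do not hold a database connection; each database helper opens a short transaction.
+- Detached tasks start only after the parent transaction commits and establish their own new scope; if the parent transaction rolls back, tasks waiting to start are cancelled.
+- The generation-aware write fence must be held until commit and committed atomically with the business outbox; SaaS write traffic must not be activated until this capability is complete.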
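+
+The detached-task rule can be sketched with asyncio: tasks are registered on a unit of work and started only after commit, while rollback discards them. This `TenantUnitOfWork` is a hypothetical illustration that omits the real session, scope, and fence; it is not the class in the codebase:
+
+```python
+import asyncio
+from collections.abc import Callable, Coroutine
+from typing import Any
+
+TaskFactory = Callable[[], Coroutine[Any, Any, None]]
+
+
+class TenantUnitOfWork:
+    """Hypothetical UoW sketch: detached tasks start only after commit."""
+
+    def __init__(self, workspace_uuid: str, execution_generation: int) -> None:
+        self.workspace_uuid = workspace_uuid
+        self.execution_generation = execution_generation
+        self._after_commit: list[TaskFactory] = []
+        self.started: list[asyncio.Task] = []
+
+    def spawn_after_commit(self, factory: TaskFactory) -> None:
+        # Keep a factory rather than a coroutine object, so nothing is created
+        # (or left un-awaited) if the transaction rolls back.
+        self._after_commit.append(factory)
+
+    async def commit(self) -> None:
+        # ... commit the tenant transaction here ...
+        pending, self._after_commit = self._after_commit, []
+        # Each factory must open its own new scope; it never reuses this session.
+        self.started = [asyncio.create_task(factory()) for factory in pending]
+
+    async def rollback(self) -> None:
+        # ... roll back the tenant transaction here ...
+        self._after_commit.clear()  # pending detached tasks never start
+```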
+
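+### 10.3 pgvector
+
+- SaaS defaults to pgvector in the same business PostgreSQL database and never silently falls back to Chroma.
+- Vector identity is at least `(workspace_uuid, knowledge_base_uuid, vector_id)`.
+- Vector operations use the same tenant context and RLS contract.
+- Embedding dimensions are stored and validated explicitly; on a mismatch the operation fails closed instead of truncating, padding, or switching to an unbounded scan.
+- Extensions, tables, constraints, and ANN indexes are created by release migrations.
+- OSS can still default to SQLite + Chroma; when pgvector is selected, the same scope rules apply.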
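+
+The dimension rule amounts to a strict check before any write or query. A minimal sketch follows, assuming the Knowledge Base's declared dimension is already loaded. `validate_embedding` and `to_pgvector_literal` are hypothetical helpers. The literal uses pgvector's `'[x1,x2,...]'` text input format:
+
+```python
+from collections.abc import Sequence
+
+# Hypothetical helpers; the declared dimension would come from the Knowledge Base record.
+
+
+class EmbeddingDimensionError(ValueError):
+    pass
+
+
+def validate_embedding(vector: Sequence[float], declared_dim: int) -> list[float]:
+    """Fail closed on mismatch: never truncate, pad, or switch to an unbounded scan."""
+    if len(vector) != declared_dim:
+        raise EmbeddingDimensionError(
+            f'expected {declared_dim} dimensions, got {len(vector)}'
+        )
+    return [float(x) for x in vector]
+
+
+def to_pgvector_literal(vector: Sequence[float]) -> str:
+    # pgvector accepts the text form '[x1,x2,...]' as input for the vector type.
+    return '[' + ','.join(repr(float(x)) for x in vector) + ']'
+```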
+
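+### 10.4 Object Storage
+
+- Large objects, plugin artifacts, uploads, knowledge files, and sandbox files are not stored as PostgreSQL blobs.
+- Durable object keys and metadata both include the Workspace scope; temporary staging keys may include the generation, but stable business references must not be permanently invalidated by a future generation switch.
+- Existing generation-scoped opaque keys are safe in OSS, where the generation is fixed, but a stable final identity or an atomic reference migration must be implemented before the Cloud cutover.
+- Public images and private documents use different capabilities; a generic upload key must never act as a public read credential.
+
+## 11. Plugin Runtime
+
+### 11.1 Shared Supervisor, Isolated Workers
+
+The entire logical instance shares one trusted Plugin Runtime logical control plane; in M0 a single supervisor replica fills this role. A new Workspace gets no dedicated Runtime, connection, volume, or process.
+
+Each running plugin installation exclusively owns one nsjail worker process tree; enabled-resident is the desired semantics. While it runs, the worker is permanently bound to: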
+
+```text
+instance_uuid
+workspace_uuid
+execution_generation
+installation_uuid
+runtime_revision
+artifact_digest
+```
+
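+A plugin cannot change this binding through payloads, Host API parameters, environment variables, or reconnection. The supervisor never loads third-party plugin code into its own interpreter.
+When an installation is disabled or deleted, its revision or generation changes, or its entitlement is revoked, the old worker must stop and lose Host API access.
+
+### 11.2 File and Process Boundaries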
+
+```text
+data/plugin-runtime/
+├── artifacts/sha256/<artifact_digest>/code/   # verified, shared read-only
+├── environments/sha256/<environment_digest>/ # published atomically, shared read-only
+└── installations/<installation_uuid>/
+    ├── home/                                  # private, writable
+    ├── tmp/                                   # private, writable
+    └── data/                                  # private persistent data
+```
+
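+- Two installations of the same plugin version share read-only code only when their package digests are identical and integrity has been verified.
+- The dependency environment key includes the artifact/requirements digest, Python ABI, Runtime version, and installer schema.
+- Installation processes, configuration, secrets, home, tmp, data, and logs are never merged.
+- Namespaces, a private `/proc`, and mount, PID, IPC, UTS, cgroup, and rlimit controls prevent reading other files and enumerating or signaling other processes.
+- Cloud does not auto-load `.env` from artifacts; secrets are injected per installation only by the trusted control plane.
+- Plugin egress must be blocked from reaching Core loopback, Box Runtime, the database, and platform metadata endpoints.
+
+### 11.3 Uniform Resource Limits
+
+Resource limits come only from the instance-level `data/config.yaml`, with the existing environment variable overrides; a plugin manifest cannot declare, relax, or override them.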
+
+```yaml
+plugin:
+  worker:
+    max_cpus: 1.0
+    max_memory_mb: 512
+    max_pids: 128
+    max_open_files: 256
+    max_file_size_mb: 512
+    require_hard_limits: true
+```
+
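+CPU, memory, and PID counts use cgroup hard limits; open files and single-file size use rlimits.
+The Cloud deployment profile enforces nsjail; if hard limits are unavailable, readiness fails instead of degrading to a plain subprocess.
+A total disk quota per installation requires a quota provider that can atomically reject writes; directory scanning cannot pass for a hard limit.
+
+### 11.4 Desired State and Recovery
+
+- Installation desired state in PostgreSQL and durable binary storage are the authoritative state.
+- The Runtime's local process table, nsjail directories, and artifact/venv caches are all rebuildable.
+- On reconnect, the Runtime performs an instance-wide full reconciliation that cleans up stale workers and restores enabled installations.
+- Dependency preparation failures are recorded on the affected installation; no half-ready worker is started, and other installations are not blocked.
+- The desired semantics keep enabled installations resident with no idle eviction; load-based reclamation is left for a later decision.
+- The current Supervisor already recovers enabled workers after unexpected exits through a completion callback and bounded exponential backoff.
+  Before Cloud activation it still needs jitter, a global restart concurrency limit, and a Runtime-level circuit breaker, plus verification that systemic failures cannot cause cross-tenant restart storms.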
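+
+Two of the missing pieces named above, jitter and a global restart concurrency limit, could look like the sketch below. It uses "full jitter" exponential backoff and an `asyncio.Semaphore`. The circuit breaker is omitted, and `restart_delay`, `RestartLimiter`, and all constants are hypothetical, not the current Supervisor's code:
+
+```python
+import asyncio
+import random
+from collections.abc import Awaitable, Callable
+
+# Hypothetical sketch; names and constants do not describe the current Supervisor.
+
+
+def restart_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
+                  rng: random.Random | None = None) -> float:
+    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)]."""
+    rng = rng if rng is not None else random.Random()
+    # Clamp the exponent so very large attempt counts cannot overflow a float.
+    ceiling = min(cap, base * (2 ** min(attempt, 32)))
+    return rng.uniform(0.0, ceiling)
+
+
+class RestartLimiter:
+    """Caps concurrent worker restarts across every installation and tenant."""
+
+    def __init__(self, max_concurrent: int) -> None:
+        self._semaphore = asyncio.Semaphore(max_concurrent)
+        self.in_flight = 0
+        self.peak = 0
+
+    async def restart(self, start_worker: Callable[[], Awaitable[None]]) -> None:
+        async with self._semaphore:
+            self.in_flight += 1
+            self.peak = max(self.peak, self.in_flight)
+            try:
+                await start_worker()
+            finally:
+                self.in_flight -= 1
+```
+
+Jitter spreads restarts of workers that crashed together over time, and the shared semaphore keeps one systemic failure from restarting every tenant's workers at once.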
+
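+Cloud deployment verification on real Linux/nsjail/cgroup with controlled egress is not yet complete and is a production activation gate.
+
+## 12. Box Runtime and stdio MCP
+
+### 12.1 Shared Box Control Plane
+
+The entire logical instance shares one trusted Box Runtime logical control plane; in M0 a single Runtime replica fills this role. The control channel between Core and the Runtime is bound to the stable instance identity,
+and each operation is bound to `workspace_uuid`, `execution_generation`, the session revision, and a short-lived admission grant.
+
+First-phase entitlement model: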
+
+```json
+{
+  "features": {
+    "managed_sandbox": true,
+    "external_sandbox": false
+  },
+  "limits": {
+    "managed_sandbox_sessions": 1
+  }
+}
+```
+
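+The closed-source subscription module maps plans onto this generic capability; Core and the Runtime never check `plan == pro`.
+The expected mapping is `managed_sandbox_sessions = 1` for Pro and `0` for every other plan.
+
+### 12.2 Sandbox Model
+
+- An eligible Workspace lazily creates one persistent `global` logical session on first use.
+- `global` means the default logical sandbox inside a Workspace; it does not mean sharing across Workspaces.
+- Sessions are not reclaimed by TTL; after a Runtime restart, processes and temporary directories are gone, but persistent data under `/workspace` is kept.
+- Each ordinary command starts a one-shot nsjail child process inside the Box Runtime container.
+- The first phase forbids managed background processes and network access, so that a session cannot be turned into a resident shared host.
+- Core and the Runtime prove they see the same durable volume through an authenticated random-marker challenge, not by comparing path strings.
+- File sync, attachments, and skill mounts reuse the existing nsjail mechanism, but every host path must be derived from the trusted Workspace context and guarded against symlink/path escape.
+
+Cloud readiness must prove that cgroups, namespaces, mounts, Workspace/Skill/ephemeral byte quotas, and inode quotas are all hard limits.
+The current plain nsjail backend does not provide all of the required hard disk limits, so Cloud Box should fail closed until a greenfield deployment provides and verifies a real quota provider;
+soft directory scanning must not be described as "production-ready".
+
+### 12.3 External E2B
+
+Non-Pro users will later be able to configure a Workspace-owned remote E2B sandbox in the WebUI. This feature is not yet implemented and is out of scope for the first phase.
+Future credentials must belong to the Workspace, be stored encrypted, and be readable only with secret permissions; they do not consume the Cloud managed sandbox quota.
+
+### 12.4 Independent stdio MCP Switch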
+
+```yaml
+mcp:
+  stdio:
+    enabled: true
+```
+
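+- OSS defaults to `true` for compatibility.
+- Cloud v2 forces it off with `MCP__STDIO__ENABLED=false`.
+- This gate is independent of `box.enabled`, the managed sandbox entitlement, and the session quota.
+- The gate covers create, update, test, bootstrap load, and final Runtime execution.
+- While the gate is off, existing stdio configurations are kept but not started, and they return an explicit feature-disabled error.
+- Remote MCP transports such as HTTP/SSE are unaffected.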
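+
+A sketch of a single gate check shared by every stdio MCP code path. `FeatureDisabledError`, `stdio_mcp_enabled`, and `ensure_transport_allowed` are hypothetical, and the config dict is assumed to mirror the YAML above. The environment variable name comes from the text, but the boolean parsing rules are an assumption:
+
+```python
+import os
+
+# Hypothetical helpers; the truthy-string parsing below is an assumption.
+
+
+class FeatureDisabledError(Exception):
+    code = 'feature_disabled'
+
+
+def stdio_mcp_enabled(config: dict, environ: dict[str, str] | None = None) -> bool:
+    env = os.environ if environ is None else environ
+    override = env.get('MCP__STDIO__ENABLED')
+    if override is not None:
+        return override.strip().lower() in {'1', 'true', 'yes', 'on'}
+    # OSS default is True for compatibility.
+    return bool(config.get('mcp', {}).get('stdio', {}).get('enabled', True))
+
+
+def ensure_transport_allowed(transport: str, operation: str, config: dict,
+                             environ: dict[str, str] | None = None) -> None:
+    """Call from create, update, test, bootstrap load, and final execution alike."""
+    if transport == 'stdio' and not stdio_mcp_enabled(config, environ):
+        raise FeatureDisabledError(f'stdio MCP is disabled ({operation})')
+```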
+
+## 13. HTTP API and WebUI
+
+### 13.1 Core API
+
+The OSS and SaaS data planes share a common Workspace API:
+
+```text
+GET    /api/v1/workspaces
+GET    /api/v1/workspaces/{workspace_uuid}
+GET    /api/v1/workspaces/{workspace_uuid}/members
+POST   /api/v1/workspaces/{workspace_uuid}/invitations
+PATCH  /api/v1/workspaces/{workspace_uuid}/members/{account_uuid}
+DELETE /api/v1/workspaces/{workspace_uuid}/members/{account_uuid}
+```
+
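+Under Cloud policy, directory mutations belong to the closed-source Control Plane; Core returns a stable `control_plane_required` error for local creation, invitations, and member changes,
+and offers only safe reads of the execution projection.
+
+Every tenant resource route must pass through a unified decorator/middleware that:
+
+1. Authenticates the principal.
+2. Resolves the trusted Workspace.
+3. Validates the Workspace/ExecutionState.
+4. Validates the Membership or resource binding.
+5. Validates permissions and entitlements.
+6. Creates the RequestContext and TenantUnitOfWork.
+
+### 13.2 SaaS Control Plane API
+
+The SaaS product API includes: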
+
+```text
+POST /cloud/workspaces
+GET  /cloud/workspaces
+POST /cloud/workspaces/{workspace_uuid}/invitations
+POST /cloud/invitations/{token}/accept
+GET  /cloud/workspaces/{workspace_uuid}/subscription
+POST /cloud/workspaces/{workspace_uuid}/checkout
+GET  /cloud/workspaces/{workspace_uuid}/usage
+```
+
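+These APIs manage the directory, products, and billing; they do not operate directly on Core business resources such as Bots or Pipelines.
+
+### 13.3 WebUI
+
+OSS:
+
+- The first sign-up lands in the sole Workspace.
+- Owners/admins can invite members and manage fixed roles.
+- No Workspace switcher and no entry point for creating a second Workspace are shown.
+
+SaaS:
+
+- After login, the client first fetches the Account-level Workspace list, then explicitly selects the current Workspace.
+- The current Workspace UUID is kept in controlled client state, and every tenant request automatically carries the selector.
+- Switching Account or Workspace clears caches, WebSockets, uploads, forms, errors, and optimistic state, so data from the previous tenant is never shown.
+- Page refreshes, new tabs, and invitation redirects restore the same authorized Workspace; an invalid Membership never falls back to another Workspace.
+- UI permission changes must update reactively, but the API remains the final authorization boundary.
+
+## 14. Failures, Security, and Degradation
+
+### 14.1 Fail-Closed Scenarios
+
+New tenant business operations and side effects must be rejected when:
+
+- The Cloud manifest is missing, fails signature verification, has the wrong audience, or rolls back.
+- The Account token, Membership, Workspace status, or execution generation is invalid.
+- The directory projection is not ready or lags behind what a valid lease requires.
+- The entitlement is missing, or has expired and is outside an explicit grace window.
+- Runtime control channel authentication fails or the instance binding is inconsistent.
+- Plugin nsjail/cgroup hard limits are unavailable under the Cloud profile.
+- Any hard storage or namespace capability of Box cannot be proven.
+- PostgreSQL RLS, runtime role, schema, catalog, or endpoint isolation checks fail.
+- Someone attempts to enable stdio MCP under the Cloud profile.
+
+None of these errors may silently degrade to the OSS singleton, a plain subprocess, Chroma, a soft quota, or a caller-supplied Workspace.
+
+### 14.2 Revocation Semantics
+
+- Removing or downgrading a Membership must take effect on the next HTTP request and force long-lived connections to re-authorize before the next message.
+- Suspending a Workspace blocks new interactions, automated workloads, and new side effects; resuming allows only the current generation.
+- When an entitlement expires, new creation or new execution stops explicitly per capability; existing data is never deleted implicitly.
+- A generation change invalidates stale workers, sessions, callbacks, cached runtime objects, and outbox publishers.
+- While the control plane is temporarily unreachable, operation may continue only within the bounds of a valid signed snapshot and the local projection; after they expire, the system fails closed.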
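+### 14.3 Security Checklist
+
+- All identifiers use unguessable UUIDs, but randomness is never treated as authorization.
+- All tokens/secrets are stored only as hashes or encrypted values, and raw secrets are shown once.
+- Logs, traces, metrics, caches, and object keys all carry the Workspace dimension and filter out secrets.
+- Read responses for Provider, Bot, Plugin, and MCP configurations recursively mask credentials.
+- Runtime control, debug, registration, and attachment capabilities are kept separate and never share a master secret.
+- Untrusted code cannot reach Core loopback, the database, other Runtimes, the host file system, or metadata endpoints.
+- Bulk operations, background scans, and monitoring aggregation use explicit tenant/system capabilities.
+- Every cross-Workspace operational action records the principal, reason, scope, request ID, and result.
+
+## 15. Implementation Status and SaaS Activation Gates
+
+### 15.1 Implemented Isolation Kernel
+
+The current branch already implements, or has a foundation for, the following:
+
+- OSS singleton Workspace, multiple Accounts, Invitations, and fixed RBAC.
+- A trusted RequestContext, Workspace-scoped repositories, and resource ownership checks.
+- A tenant-aware Plugin SDK protocol and Runtime installation binding.
+- Shared Plugin Runtime / Box Runtime control protocols and the execution generation fence.
+- The independent stdio MCP gate.
+- PostgreSQL shared schema, transaction-local scope, FORCE RLS, and the pgvector adapter.
+- Cloud bootstrap that ordinary configuration cannot activate by default and that fails closed when security capabilities are missing.
+
+These are code-level capability boundaries; they do not amount to a finished closed-source SaaS product or to production deployment acceptance.
+
+### 15.2 Outstanding Activation Gates
+
+Cloud v2 must not be called production-ready until the following items are complete and backed by evidence from a real environment:
+
+1. The closed-source Control Plane's global directory, sign-up, invitations, subscriptions, billing, entitlement issuance, and signed manifest bootstrap; before horizontal scaling, OAuth exchange and directory projection must also use atomic shared storage.
+2. A generation-aware fence for ordinary business writes that holds through commit, plus a business outbox committed in the same transaction as the triggering write for external side effects.
+3. Stable durable object identity, or atomic object reference migration, after a generation cutover.
+4. SSRF protection and tenant-safe egress for every tenant-configurable outbound URL; Plugin Runtime must also verify namespaces, resource limits, and file isolation on real Linux/nsjail/cgroup v2.
+5. Plugin Runtime already implements completion callbacks, bounded backoff, and automatic recovery for workers that exit unexpectedly; before Cloud activation, add global restart-storm suppression and complete fault-injection verification.
+6. A production hard disk quota provider for Plugin installation data that atomically rejects over-quota writes at the write boundary instead of relying on directory scans.
+7. A production hard quota provider for Box Runtime covering byte and inode quotas for Workspace, Skill, and root/tmp/home; real deployments must also pass the shared-volume marker challenge on startup and reconnect.
+8. Proof of cross-database isolation for the PostgreSQL runtime credential through a dedicated endpoint or HBA/proxy, a production migration/rollback procedure, and integration evidence that a failed legacy pgvector migration restores RLS/FORCE exactly and can be retried safely.
+9. Replay, stream-interruption, and disaster-recovery verification for closed-source directory events, leases, snapshots, entitlements, and usage/outbox.
+10. Real-browser multi-Account/RBAC/invitation/refresh scenarios are done; fault-injection acceptance is still needed for production Runtime restarts, worker crashes, stream interruptions, abnormal rollbacks, and the closed-source Control Plane.
+
+### 15.3 Deliberately Deferred Product Decisions
+
+- Hibernation, reclamation, deletion, and retention policies for Workspaces after creation.
+- Workspace export and single-Workspace restore.
+- BYOK E2B configuration in the WebUI for non-Pro Workspaces.
+- The store, TTL, fencing token, and handover order for multi-replica owner leases.
+- PostgreSQL shard resolver, online migration, and dedicated shard product rules.
+- Signing provenance, revocation, GC, and disk quota mechanisms for artifacts/caches.
+- Custom roles, SSO, SCIM, and enterprise compliance capabilities.
+
+Implementation code must not lock in any deferred decision early through implicit defaults.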
+
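+## 16. Implementation Order
+
+### Phase 0: Contracts and Baseline
+
+- Fix the terminology, RequestContext, role matrix, edition policy, and error semantics.
+- Establish baselines for upgrade backups, rollback, and cross-tenant negative tests.
+
+### Phase 1: OSS Tenancy Kernel
+
+- Account, Workspace, Membership, and Invitation.
+- Singleton bootstrap, multi-user invitations, RBAC, and frontend permissions.
+
+### Phase 2: Data and Entry-Point Isolation
+
+- Add Workspace scope to every resource.
+- Unify the context across HTTP, API Keys, Bots, Webhooks, WebSockets, background tasks, and storage.
+- SQLite migration recovery and PostgreSQL RLS integration tests.
+
+### Phase 3: Runtime and SDK Isolation
+
+- Plugin installation binding, nsjail, resource limits, and artifact replay.
+- Box admission, session namespaces, and skill/attachment file boundaries.
+- The MCP gate, RAG/vector, and generation revalidation for long-running operations.
+
+### Phase 4: Closed-Source SaaS Control Plane
+
+- Signed manifest bootstrap.
+- Global directory, sign-up, invitations, Subscription, Entitlement, and the Usage ledger.
+- Projection, leases, outbox, reconciliation, and the operations console.
+
+### Phase 5: Production Deployment Activation
+
+- Real Linux hard isolation for Plugin and Box.
+- Verification of PostgreSQL credentials, migrations, backups, and rollback.
+- Full browser/API/Runtime E2E and fault injection.
+- The multi-Workspace Cloud policy is enabled only after every activation gate passes.
+
+### Phase 6: Internal Scaling Within the Same Logical Instance
+
+- Add replicas, owner leases, and fencing once there is capacity evidence.
+- Add shared/dedicated shards once there is regional, compliance, or scale evidence.
+- Keep external identity, APIs, and Workspace URLs unchanged.
+
+## 17. Testing and Acceptance
+
+### 17.1 Data Isolation
+
+- Two Workspaces that use the same resource UUID, name, vector ID, and cache key neither collide nor gain unauthorized access.
+- When the application-level Workspace filter is deliberately omitted, PostgreSQL RLS still blocks cross-tenant reads and writes.
+- Connection pool reuse, exception rollback, child tasks, background tasks, and transaction pooling leave no residual tenant context.
+- Cross-Workspace guesses return 404; missing permissions within the same tenant return 403.
+
+### 17.2 Product Behavior
+
+- In OSS, the first Account creates the sole Workspace; a second Account can join only through an invitation; creating a second Workspace returns an edition error.
+- Invitation tests cover valid, already used, revoked, expired, email-mismatch, and concurrent-acceptance cases.
+- API and WebUI permissions agree for owner/admin/developer/operator/viewer.
+- Both standard SaaS sign-up and invitation sign-up create a personal Workspace, but neither creates a dedicated deployment or Runtime.
+
+### 17.3 Runtime
+
+- When two Workspaces install the same verified artifact, they share only read-only code/env; processes, secrets, home/tmp/data, logs, and the Host API stay fully isolated.
+- cgroups, rlimits, namespaces, egress controls, and the generation fence take effect in a real Linux environment.
+- Runtime restart and cache loss are recovered through durable desired state and binary storage.
+- Box sessions, files, processes, skills, attachments, and quotas of two Workspaces are fully isolated.
+- The stdio MCP gate takes effect for the UI, the API, bootstrap, and final execution alike.
+
+### 17.4 Control Plane and Failures
+
+- Duplicate, out-of-order, and missing DirectoryEvents, snapshot + replay, and expired leases are all safe.
+- Stale entitlement revisions, invalid signatures, expiry, and revocation all fail closed.
+- Replaying UsageEvents never double-bills; a rolled-back business transaction emits no side effects.
+- Restarting the Runtime, Core, or Control Plane never creates duplicate Workspaces, workers, or sandboxes.
+- When the manifest, database security checks, or hard quotas are missing, the instance stays non-activatable instead of silently degrading.
+
+### 17.5 Browser End-to-End
+
+Real-browser tests cover at least:
+
+1. First owner sign-up on a clean database and singleton Workspace bootstrap.
+2. The owner creates an invitation; a second user signs up or logs in and accepts it.
+3. As the role moves among viewer/operator/developer/admin, navigation, controls, and API results change accordingly.
+4. Switching Account/Workspace clears the previous scope's state; refresh and new tabs restore the correct Workspace.
+5. The second-Workspace edition limit, and visible errors for used, revoked, expired, and email-mismatched invitations.
+6. Direct API privilege escalation, forged selectors, and cross-tenant UUID guessing cannot bypass the UI.
+
+## 18. Conclusion
+
+The Cloud v2 product model is one logical LangBot instance with multiple Workspaces inside it.
+The single-replica MVP was chosen to reduce component count and per-tenant cost, not to bake a single-process assumption into business identity or protocols.
+When capacity or high availability is needed, Core/Runtime replicas and PostgreSQL shards are added inside the same logical instance,
+and Workspace UUIDs, permissions, data boundaries, and external APIs all stay unchanged.
+
+The open-source Core must fully implement secure Workspace isolation and OSS single-Workspace multi-user support; the closed-source Control Plane
+manages the SaaS global directory, subscriptions, entitlements, billing, and lifecycle. Sharing trusted control planes, connection pools, read-only artifacts, and database components,
+while keeping every untrusted plugin process, sandbox, secret, writable file, and tenant transaction inside an exclusive boundary,
+is what minimizes the cost of each new user without adding per-tenant deployments.
+
+Until gates such as the closed-source control plane, the transactional fence/outbox, real Runtime hard isolation, Box hard quotas, and PostgreSQL production isolation are complete,
+this architecture remains at the isolation-kernel stage and should not be described as a launch-ready SaaS multi-tenant deployment.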
@@ -1,6 +1,6 @@
 [project]
 name = "langbot"
-version = "4.10.5"
+version = "4.10.6"
 description = "Production-grade platform for building agentic IM bots"
 readme = "README.md"
 license-files = ["LICENSE"]
@@ -39,6 +39,7 @@ dependencies = [
    "quart>=0.20.0",
    "quart-cors>=0.8.0",
    "requests>=2.33.0",
+    "regex>=2026.1.15",
    "slack-sdk>=3.35.0",
    "alembic>=1.15.0",
    "sqlalchemy[asyncio]>=2.0.40",
@@ -70,7 +71,7 @@ dependencies = [
    "chromadb>=1.0.0,<2.0.0",
    "qdrant-client (>=1.15.1,<2.0.0)",
    "pyseekdb==1.1.0.post3",
-    "langbot-plugin==0.4.13",
+    "langbot-plugin @ git+https://github.com/langbot-app/langbot-plugin-sdk.git@1d65ed301a6afc52150a998043f73cd6032c8162",
    "asyncpg>=0.30.0",
    "line-bot-sdk>=3.19.0",
    "matrix-nio>=0.25.2",
@@ -80,7 +81,7 @@ dependencies = [
    "pgvector>=0.4.1",
    "botocore>=1.42.39",
    "litellm>=1.0.0",
-    "valkey-glide>=2.4.1,<3.0.0",
+    "valkey-glide>=2.4.1,<3.0.0; sys_platform != 'win32'", # No Windows wheels are published
 ]
 keywords = [
    "bot",
@@ -13,6 +13,12 @@ testpaths = tests
 # Asyncio configuration
 asyncio_mode = auto

+# Resource leaks are often reported during object finalization and wrapped by
+# pytest. Keep both forms fatal so --disable-warnings cannot hide them.
+filterwarnings =
+    error::ResourceWarning
+    error::pytest.PytestUnraisableExceptionWarning
+
 # Output options
 addopts =
    -v
@@ -0,0 +1,466 @@
+#!/usr/bin/env python3
+"""Exercise long-lived Core registries and verify that they reach a plateau.
+
+This probe is intentionally separate from the default test suite because the
+audit profile creates tens of thousands of historical identities. It uses the
+real admission, eviction, and cleanup code while replacing external platform
+objects that are irrelevant to registry retention.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import gc
+import json
+import time
+import tracemalloc
+from dataclasses import asdict, dataclass
+from types import SimpleNamespace
+from unittest.mock import patch
+
+import psutil
+
+from langbot.pkg.api.http.context import ExecutionContext
+
+# Import the Application graph before taskmgr. The production boot path has
+# this same ordering; importing taskmgr first exposes its historical cycle
+# through HTTP route annotations.
+from langbot.pkg.core import app as _core_app  # noqa: F401
+from langbot.pkg.core.taskmgr import AsyncTaskManager
+from langbot.pkg.pipeline.pool import QueryPool
+from langbot.pkg.pipeline.ratelimit.algos.fixedwin import FixedWindowAlgo
+from langbot.pkg.plugin.connector import PluginRuntimeConnector
+from langbot.pkg.platform.sources.websocket_adapter import (
+    WebSocketMessage,
+    WebSocketSession,
+)
+from langbot.pkg.provider.modelmgr.modelmgr import ModelManager
+from langbot.pkg.provider.session.sessionmgr import SessionManager
+from langbot_plugin.api.entities.builtin.provider.session import LauncherTypes
+
+
+@dataclass(frozen=True, slots=True)
+class ProbeScale:
+    query_churn_per_phase: int
+    session_churn_per_phase: int
+    rate_limit_churn_per_phase: int
+    task_churn_per_phase: int
+    websocket_churn_per_phase: int
+    empty_workspace_churn_per_phase: int
+
+
+SCALES = {
+    'quick': ProbeScale(
+        query_churn_per_phase=2_500,
+        session_churn_per_phase=500,
+        rate_limit_churn_per_phase=10_000,
+        task_churn_per_phase=1_000,
+        websocket_churn_per_phase=500,
+        empty_workspace_churn_per_phase=1_000,
+    ),
+    'audit': ProbeScale(
+        query_churn_per_phase=25_000,
+        session_churn_per_phase=2_500,
+        rate_limit_churn_per_phase=10_000,
+        task_churn_per_phase=5_000,
+        websocket_churn_per_phase=2_500,
+        empty_workspace_churn_per_phase=10_000,
+    ),
+}
+
+
+class _ProbeQuery:
+    """Small weak-referenceable stand-in for SDK Query construction."""
+
+    def __init__(self, **values):
+        self.__dict__.update(values)
+
+
+class _EmptyResult:
+    def all(self) -> list:
+        return []
+
+
+class _EmptyPluginRuntimeHandler:
+    async def reconcile_plugin_installations(self, _states: tuple) -> dict:
+        return {
+            'applied': [],
+            'removed': [],
+            'missing_artifacts': [],
+            'failed_installations': [],
+        }
+
+    def unregister_installation_binding(self, _binding) -> None:
+        raise AssertionError('An empty Workspace exposed an installation binding')
+
+
+@dataclass(frozen=True, slots=True)
+class ProcessSample:
+    rss_bytes: int
+    traced_current_bytes: int
+    traced_peak_bytes: int
+    asyncio_tasks: int
+    threads: int
+    open_fds: int | None
+
+
+def _sample_process() -> ProcessSample:
+    gc.collect()
+    process = psutil.Process()
+    try:
+        open_fds = process.num_fds()
+    except (AttributeError, psutil.Error):
+        open_fds = None
+    traced_current, traced_peak = tracemalloc.get_traced_memory()
+    return ProcessSample(
+        rss_bytes=process.memory_info().rss,
+        traced_current_bytes=traced_current,
+        traced_peak_bytes=traced_peak,
+        asyncio_tasks=len(asyncio.all_tasks()),
+        threads=process.num_threads(),
+        open_fds=open_fds,
+    )
+
+
+def _execution_context(index: int, *, query_uuid: str | None = None) -> ExecutionContext:
+    return ExecutionContext(
+        instance_uuid='runtime-resource-probe',
+        workspace_uuid=f'workspace-{index}',
+        placement_generation=1,
+        bot_uuid='probe-bot',
+        pipeline_uuid='probe-pipeline',
+        query_uuid=query_uuid,
+    )
+
+
+class CoreRuntimeProbe:
+    """Own the same manager instances across two equal churn phases."""
+
+    def __init__(self) -> None:
+        self.query_pool = QueryPool(max_queries=100, max_queries_per_workspace=1)
+        app = SimpleNamespace(
+            event_loop=asyncio.get_running_loop(),
+            persistence_mgr=None,
+            instance_config=SimpleNamespace(
+                data={
+                    'concurrency': {'session': 1},
+                    'system': {
+                        'session_retention': {
+                            'idle_ttl_seconds': 86_400,
+                            'max_entries': 200,
+                            'max_entries_per_workspace': 200,
+                            'max_conversations_per_session': 20,
+                            'max_messages_per_conversation': 100,
+                        },
+                        'task_retention': {
+                            'completed_limit': 200,
+                            'max_log_chars': 4_096,
+                            'max_active_user_tasks': 256,
+                            'max_active_user_tasks_per_workspace': 8,
+                        },
+                    },
+                }
+            ),
+        )
+        self.session_manager = SessionManager(app)
+        self.task_manager = AsyncTaskManager(app)
+        self.rate_limit = FixedWindowAlgo(SimpleNamespace())
+        self.websocket_session = WebSocketSession(
+            'resource-probe',
+            max_conversations=200,
+            max_messages=100,
+        )
+        logger = SimpleNamespace(
+            debug=lambda *_args, **_kwargs: None,
+            info=lambda *_args, **_kwargs: None,
+            warning=lambda *_args, **_kwargs: None,
+            error=lambda *_args, **_kwargs: None,
+        )
+        self.empty_model_queries = 0
+
+        async def execute_empty(_statement):
+            self.empty_model_queries += 1
+            return _EmptyResult()
+
+        model_app = SimpleNamespace(
+            logger=logger,
+            persistence_mgr=SimpleNamespace(execute_async=execute_empty),
+        )
+        self.empty_model_manager = ModelManager(model_app)
+
+        async def runtime_disconnect_callback(_connector) -> None:
+            return None
+
+        plugin_app = SimpleNamespace(
+            instance_config=SimpleNamespace(data={'plugin': {'enable': True}}),
+            deployment=SimpleNamespace(mode='cloud'),
+            logger=logger,
+        )
+        self.empty_plugin_connector = PluginRuntimeConnector(
+            plugin_app,
+            runtime_disconnect_callback,
+        )
+        self.empty_plugin_connector.handler = _EmptyPluginRuntimeHandler()
+
+        async def validate_context(context):
+            return context
+
+        async def load_desired_states(_context):
+            return []
+
+        self.empty_plugin_connector._validate_execution_context = validate_context
+        self.empty_plugin_connector._load_workspace_desired_states = load_desired_states
+
+    async def initialize(self) -> None:
+        await self.rate_limit.initialize()
+
+    async def run_phase(self, scale: ProbeScale, phase: int) -> None:
+        offsets = {
+            'query': (phase - 1) * scale.query_churn_per_phase,
+            'session': (phase - 1) * scale.session_churn_per_phase,
+            'rate': (phase - 1) * scale.rate_limit_churn_per_phase,
+            'task': (phase - 1) * scale.task_churn_per_phase,
+            'websocket': (phase - 1) * scale.websocket_churn_per_phase,
+            'empty_workspace': (phase - 1) * scale.empty_workspace_churn_per_phase,
+        }
+        await self._churn_queries(offsets['query'], scale.query_churn_per_phase)
+        await self._churn_sessions(offsets['session'], scale.session_churn_per_phase)
+        await self._churn_rate_limits(offsets['rate'], scale.rate_limit_churn_per_phase)
+        await self._churn_tasks(offsets['task'], scale.task_churn_per_phase)
+        self._churn_websocket_history(
+            offsets['websocket'],
+            scale.websocket_churn_per_phase,
+        )
+        await self._churn_empty_workspaces(
+            offsets['empty_workspace'],
+            scale.empty_workspace_churn_per_phase,
+        )
+        await asyncio.sleep(0)
+
+    async def _churn_queries(self, start: int, count: int) -> None:
+        def make_query(**values):
+            return _ProbeQuery(**values)
+
+        with patch(
+            'langbot.pkg.pipeline.pool.pipeline_query.Query',
+            side_effect=make_query,
+        ):
+            for index in range(start, start + count):
+                context = _execution_context(index)
+                query = await self.query_pool.add_query(
+                    bot_uuid='probe-bot',
+                    launcher_type=LauncherTypes.PERSON,
+                    launcher_id=f'launcher-{index}',
+                    sender_id=f'sender-{index}',
+                    message_event=SimpleNamespace(),
+                    message_chain=SimpleNamespace(),
+                    adapter=None,
+                    pipeline_uuid='probe-pipeline',
+                    execution_context=context,
+                )
+                removed = await self.query_pool.remove_query(query)
+                if not removed:
+                    raise AssertionError('Query cleanup failed')
+
+    async def _churn_sessions(self, start: int, count: int) -> None:
+        for index in range(start, start + count):
+            workspace_index = index % 100
+            context = _execution_context(
+                workspace_index,
+                query_uuid=f'session-query-{index}',
+            )
+            query = SimpleNamespace(
+                launcher_type=LauncherTypes.PERSON,
+                launcher_id=f'launcher-{index}',
+                sender_id=f'sender-{index}',
+                bot_uuid='probe-bot',
+                pipeline_uuid='probe-pipeline',
+                query_uuid=context.query_uuid,
+                _execution_context=context,
+            )
+            await self.session_manager.get_session(query)
+
+    async def _churn_rate_limits(self, start: int, count: int) -> None:
+        for index in range(start, start + count):
+            context = _execution_context(
+                index % 1_000,
+                query_uuid=f'rate-query-{index}',
+            )
+            query = SimpleNamespace(
+                bot_uuid='probe-bot',
+                pipeline_uuid='probe-pipeline',
+                _execution_context=context,
+                pipeline_config={
+                    'safety': {
+                        'rate-limit': {
+                            'window-length': 60,
+                            'limitation': 100_000,
+                            'strategy': 'drop',
+                        }
+                    }
+                },
+            )
+            admitted = await self.rate_limit.require_access(
+                query,
+                LauncherTypes.PERSON,
+                f'rate-identity-{index}',
+            )
+            if not admitted:
+                raise AssertionError('Rate-limit registry rejected bounded churn')
+
+    async def _churn_tasks(self, start: int, count: int) -> None:
+        async def complete_immediately() -> None:
+            return None
+
+        for batch_start in range(start, start + count, 256):
+            batch_size = min(256, start + count - batch_start)
+            wrappers = [
+                self.task_manager.create_task(
+                    complete_immediately(),
+                    name=f'resource-probe-{batch_start + offset}',
+                )
+                for offset in range(batch_size)
+            ]
+            await asyncio.gather(*(wrapper.task for wrapper in wrappers))
+            await asyncio.sleep(0)
+
+    def _churn_websocket_history(self, start: int, count: int) -> None:
+        for index in range(start, start + count):
+            conversation_key = f'conversation-{index}'
+            response_id = f'response-{index}'
+            indexes = self.websocket_session.get_stream_message_indexes(conversation_key)
+            indexes[response_id] = 0
+            self.websocket_session.append_message(
+                conversation_key,
+                WebSocketMessage(
+                    id=self.websocket_session.next_message_id(conversation_key),
+                    role='assistant',
+                    content='probe',
+                    message_chain=[],
+                    timestamp='1970-01-01T00:00:00+00:00',
+                    is_final=True,
+                ),
+            )
+
+    async def _churn_empty_workspaces(self, start: int, count: int) -> None:
+        for index in range(start, start + count):
+            await self.empty_model_manager._load_workspace_models(_execution_context(index))
+        await self.empty_plugin_connector.reconcile_projected_workspaces(
+            _execution_context(index) for index in range(start, start + count)
+        )
+
+    def retained_state(self) -> dict[str, int]:
+        return {
+            'query_cached': len(self.query_pool.cached_queries),
+            'query_queued': len(self.query_pool.queries),
+            'query_active_workspaces': len(self.query_pool.active_query_count_by_workspace),
+            'query_scope_counters': len(self.query_pool.query_count_by_scope),
+            'sessions': len(self.session_manager.session_list),
+            'session_index': len(self.session_manager._session_index),
+            'rate_limit_containers': len(self.rate_limit.containers),
+            'task_records': len(self.task_manager.tasks),
+            'websocket_conversations': len(self.websocket_session.message_lists),
+            'websocket_stream_indexes': len(self.websocket_session.stream_message_indexes),
+            'empty_model_scopes': len(self.empty_model_manager._scope_generations),
+            'empty_model_providers': len(self.empty_model_manager.provider_dict),
+            'empty_model_llms': len(self.empty_model_manager.llm_model_dict),
+            'empty_plugin_workspace_sets': len(self.empty_plugin_connector._workspace_installations),
+            'empty_plugin_installations': len(self.empty_plugin_connector._known_desired_states),
+        }
+
+    def assert_bounded(self) -> None:
+        state = self.retained_state()
+        expected_maximums = {
+            'query_cached': 0,
+            'query_queued': 0,
+            'query_active_workspaces': 0,
+            'query_scope_counters': 100,
+            'sessions': 200,
+            'session_index': 200,
+            'rate_limit_containers': 10_000,
+            'task_records': 200,
+            'websocket_conversations': 200,
+            'websocket_stream_indexes': 200,
+            'empty_model_scopes': 0,
+            'empty_model_providers': 0,
+            'empty_model_llms': 0,
+            'empty_plugin_workspace_sets': 0,
+            'empty_plugin_installations': 0,
+        }
+        violations = {key: (state[key], maximum) for key, maximum in expected_maximums.items() if state[key] > maximum}
+        if violations:
+            raise AssertionError(f'Core retained-state limits failed: {violations}')
+
+
+async def _run(args: argparse.Namespace) -> dict:
+    scale = SCALES[args.scale]
+    tracemalloc.start()
+    started_at = time.monotonic()
+    probe = CoreRuntimeProbe()
+    await probe.initialize()
+
+    baseline = _sample_process()
+    await probe.run_phase(scale, 1)
+    probe.assert_bounded()
+    phase_one = _sample_process()
+    state_one = probe.retained_state()
+
+    await probe.run_phase(scale, 2)
+    probe.assert_bounded()
+    phase_two = _sample_process()
+    state_two = probe.retained_state()
+
+    if state_two != state_one:
+        raise AssertionError(f'Core retained state did not plateau: phase_one={state_one}, phase_two={state_two}')
+    traced_growth = phase_two.traced_current_bytes - phase_one.traced_current_bytes
+    rss_growth = phase_two.rss_bytes - phase_one.rss_bytes
+    max_traced_growth = int(args.max_traced_growth_mib * 1024 * 1024)
+    max_rss_growth = int(args.max_rss_growth_mib * 1024 * 1024)
+    if traced_growth > max_traced_growth:
+        raise AssertionError(f'Second-phase traced memory grew by {traced_growth} bytes (limit {max_traced_growth})')
+    if rss_growth > max_rss_growth:
+        raise AssertionError(f'Second-phase RSS grew by {rss_growth} bytes (limit {max_rss_growth})')
+
+    return {
+        'component': 'langbot-core',
+        'scale': args.scale,
+        'work_per_phase': asdict(scale),
+        'elapsed_seconds': round(time.monotonic() - started_at, 3),
+        'samples': {
+            'baseline': asdict(baseline),
+            'phase_one': asdict(phase_one),
+            'phase_two': asdict(phase_two),
+        },
+        'second_phase_growth': {
+            'rss_bytes': rss_growth,
+            'traced_current_bytes': traced_growth,
+        },
+        'retained_state': {
+            'phase_one': state_one,
+            'phase_two': state_two,
+        },
+        'passed': True,
+    }
+
+
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument('--scale', choices=tuple(SCALES), default='quick')
+    parser.add_argument('--max-traced-growth-mib', type=float, default=8.0)
+    parser.add_argument('--max-rss-growth-mib', type=float, default=64.0)
+    parser.add_argument('--json', action='store_true', help='Print compact JSON')
+    return parser.parse_args()
+
+
+def main() -> None:
+    args = _parse_args()
+    result = asyncio.run(_run(args))
+    if args.json:
+        print(json.dumps(result, sort_keys=True))
+    else:
+        print(json.dumps(result, indent=2, sort_keys=True))
+
+
+if __name__ == '__main__':
+    main()
@@ -0,0 +1,561 @@
+#!/usr/bin/env python3
+"""Measure populated Workspace runtime replacement cost and retention.
+
+Unlike ``runtime_resource_probe.py``, which stresses historical request keys
+and empty tenants, this probe keeps one representative Provider, LLM,
+Embedding model, Rerank model, Pipeline, Knowledge Base, MCP session, and Bot
+per Workspace. It then advances every Workspace through successive placement
+generations and verifies that each retired generation's runtime objects are
+closed and collectible while active registry cardinality stays constant.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import gc
+import json
+import time
+import tracemalloc
+import weakref
+from dataclasses import asdict, dataclass
+from types import SimpleNamespace
+
+import psutil
+
+from langbot.pkg.api.http.context import ExecutionContext
+
+# Match the production import order: importing a leaf manager before the
+# application graph exposes a historical annotation import cycle.
+from langbot.pkg.core import app as _core_app  # noqa: F401
+from langbot.pkg.entity.persistence import bot as persistence_bot
+from langbot.pkg.entity.persistence import model as persistence_model
+from langbot.pkg.entity.persistence import pipeline as persistence_pipeline
+from langbot.pkg.entity.persistence import rag as persistence_rag
+from langbot.pkg.pipeline.pipelinemgr import PipelineManager
+from langbot.pkg.platform.botmgr import PlatformManager
+from langbot.pkg.provider.modelmgr import requester
+from langbot.pkg.provider.modelmgr.modelmgr import ModelManager
+from langbot.pkg.provider.tools.loaders.mcp import MCPLoader
+from langbot.pkg.rag.knowledge.kbmgr import RAGManager
+from langbot.pkg.workspace.entities import WorkspaceExecutionBinding
+
+
+@dataclass(frozen=True, slots=True)
+class ProbeScale:
+    workspaces: int
+
+
+SCALES = {
+    'quick': ProbeScale(workspaces=250),
+    'audit': ProbeScale(workspaces=5_000),
+}
+
+
+@dataclass(frozen=True, slots=True)
+class ProcessSample:
+    rss_bytes: int
+    traced_current_bytes: int
+    traced_peak_bytes: int
+    asyncio_tasks: int
+    threads: int
+    open_fds: int | None
+
+
+class _ProbeLogger:
+    def debug(self, *_args, **_kwargs) -> None:
+        return None
+
+    def info(self, *_args, **_kwargs) -> None:
+        return None
+
+    def warning(self, *_args, **_kwargs) -> None:
+        return None
+
+    def error(self, *_args, **_kwargs) -> None:
+        return None
+
+
+class _ProbeWorkspaceService:
+    instance_uuid = 'runtime-capacity-probe'
+
+    def __init__(self) -> None:
+        self.generations: dict[str, int] = {}
+        self.binding_lookups = 0
+
+    async def get_execution_binding(
+        self,
+        workspace_uuid: str,
+        *,
+        expected_generation: int | None = None,
+    ) -> WorkspaceExecutionBinding:
+        self.binding_lookups += 1
+        generation = self.generations[workspace_uuid]
+        if expected_generation is not None and expected_generation != generation:
+            raise AssertionError(f'stale probe generation {expected_generation} != {generation}')
+        return WorkspaceExecutionBinding(
+            instance_uuid=self.instance_uuid,
+            workspace_uuid=workspace_uuid,
+            placement_generation=generation,
+            write_fenced=False,
+            state='active',
+        )
+
+
+class _ProbeRequester(requester.ProviderAPIRequester):
+    name = 'capacity-probe'
+    closed = 0
+
+    async def invoke_llm(
+        self,
+        query,
+        model,
+        messages,
+        funcs=None,
+        extra_args=None,
+        remove_think=False,
+    ):
+        return None
+
+    async def aclose(self) -> None:
+        type(self).closed += 1
+
+
+class _ProbeAdapter:
+    killed = 0
+
+    def __init__(self, _config, _logger) -> None:
+        self.listeners = []
+
+    def register_listener(self, event_type, listener) -> None:
+        self.listeners.append((event_type, listener))
+
+    async def kill(self) -> None:
+        type(self).killed += 1
+
+
+class _ProbeMCPSession:
+    closed = 0
+
+    def __init__(self, server_name: str) -> None:
+        self.server_name = server_name
+
+    async def shutdown(self) -> None:
+        type(self).closed += 1
+
+
+def _sample_process() -> ProcessSample:
+    gc.collect()
+    process = psutil.Process()
+    try:
+        open_fds = process.num_fds()
+    except (AttributeError, psutil.Error):
+        open_fds = None
+    traced_current, traced_peak = tracemalloc.get_traced_memory()
+    return ProcessSample(
+        rss_bytes=process.memory_info().rss,
+        traced_current_bytes=traced_current,
+        traced_peak_bytes=traced_peak,
+        asyncio_tasks=len(asyncio.all_tasks()),
+        threads=process.num_threads(),
+        open_fds=open_fds,
+    )
+
+
+class PopulatedWorkspaceProbe:
+    def __init__(self) -> None:
+        _ProbeRequester.closed = 0
+        _ProbeAdapter.killed = 0
+        _ProbeMCPSession.closed = 0
+        self.workspace_service = _ProbeWorkspaceService()
+        self.logger = _ProbeLogger()
+        self.app = SimpleNamespace(
+            logger=self.logger,
+            workspace_service=self.workspace_service,
+            persistence_mgr=SimpleNamespace(
+                mode=SimpleNamespace(value='cloud_runtime'),
+            ),
+            pipeline_config_meta_trigger={'name': 'trigger', 'stages': []},
+            pipeline_config_meta_safety={'name': 'safety', 'stages': []},
+            pipeline_config_meta_ai={'name': 'ai', 'stages': []},
+            pipeline_config_meta_output={'name': 'output', 'stages': []},
+            task_mgr=SimpleNamespace(
+                cancel_by_scope=lambda *_args, **_kwargs: None,
+                cancel_task=lambda *_args, **_kwargs: None,
+            ),
+        )
+        self.model_manager = ModelManager(self.app)
+        self.model_manager.requester_dict = {
+            _ProbeRequester.name: _ProbeRequester,
+        }
+        self.pipeline_manager = PipelineManager(self.app)
+        self.pipeline_manager.stage_dict = {}
+        self.rag_manager = RAGManager(self.app)
+        self.mcp_loader = MCPLoader(self.app)
+        self.platform_manager = PlatformManager(self.app)
+        self.platform_manager.adapter_dict = {
+            'capacity-probe': _ProbeAdapter,
+        }
+        self.generation_refs: dict[
+            int,
+            list[weakref.ReferenceType],
+        ] = {}
+
+    def _context(
+        self,
+        workspace_uuid: str,
+        generation: int,
+        *,
+        bot_uuid: str | None = None,
+        pipeline_uuid: str | None = None,
+    ) -> ExecutionContext:
+        return ExecutionContext(
+            instance_uuid=self.workspace_service.instance_uuid,
+            workspace_uuid=workspace_uuid,
+            placement_generation=generation,
+            bot_uuid=bot_uuid,
+            pipeline_uuid=pipeline_uuid,
+        )
+
+    async def load_generation(self, workspaces: int, generation: int) -> None:
+        for index in range(workspaces):
+            workspace_uuid = f'workspace-{index}'
+            provider_uuid = f'provider-{index}'
+            llm_uuid = f'llm-{index}'
+            embedding_uuid = f'embedding-{index}'
+            rerank_uuid = f'rerank-{index}'
+            pipeline_uuid = f'pipeline-{index}'
+            bot_uuid = f'bot-{index}'
+            kb_uuid = f'knowledge-{index}'
+            mcp_server_name = f'mcp-{index}'
+            self.workspace_service.generations[workspace_uuid] = generation
+            context = self._context(workspace_uuid, generation)
+
+            runtime_provider = await self.model_manager.load_provider(
+                context,
+                persistence_model.ModelProvider(
+                    uuid=provider_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity Provider',
+                    requester=_ProbeRequester.name,
+                    base_url='https://capacity.invalid',
+                    api_keys=['probe'],
+                ),
+            )
+            await self.model_manager.cache_provider(context, runtime_provider)
+
+            runtime_llm = await self.model_manager.load_llm_model_with_provider(
+                context,
+                persistence_model.LLMModel(
+                    uuid=llm_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity LLM',
+                    provider_uuid=provider_uuid,
+                    abilities=['func_call'],
+                    extra_args={'temperature': 0.1},
+                ),
+                runtime_provider,
+            )
+            await self.model_manager.cache_llm_model(context, runtime_llm)
+            runtime_embedding = await self.model_manager.load_embedding_model_with_provider(
+                context,
+                persistence_model.EmbeddingModel(
+                    uuid=embedding_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity Embedding',
+                    provider_uuid=provider_uuid,
+                    extra_args={'dimensions': 1_024},
+                ),
+                runtime_provider,
+            )
+            await self.model_manager.cache_embedding_model(
+                context,
+                runtime_embedding,
+            )
+            runtime_rerank = await self.model_manager.load_rerank_model_with_provider(
+                context,
+                persistence_model.RerankModel(
+                    uuid=rerank_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity Rerank',
+                    provider_uuid=provider_uuid,
+                    extra_args={},
+                ),
+                runtime_provider,
+            )
+            await self.model_manager.cache_rerank_model(
+                context,
+                runtime_rerank,
+            )
+
+            pipeline_context = self._context(
+                workspace_uuid,
+                generation,
+                pipeline_uuid=pipeline_uuid,
+            )
+            await self.pipeline_manager.load_pipeline(
+                pipeline_context,
+                persistence_pipeline.LegacyPipeline(
+                    uuid=pipeline_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity Pipeline',
+                    description='',
+                    for_version='probe',
+                    is_default=True,
+                    stages=[],
+                    config={},
+                    extensions_preferences={},
+                ),
+                _binding_validated=True,
+            )
+            runtime_pipeline = self.pipeline_manager._pipelines_by_key[
+                (
+                    self.workspace_service.instance_uuid,
+                    workspace_uuid,
+                    pipeline_uuid,
+                )
+            ]
+
+            runtime_kb = await self.rag_manager.load_knowledge_base(
+                context,
+                persistence_rag.KnowledgeBase(
+                    uuid=kb_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity Knowledge',
+                    description='',
+                    knowledge_engine_plugin_id=None,
+                    collection_id=kb_uuid,
+                    creation_settings={},
+                    retrieval_settings={},
+                ),
+                _binding_validated=True,
+            )
+
+            await self.mcp_loader._assert_execution_active(context)
+            runtime_mcp = _ProbeMCPSession(mcp_server_name)
+            self.mcp_loader._register_session(
+                context,
+                mcp_server_name,
+                runtime_mcp,
+            )
+
+            bot_context = self._context(
+                workspace_uuid,
+                generation,
+                bot_uuid=bot_uuid,
+            )
+            runtime_bot = await self.platform_manager.load_bot(
+                bot_context,
+                persistence_bot.Bot(
+                    uuid=bot_uuid,
+                    workspace_uuid=workspace_uuid,
+                    name='Capacity Bot',
+                    description='',
+                    adapter='capacity-probe',
+                    adapter_config={},
+                    enable=True,
+                    use_pipeline_uuid=pipeline_uuid,
+                    pipeline_routing_rules=[],
+                ),
+                _binding_validated=True,
+            )
+
+            self.generation_refs.setdefault(generation, []).extend(
+                (
+                    weakref.ref(runtime_provider),
+                    weakref.ref(runtime_llm),
+                    weakref.ref(runtime_embedding),
+                    weakref.ref(runtime_rerank),
+                    weakref.ref(runtime_pipeline),
+                    weakref.ref(runtime_kb),
+                    weakref.ref(runtime_mcp),
+                    weakref.ref(runtime_bot),
+                )
+            )
+
+        await asyncio.sleep(0)
+
+    def retained_state(self) -> dict[str, int]:
+        return {
+            'model_providers': len(self.model_manager.provider_dict),
+            'llm_models': len(self.model_manager.llm_model_dict),
+            'embedding_models': len(self.model_manager.embedding_model_dict),
+            'rerank_models': len(self.model_manager.rerank_model_dict),
+            'model_scopes': len(self.model_manager._scope_generations),
+            'pipelines': len(self.pipeline_manager._pipelines_by_key),
+            'pipeline_scopes': len(self.pipeline_manager._scope_generations),
+            'knowledge_bases': len(self.rag_manager.knowledge_bases),
+            'knowledge_scopes': len(self.rag_manager._scope_generations),
+            'mcp_sessions': len(self.mcp_loader.sessions),
+            'mcp_scopes': len(self.mcp_loader._scope_generations),
+            'bots': len(self.platform_manager._bots_by_key),
+            'bot_scopes': len(self.platform_manager._scope_generations),
+            'requesters_closed': _ProbeRequester.closed,
+            'adapters_killed': _ProbeAdapter.killed,
+            'mcp_sessions_closed': _ProbeMCPSession.closed,
+            'binding_lookups': self.workspace_service.binding_lookups,
+        }
+
+    def assert_generation_state(
+        self,
+        workspaces: int,
+        generation: int,
+    ) -> None:
+        state = self.retained_state()
+        cardinality_keys = (
+            'model_providers',
+            'llm_models',
+            'embedding_models',
+            'rerank_models',
+            'model_scopes',
+            'pipelines',
+            'pipeline_scopes',
+            'knowledge_bases',
+            'knowledge_scopes',
+            'mcp_sessions',
+            'mcp_scopes',
+            'bots',
+            'bot_scopes',
+        )
+        invalid = {key: state[key] for key in cardinality_keys if state[key] != workspaces}
+        if invalid:
+            raise AssertionError(f'populated Workspace cardinality mismatch: {invalid}')
+        expected_retired = (generation - 1) * workspaces
+        if state['requesters_closed'] != expected_retired:
+            raise AssertionError(f'retired requester count {state["requesters_closed"]} != {expected_retired}')
+        if state['adapters_killed'] != expected_retired:
+            raise AssertionError(f'retired adapter count {state["adapters_killed"]} != {expected_retired}')
+        if state['mcp_sessions_closed'] != expected_retired:
+            raise AssertionError(f'retired MCP session count {state["mcp_sessions_closed"]} != {expected_retired}')
+
+    def assert_generation_collected(self, generation: int) -> None:
+        gc.collect()
+        references = self.generation_refs.pop(generation)
+        retained = sum(reference() is not None for reference in references)
+        if retained:
+            raise AssertionError(f'{retained} generation-{generation} runtime objects remain reachable')
+
+
+async def _run(args: argparse.Namespace) -> dict:
+    scale = SCALES[args.scale]
+    tracemalloc.start()
+    probe = PopulatedWorkspaceProbe()
+    baseline = _sample_process()
+
+    phase_one_started = time.monotonic()
+    await probe.load_generation(scale.workspaces, 1)
+    phase_one_seconds = time.monotonic() - phase_one_started
+    probe.assert_generation_state(scale.workspaces, 1)
+    phase_one = _sample_process()
+    phase_one_state = probe.retained_state()
+
+    phase_two_started = time.monotonic()
+    await probe.load_generation(scale.workspaces, 2)
+    phase_two_seconds = time.monotonic() - phase_two_started
+    probe.assert_generation_state(scale.workspaces, 2)
+    probe.assert_generation_collected(1)
+    phase_two = _sample_process()
+    phase_two_state = probe.retained_state()
+
+    phase_three_started = time.monotonic()
+    await probe.load_generation(scale.workspaces, 3)
+    phase_three_seconds = time.monotonic() - phase_three_started
+    probe.assert_generation_state(scale.workspaces, 3)
+    probe.assert_generation_collected(2)
+    phase_three = _sample_process()
+    phase_three_state = probe.retained_state()
+
+    cardinality_keys = (
+        'model_providers',
+        'llm_models',
+        'embedding_models',
+        'rerank_models',
+        'model_scopes',
+        'pipelines',
+        'pipeline_scopes',
+        'knowledge_bases',
+        'knowledge_scopes',
+        'mcp_sessions',
+        'mcp_scopes',
+        'bots',
+        'bot_scopes',
+    )
+    if any(
+        phase_two_state[key] != phase_one_state[key] or phase_three_state[key] != phase_one_state[key]
+        for key in cardinality_keys
+    ):
+        raise AssertionError(
+            'populated Workspace registries did not plateau: '
+            f'phase_one={phase_one_state}, phase_two={phase_two_state}, '
+            f'phase_three={phase_three_state}'
+        )
+
+    traced_growth = phase_three.traced_current_bytes - phase_two.traced_current_bytes
+    rss_growth = phase_three.rss_bytes - phase_two.rss_bytes
+    max_traced_growth = int(args.max_traced_growth_mib * 1024 * 1024)
+    max_rss_growth = int(args.max_rss_growth_mib * 1024 * 1024)
+    if traced_growth > max_traced_growth:
+        raise AssertionError(f'replacement traced memory grew by {traced_growth} bytes (limit {max_traced_growth})')
+    if rss_growth > max_rss_growth:
+        raise AssertionError(f'replacement RSS grew by {rss_growth} bytes (limit {max_rss_growth})')
+    phase_ratio = max(
+        phase_two_seconds,
+        phase_three_seconds,
+    ) / max(phase_one_seconds, 0.000_001)
+    if phase_ratio > args.max_replacement_time_ratio:
+        raise AssertionError(f'replacement phase ratio {phase_ratio:.3f} exceeds {args.max_replacement_time_ratio:.3f}')
+
+    return {
+        'component': 'langbot-populated-workspaces',
+        'scale': args.scale,
+        'workspaces': scale.workspaces,
+        'passed': True,
+        'phase_seconds': {
+            'initial': round(phase_one_seconds, 3),
+            'replacement_one': round(phase_two_seconds, 3),
+            'replacement_two': round(phase_three_seconds, 3),
+            'maximum_replacement_ratio': round(phase_ratio, 3),
+        },
+        'samples': {
+            'baseline': asdict(baseline),
+            'phase_one': asdict(phase_one),
+            'phase_two': asdict(phase_two),
+            'phase_three': asdict(phase_three),
+        },
+        'replacement_growth': {
+            'rss_bytes': rss_growth,
+            'traced_current_bytes': traced_growth,
+        },
+        'retained_state': {
+            'phase_one': phase_one_state,
+            'phase_two': phase_two_state,
+            'phase_three': phase_three_state,
+        },
+    }
+
+
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument('--scale', choices=tuple(SCALES), default='quick')
+    parser.add_argument('--max-traced-growth-mib', type=float, default=16.0)
+    parser.add_argument('--max-rss-growth-mib', type=float, default=64.0)
+    parser.add_argument(
+        '--max-replacement-time-ratio',
+        type=float,
+        default=3.0,
+    )
+    parser.add_argument('--json', action='store_true')
+    return parser.parse_args()
+
+
+def main() -> None:
+    args = _parse_args()
+    result = asyncio.run(_run(args))
+    if args.json:
+        print(json.dumps(result, sort_keys=True))
+    else:
+        print(json.dumps(result, indent=2, sort_keys=True))
+
+
+if __name__ == '__main__':
+    main()
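`assert_generation_collected` relies on weak references. `generation_refs` presumably holds `weakref.ref` handles to each generation's runtime objects (the code that fills it is outside this hunk). After `gc.collect()`, any handle that still resolves marks a leaked object. The standalone sketch below shows that pattern. `GenerationTracker` and `RuntimeObject` are hypothetical names, not part of the probe.

```python
from __future__ import annotations

import gc
import weakref
from collections import defaultdict


class RuntimeObject:
    """Hypothetical stand-in for a per-Workspace requester, adapter or session."""


class GenerationTracker:
    def __init__(self) -> None:
        # generation number -> weak references to that generation's objects
        self.generation_refs: defaultdict[int, list[weakref.ref]] = defaultdict(list)

    def track(self, generation: int, obj: object) -> None:
        self.generation_refs[generation].append(weakref.ref(obj))

    def retained_count(self, generation: int) -> int:
        # Collect cycles first so only objects with a real strong path survive.
        gc.collect()
        references = self.generation_refs.pop(generation)
        return sum(reference() is not None for reference in references)


tracker = GenerationTracker()
generation_one = [RuntimeObject() for _ in range(3)]
for runtime_object in generation_one:
    tracker.track(1, runtime_object)
generation_two = [RuntimeObject() for _ in range(3)]
for runtime_object in generation_two:
    tracker.track(2, runtime_object)

# "Replace" generation 1 by dropping the only strong references to it.
del generation_one
print(tracker.retained_count(1))  # 0
print(tracker.retained_count(2))  # 3, still held by generation_two
```

Because the handles are weak, the tracker itself never keeps a retired generation alive. A non-zero count therefore points to a real reference path elsewhere, such as a registry or a cached closure.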
@@ -27,6 +27,17 @@ The `all` / `box` profile starts three services:
 - `langbot_box` — Box sandbox runtime (`:5410`). Uses the host Docker socket to
  spawn sandbox containers, so the **Box root host path and in-container path
  must be identical** (`BOX__LOCAL__HOST_ROOT=${LANGBOT_BOX_ROOT:-${PWD}/data/box}`).
+  Its RPC and managed-process relay require a shared
+  `LANGBOT_BOX_CONTROL_TOKEN` (at least 32 non-whitespace characters) in both
+  the LangBot and Box containers. Generate it once with `openssl rand -hex 32`;
+  never put it in `box.runtime.endpoint` or commit it to config.
+
+Every Compose deployment also needs a single
+`LANGBOT_PLUGIN_RUNTIME_CONTROL_TOKEN` shared by the `langbot` and
+`langbot_plugin_runtime` services. Generate it with `openssl rand -hex 32` and export it
+before `docker compose up`; the external Plugin Runtime fails closed when the
+token is empty or weak. Kubernetes uses the `langbot-plugin-runtime-control`
+Secret shown in `docker/kubernetes.yaml`.
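Operators can run a quick pre-flight check on a control token before `docker compose up`. The sketch below assumes that "at least 32 non-whitespace characters" means counting the characters that are not whitespace. LangBot's actual validator is not shown here and may apply stricter rules. `secrets.token_hex(32)` produces the same 64-character hex form as `openssl rand -hex 32`.

```python
from __future__ import annotations

import secrets

MIN_TOKEN_CHARS = 32  # documented minimum


def is_strong_control_token(token: str | None) -> bool:
    """Assumed reading of the rule: >= 32 non-whitespace characters."""
    if not token:
        return False  # empty or unset tokens must fail closed
    non_whitespace = sum(1 for ch in token if not ch.isspace())
    return non_whitespace >= MIN_TOKEN_CHARS


def generate_control_token() -> str:
    # 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
    return secrets.token_hex(32)


token = generate_control_token()
print(len(token), is_strong_control_token(token))  # 64 True
print(is_strong_control_token(''))  # False
print(is_strong_control_token('a' * 16 + ' ' * 16))  # False
```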

 With Box off, the dashboard/skills list stays visible (read-only) but sandbox
 tools, skill add/edit, and stdio MCP are disabled. Set `box.enabled: false`
@@ -65,10 +65,23 @@ Route auth is declared per-route via `AuthType` in
 - `API_KEY` — `X-API-Key` or `Authorization: Bearer <key>`.
 - `USER_TOKEN_OR_API_KEY` — either.

-API keys are verified by `apikey_service.verify_api_key()`, which accepts:
-1. the **global key** from `config.yaml` `api.global_api_key` (no DB, no login,
-   no `lbk_` prefix required), then
-2. **web-UI keys** (DB-stored, `lbk_` prefix).
+Authenticated routes receive an immutable `RequestContext` containing the
+principal, the authorized Workspace membership, fixed-role permissions, the
+instance, the request ID, and the placement generation. A browser's
+`X-Workspace-Id` is only a selector and is always checked against the
+Account's Workspace memberships. Tenant services must accept this context (or
+an explicit trusted execution context) and fail closed when it is absent.
+
+API-key authentication accepts:
+
+1. the **global key** from `config.yaml` `api.global_api_key`, accepted only
+   by a community instance with exactly one local Workspace; then
+2. **web-UI keys**, whose one-time `lbk_` secret is stored only as a hash.
+   Each key is bound to one Workspace and carries explicit scopes, a status,
+   and an optional expiry.
+
+An API key derives its Workspace from the key record and ignores any caller
+Workspace selector. Public Bot/Webhook routes likewise derive the Workspace
+from the opaque owning resource rather than from a header.
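The Workspace-derivation rules above fit into a small resolver. The sketch below is illustrative only. The field names of `RequestContext` are assumptions based on the listed contents, `WorkspaceAccessError` and `resolve_workspace` are hypothetical, and treating a missing browser selector as an error is an assumed fail-closed choice.

```python
from __future__ import annotations

from dataclasses import dataclass


class WorkspaceAccessError(PermissionError):
    """Raised when no authorized Workspace can be resolved (fail closed)."""


@dataclass(frozen=True)
class RequestContext:
    # Illustrative field names for the documented contents of the context.
    principal: str
    workspace_id: str
    permissions: frozenset[str]
    instance_id: str
    request_id: str
    placement_generation: int


def resolve_workspace(
    *,
    api_key_workspace: str | None,
    header_workspace: str | None,
    memberships: frozenset[str],
) -> str:
    if api_key_workspace is not None:
        # The key record decides; an X-Workspace-Id selector is ignored.
        return api_key_workspace
    # A browser selector is only a request, honored only for a membership.
    if header_workspace is None or header_workspace not in memberships:
        raise WorkspaceAccessError('Workspace selector is missing or unauthorized')
    return header_workspace


context = RequestContext(
    principal='account:alice',
    workspace_id=resolve_workspace(
        api_key_workspace=None,
        header_workspace='ws-1',
        memberships=frozenset({'ws-1', 'ws-2'}),
    ),
    permissions=frozenset({'resource.view'}),
    instance_id='instance-a',
    request_id='req-123',
    placement_generation=1,
)
print(context.workspace_id)  # ws-1
```

`frozen=True` turns any later attempt to reassign a field into an error, which matches the "immutable" requirement for the context passed to tenant services.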

 Route groups self-register via `@group.group_class(name, path)` and are
 discovered by `importutil.import_modules_in_pkg`.
@@ -29,13 +29,19 @@ Authorization: Bearer <api-key>

 Two kinds of key are accepted:

-1. **Web-UI key** — created in the web UI (sidebar → API Keys), prefixed `lbk_`,
-   stored in the database.
+1. **Web-UI key** — created in the web UI (sidebar → API Keys), prefixed `lbk_`.
+   The secret is shown once; only its SHA-256 hash is stored. Each key is bound
+   to one Workspace and has explicit scopes, status, optional expiry, and
+   last-used metadata. The key determines the Workspace; callers cannot switch
+   it with `X-Workspace-Id`.
 2. **Global API key** — set in `data/config.yaml` under `api.global_api_key`.
   Requires no login session and no DB record; does not need the `lbk_` prefix.
-   Leave empty to disable. See the `langbot-deploy` skill for config details.
+   It is accepted only by a community instance with exactly one local
+   Workspace and is disabled for SaaS multi-Workspace operation. Leave empty to
+   disable. See the `langbot-deploy` skill for config details.
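The list above says only the SHA-256 hash of a web-UI key's secret is stored. A minimal sketch of issuing and verifying such a key follows. The record fields, the `status` values, and the `token_urlsafe(32)` length are assumptions, not LangBot's schema. A real store would look the record up by the digest (for example, through a unique index) instead of checking a single record.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime, timezone

KEY_PREFIX = 'lbk_'


def hash_secret(secret: str) -> str:
    return hashlib.sha256(secret.encode('utf-8')).hexdigest()


@dataclass
class ApiKeyRecord:
    # Hypothetical record shape; the plaintext secret is never stored.
    workspace_id: str
    secret_hash: str
    scopes: frozenset[str]
    status: str = 'active'
    expires_at: datetime | None = None
    last_used_at: datetime | None = None


def issue_key(workspace_id: str, scopes: Iterable[str]) -> tuple[str, ApiKeyRecord]:
    secret = KEY_PREFIX + secrets.token_urlsafe(32)
    record = ApiKeyRecord(workspace_id, hash_secret(secret), frozenset(scopes))
    # Show `secret` to the user exactly once; persist only `record`.
    return secret, record


def verify_key(presented: str, record: ApiKeyRecord, now: datetime | None = None) -> bool:
    if now is None:
        now = datetime.now(timezone.utc)
    if not presented.startswith(KEY_PREFIX):
        return False
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(hash_secret(presented), record.secret_hash):
        return False
    if record.status != 'active':
        return False  # revoked or otherwise disabled
    if record.expires_at is not None and record.expires_at <= now:
        return False  # expired
    record.last_used_at = now
    return True


secret, record = issue_key('ws-1', ['resource.view'])
print(secret.startswith('lbk_'), verify_key(secret, record))  # True True
print(verify_key(secret + 'x', record))  # False
```

After a successful check, the record's `workspace_id` becomes the request's Workspace, regardless of any `X-Workspace-Id` header.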

-Requests without a valid key get `401 Unauthorized`.
+Invalid, revoked, or expired keys get `401 Unauthorized`. A valid key whose
+scopes do not authorize a tool gets `403 Forbidden`.

 ## Client configuration

@@ -66,7 +72,9 @@ The tools wrap the LangBot service layer. Current tools (v1):

 Mutating tools (`create_*`, `update_*`) take a JSON object matching the same
 shape as the corresponding HTTP API request body. Discover resources with the
-`list_*` / `get_*` tools before mutating; identifiers are UUIDs.
+`list_*` / `get_*` tools before mutating; identifiers are UUIDs. Reads require
+`resource.view`; mutations require `resource.manage`. All service calls inherit
+the immutable Workspace context authenticated at the MCP transport boundary.

 ## How to use

@@ -93,7 +101,8 @@ shape as the corresponding HTTP API request body. Discover resources with the
 - `/mcp` is the **server** LangBot exposes. The `/api/v1/mcp` routes are the
  **client** side (managing external MCP servers LangBot connects to). Don't
  confuse them.
- A `401` means the key is wrong, missing, or (for the global key)
-  `api.global_api_key` is empty in config.yaml.
+- A `401` means the key is wrong, missing, revoked, or expired, or (for the
+  global key) `api.global_api_key` is empty or the instance is not a
+  single-Workspace community instance.
+- A `403` means the key is valid but lacks the permission required by the tool.
 - The global key is plaintext in config.yaml — only enable it on trusted/internal
  deployments and serve over HTTPS.
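The 401/403 split and the `resource.view` / `resource.manage` rule can be summarized in a few lines. This is a sketch only. Mapping tools to scopes by name prefix is an assumption, the tool names in the usage lines are hypothetical, and because the text does not say whether `resource.manage` implies `resource.view`, the sketch checks each scope independently.

```python
from __future__ import annotations

READ_PREFIXES = ('list_', 'get_')
MUTATE_PREFIXES = ('create_', 'update_')


def required_scope(tool_name: str) -> str:
    if tool_name.startswith(READ_PREFIXES):
        return 'resource.view'
    if tool_name.startswith(MUTATE_PREFIXES):
        return 'resource.manage'
    raise ValueError(f'unknown tool {tool_name!r}')


def tool_call_status(key_valid: bool, scopes: frozenset[str], tool_name: str) -> int:
    """HTTP status for an MCP tool call: 401, 403, or 200 when allowed."""
    if not key_valid:  # wrong, missing, revoked, or expired key
        return 401
    if required_scope(tool_name) not in scopes:  # valid key, insufficient scope
        return 403
    return 200


viewer = frozenset({'resource.view'})
print(tool_call_status(False, viewer, 'list_bots'))  # 401
print(tool_call_status(True, viewer, 'list_bots'))  # 200
print(tool_call_status(True, viewer, 'update_bot'))  # 403
```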
@@ -20,8 +20,7 @@ asciiart = r"""
 """


-async def main_entry(loop: asyncio.AbstractEventLoop):
-    """Main entry point for LangBot"""
+def _build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='LangBot')
    parser.add_argument(
        '--standalone-runtime',
@@ -36,7 +35,20 @@ async def main_entry(loop: asyncio.AbstractEventLoop):
        default=False,
    )
    parser.add_argument('--debug', action='store_true', help='Debug mode / 调试模式', default=False)
-    args = parser.parse_args()
+    subparsers = parser.add_subparsers(dest='command')
+    migrate_parser = subparsers.add_parser('migrate', help='Run an operator-only database migration')
+    migrate_parser.add_argument(
+        '--cloud',
+        action='store_true',
+        required=True,
+        help='Migrate and validate the Cloud PostgreSQL business database',
+    )
+    return parser
+
+
+async def main_entry(loop: asyncio.AbstractEventLoop):
+    """Main entry point for LangBot"""
+    args = _build_parser().parse_args()

    if args.standalone_runtime:
        from langbot.pkg.utils import platform
@@ -55,22 +67,26 @@ async def main_entry(loop: asyncio.AbstractEventLoop):

    print(asciiart)

-    # Check dependencies
-    from langbot.pkg.core.bootutils import deps
+    # A release migration is a deterministic one-shot deployment Job. When a
+    # dependency is absent it must fail with the current image; it must never
+    # install packages into its own environment and then ask the orchestrator
+    # to restart it.
+    if args.command != 'migrate':
+        from langbot.pkg.core.bootutils import deps

-    missing_deps = await deps.check_deps()
+        missing_deps = await deps.check_deps()

-    if missing_deps:
-        print('以下依赖包未安装，将自动安装，请完成后重启程序：')
-        print(
-            'These dependencies are missing, they will be installed automatically, please restart the program after completion:'
-        )
-        for dep in missing_deps:
-            print('-', dep)
-        await deps.install_deps(missing_deps)
-        print('已自动安装缺失的依赖包，请重启程序。')
-        print('The missing dependencies have been installed automatically, please restart the program.')
-        sys.exit(0)
+        if missing_deps:
+            print('以下依赖包未安装，将自动安装，请完成后重启程序：')
+            print(
+                'These dependencies are missing, they will be installed automatically, '
+                'please restart the program after completion:'
+            )
+            for dep in missing_deps:
+                print('-', dep)
+            await deps.install_deps(missing_deps)
+            print('已自动安装缺失的依赖包，请重启程序。')
+            print('The missing dependencies have been installed automatically, please restart the program.')
+            sys.exit(0)

    # Check configuration files
    from langbot.pkg.core.bootutils import files
@@ -83,6 +99,12 @@ async def main_entry(loop: asyncio.AbstractEventLoop):
        for file in generated_files:
            print('-', file)

+    if args.command == 'migrate':
+        from langbot.pkg.persistence.release_migration import run_cloud_release_migration_from_config
+
+        await run_cloud_release_migration_from_config(loop)
+        return
+
    from langbot.pkg.core import boot

    await boot.main(loop)
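`main_entry` branches on `args.command`. That works because `add_subparsers(dest='command')` leaves `command` as `None` when no subcommand is given. The reduced, standalone copy of the parser below shows the three cases. It keeps only `--debug` and the `migrate` subcommand and is not the real `_build_parser()`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='LangBot')
    parser.add_argument('--debug', action='store_true', default=False)
    subparsers = parser.add_subparsers(dest='command')
    migrate = subparsers.add_parser('migrate')
    # `required=True` on a store_true flag forces operators to type `--cloud`
    # explicitly; once parsing succeeds the value is always True.
    migrate.add_argument('--cloud', action='store_true', required=True)
    return parser


parser = build_parser()
print(parser.parse_args([]).command)  # None -> normal boot path
args = parser.parse_args(['migrate', '--cloud'])
print(args.command, args.cloud)  # migrate True
```

`parser.parse_args(['migrate'])` without `--cloud` exits with status 2 and a usage error. As a result, a mistyped Job command fails before any database connection is attempted.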
@@ -6,6 +6,22 @@ from typing import Dict, List, Any, AsyncGenerator
 import os
 from pathlib import Path

+from langbot.pkg.utils import httpclient
+
+_MAX_COZE_RESPONSE_BYTES = 16 * 1024 * 1024
+_MAX_COZE_EVENT_BYTES = 1024 * 1024
+_MAX_COZE_MEDIA_BYTES = 10 * 1024 * 1024
+
+
+def _read_local_media_limited(path: Path) -> bytes:
+    if path.stat().st_size > _MAX_COZE_MEDIA_BYTES:
+        raise ValueError('Coze upload exceeds the size limit')
+    with path.open('rb') as handle:
+        body = handle.read(_MAX_COZE_MEDIA_BYTES + 1)
+    if len(body) > _MAX_COZE_MEDIA_BYTES:
+        raise ValueError('Coze upload exceeds the size limit')
+    return body
+
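`_read_local_media_limited` checks the file size twice: once from `stat()` metadata and again after reading at most one byte past the limit. The generic sketch below uses the same technique. `read_limited` is a hypothetical name, and the tiny limits in the usage lines are only for demonstration. The client calls its version through `asyncio.to_thread`, so the blocking read stays off the event loop.

```python
import tempfile
from pathlib import Path

MAX_MEDIA_BYTES = 10 * 1024 * 1024  # mirrors _MAX_COZE_MEDIA_BYTES


def read_limited(path: Path, max_bytes: int = MAX_MEDIA_BYTES) -> bytes:
    # Cheap early rejection based on file metadata.
    if path.stat().st_size > max_bytes:
        raise ValueError('upload exceeds the size limit')
    with path.open('rb') as handle:
        # Read one byte past the limit: if the file grew after stat(), the
        # extra byte reveals it instead of silently truncating the upload.
        body = handle.read(max_bytes + 1)
    if len(body) > max_bytes:
        raise ValueError('upload exceeds the size limit')
    return body


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.bin'
    sample.write_bytes(b'x' * 8)
    print(len(read_limited(sample, max_bytes=8)))  # 8
    try:
        read_limited(sample, max_bytes=7)
    except ValueError as exc:
        print(exc)  # upload exceeds the size limit
```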

 class AsyncCozeAPIClient:
    def __init__(self, api_key: str, api_base: str = 'https://api.coze.cn'):
@@ -58,19 +74,24 @@ class AsyncCozeAPIClient:
        if isinstance(file, Path):
            if not file.exists():
                raise ValueError(f'File not found: {file}')
-            with open(file, 'rb') as f:
-                file = f.read()
+            file = await asyncio.to_thread(_read_local_media_limited, file)

        # Handle a file path string
        elif isinstance(file, str):
            if not os.path.isfile(file):
                raise ValueError(f'File not found: {file}')
-            with open(file, 'rb') as f:
-                file = f.read()
+            file = await asyncio.to_thread(
+                _read_local_media_limited,
+                Path(file),
+            )

        # Handle a file-like object
        elif hasattr(file, 'read'):
-            file = file.read()
+            file = await asyncio.to_thread(file.read, _MAX_COZE_MEDIA_BYTES + 1)
+        if not isinstance(file, (bytes, bytearray)):
+            raise ValueError('Unsupported Coze upload type')
+        if len(file) > _MAX_COZE_MEDIA_BYTES:
+            raise ValueError('Coze upload exceeds the size limit')

        session = await self.coze_session()
        url = f'{self.api_base}/v1/files/upload'
@@ -87,13 +108,18 @@ class AsyncCozeAPIClient:
                if response.status == 401:
                    raise Exception('Coze API 认证失败，请检查 API Key 是否正确')

-                response_text = await response.text()
+                response_text = (
+                    await httpclient.read_limited(
+                        response,
+                        max_bytes=_MAX_COZE_EVENT_BYTES,
+                    )
+                ).decode('utf-8', errors='replace')

                if response.status != 200:
                    raise Exception(f'文件上传失败，状态码: {response.status}, 响应: {response_text}')
                try:
-                    result = await response.json()
-                except json.JSONDecodeError:
+                    result = json.loads(response_text)
+                except (json.JSONDecodeError, TypeError):
                    raise Exception(f'文件上传响应解析失败: {response_text}')

                if result.get('code') != 0:
@@ -158,7 +184,15 @@ class AsyncCozeAPIClient:
                if response.status != 200:
                    raise Exception(f'Coze API 流式请求失败，状态码: {response.status}')

+                total_bytes = 0
+                chunk_type = 'message'
+                chunk_data = ''
                async for chunk in response.content:
+                    total_bytes += len(chunk)
+                    if total_bytes > _MAX_COZE_RESPONSE_BYTES:
+                        raise Exception('Coze API stream exceeds the runtime limit')
+                    if len(chunk) > _MAX_COZE_EVENT_BYTES:
+                        raise Exception('Coze API event exceeds the runtime limit')
                    chunk = chunk.decode('utf-8')
                    if chunk != '\n':
                        if chunk.startswith('event:'):
@@ -12,10 +12,26 @@ from collections.abc import AsyncGenerator

 import httpx

+from langbot.pkg.utils import httpclient
+
 from .errors import DeerFlowAPIError


 SSE_MAX_BUFFER_CHARS = 1_048_576
+SSE_MAX_TOTAL_BYTES = 16 * 1024 * 1024
+ERROR_BODY_MAX_BYTES = 1024 * 1024
+
+
+async def _read_error_body(response: httpx.Response) -> str:
+    body = bytearray()
+    async for chunk in response.aiter_bytes(8192):
+        body.extend(chunk)
+        if len(body) > ERROR_BODY_MAX_BYTES:
+            raise DeerFlowAPIError(
+                operation='read error response',
+                body='response exceeds the runtime limit',
+            )
+    return body.decode('utf-8', errors='replace')


 def _normalize_sse_newlines(text: str) -> str:
@@ -94,6 +110,7 @@ class AsyncDeerFlowClient:
        async with httpx.AsyncClient(
            trust_env=True,
            timeout=timeout,
+            event_hooks=httpclient.httpx_response_limit_hooks(),
        ) as client:
            response = await client.post(
                url,
@@ -101,13 +118,14 @@ class AsyncDeerFlowClient:
                json=payload,
            )
            if response.status_code not in (200, 201):
+                body = await httpclient.response_text(response)
                raise DeerFlowAPIError(
                    operation='create thread',
                    status=response.status_code,
-                    body=response.text,
+                    body=body,
                    url=url,
                )
-            return response.json()
+            return await httpclient.parse_json_response(response)

    async def delete_thread(self, thread_id: str, timeout: float = 20) -> None:
        """删除指定 thread"""
@@ -116,13 +134,15 @@ class AsyncDeerFlowClient:
        async with httpx.AsyncClient(
            trust_env=True,
            timeout=timeout,
+            event_hooks=httpclient.httpx_response_limit_hooks(),
        ) as client:
            response = await client.delete(url, headers=self.headers)
            if response.status_code not in (200, 202, 204, 404):
+                body = await httpclient.response_text(response)
                raise DeerFlowAPIError(
                    operation='delete thread',
                    status=response.status_code,
-                    body=response.text,
+                    body=body,
                    url=url,
                    thread_id=thread_id,
                )
@@ -163,19 +183,27 @@ class AsyncDeerFlowClient:
                json=payload,
            ) as resp:
                if resp.status_code != 200:
-                    body = await resp.aread()
                    raise DeerFlowAPIError(
                        operation='runs/stream request',
                        status=resp.status_code,
-                        body=body.decode('utf-8', errors='replace'),
+                        body=await _read_error_body(resp),
                        url=url,
                        thread_id=thread_id,
                    )

                decoder = codecs.getincrementaldecoder('utf-8')('replace')
                buffer = ''
+                total_bytes = 0

                async for chunk in resp.aiter_bytes(8192):
+                    total_bytes += len(chunk)
+                    if total_bytes > SSE_MAX_TOTAL_BYTES:
+                        raise DeerFlowAPIError(
+                            operation='runs/stream response',
+                            body='response exceeds the runtime limit',
+                            url=url,
+                            thread_id=thread_id,
+                        )
                    buffer += _normalize_sse_newlines(decoder.decode(chunk))

                    while '\n\n' in buffer:
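The DeerFlow stream loop decodes with `codecs.getincrementaldecoder` rather than calling `chunk.decode()` on each chunk. If a multi-byte UTF-8 character is split across two 8 KiB chunks, a per-chunk decode would produce replacement characters. The incremental decoder instead holds the partial bytes until the next chunk arrives. The standalone sketch below combines that with `\n\n` event framing and both size caps. It assumes the server already uses `\n` line endings (the real client normalizes them with `_normalize_sse_newlines`), and `iter_sse_events` is a hypothetical name.

```python
from __future__ import annotations

import asyncio
import codecs
from collections.abc import AsyncIterator

MAX_TOTAL_BYTES = 16 * 1024 * 1024  # mirrors SSE_MAX_TOTAL_BYTES
MAX_BUFFER_CHARS = 1_048_576  # mirrors SSE_MAX_BUFFER_CHARS


async def iter_sse_events(chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    decoder = codecs.getincrementaldecoder('utf-8')('replace')
    buffer = ''
    total_bytes = 0
    async for chunk in chunks:
        total_bytes += len(chunk)
        if total_bytes > MAX_TOTAL_BYTES:
            raise ValueError('stream exceeds the runtime limit')
        # A character split across chunks is held back by the decoder
        # until the next chunk completes it.
        buffer += decoder.decode(chunk)
        while '\n\n' in buffer:
            event, buffer = buffer.split('\n\n', 1)
            if event:
                yield event
        if len(buffer) > MAX_BUFFER_CHARS:
            raise ValueError('SSE event exceeds the runtime limit')
    buffer += decoder.decode(b'', final=True)
    if buffer.strip():
        yield buffer


async def demo_stream() -> AsyncIterator[bytes]:
    # 'é' is encoded as 0xC3 0xA9; the two bytes arrive in different chunks.
    yield b'data: caf\xc3'
    yield b'\xa9\n\ndata: second\n\n'


async def main() -> list[str]:
    return [event async for event in iter_sse_events(demo_stream())]


print(asyncio.run(main()))  # ['data: café', 'data: second']
```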
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import asyncio
 import httpx
 import typing
 import json
@@ -8,6 +9,75 @@ from .errors import DifyAPIError
 from pathlib import Path
 import os

+_MAX_DIFY_RESPONSE_BYTES = 1024 * 1024
+_MAX_DIFY_SSE_LINE_BYTES = 1024 * 1024
+_MAX_DIFY_STREAM_BYTES = 16 * 1024 * 1024
+_MAX_DIFY_UPLOAD_BYTES = 10 * 1024 * 1024
+
+
+async def _read_limited_response(
+    response: httpx.Response,
+    *,
+    max_bytes: int = _MAX_DIFY_RESPONSE_BYTES,
+) -> bytes:
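+    content_length = response.headers.get('Content-Length')
+    if content_length is not None:
+        try:
+            declared_length = int(content_length)
+        except (TypeError, ValueError):
+            # Ignore a malformed header; the streamed byte count below still applies.
+            declared_length = None
+        if declared_length is not None and declared_length > max_bytes:
+            raise DifyAPIError(f'Remote response exceeds the {max_bytes}-byte limit')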
+
+    body = bytearray()
+    async for chunk in response.aiter_bytes(chunk_size=8192):
+        body.extend(chunk)
+        if len(body) > max_bytes:
+            raise DifyAPIError(f'Remote response exceeds the {max_bytes}-byte limit')
+    return bytes(body)
+
+
+async def _iter_sse_json(
+    response: httpx.Response,
+) -> typing.AsyncGenerator[dict[str, typing.Any], None]:
+    """Parse Dify's one-JSON-per-data-line SSE without unbounded line buffering."""
+
+    buffer = bytearray()
+    total = 0
+    async for chunk in response.aiter_bytes(chunk_size=8192):
+        total += len(chunk)
+        if total > _MAX_DIFY_STREAM_BYTES:
+            raise DifyAPIError('Dify SSE stream exceeds the runtime limit')
+        buffer.extend(chunk)
+        while b'\n' in buffer:
+            raw_line, _, remainder = buffer.partition(b'\n')
+            buffer = bytearray(remainder)
+            if len(raw_line) > _MAX_DIFY_SSE_LINE_BYTES:
+                raise DifyAPIError('Dify SSE event exceeds the runtime limit')
+            line = raw_line.rstrip(b'\r').strip()
+            if not line or not line.startswith(b'data:'):
+                continue
+            payload = json.loads(line[5:].decode('utf-8', errors='replace'))
+            if isinstance(payload, dict):
+                yield payload
+        if len(buffer) > _MAX_DIFY_SSE_LINE_BYTES:
+            raise DifyAPIError('Dify SSE event exceeds the runtime limit')
+
+    line = bytes(buffer).rstrip(b'\r').strip()
+    if line.startswith(b'data:'):
+        payload = json.loads(line[5:].decode('utf-8', errors='replace'))
+        if isinstance(payload, dict):
+            yield payload
+
+
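`_iter_sse_json` limits each line as well as the data it has buffered but not yet split. A peer that never sends `\n` therefore cannot make the buffer grow without bound. The synchronous sketch below applies the same rules to an in-memory list of chunks. `iter_data_json` is a hypothetical name, and it trims consumed bytes in place with `del buffer[:n]` instead of copying the remainder, which is a design choice of this sketch rather than the client's.

```python
import json
from collections.abc import Iterable, Iterator

MAX_LINE_BYTES = 1024 * 1024  # mirrors _MAX_DIFY_SSE_LINE_BYTES


def iter_data_json(chunks: Iterable[bytes], max_line_bytes: int = MAX_LINE_BYTES) -> Iterator[dict]:
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while True:
            newline = buffer.find(b'\n')
            if newline < 0:
                break
            raw_line = bytes(buffer[:newline])
            del buffer[: newline + 1]  # drop the consumed line in place
            if len(raw_line) > max_line_bytes:
                raise ValueError('SSE event exceeds the line limit')
            yield from _parse_data_line(raw_line)
        # Unterminated data must not grow without bound either.
        if len(buffer) > max_line_bytes:
            raise ValueError('SSE event exceeds the line limit')
    yield from _parse_data_line(bytes(buffer))


def _parse_data_line(raw_line: bytes) -> Iterator[dict]:
    line = raw_line.strip()  # also removes a trailing '\r'
    if line.startswith(b'data:'):
        payload = json.loads(line[5:].decode('utf-8', errors='replace'))
        if isinstance(payload, dict):  # non-object payloads are skipped
            yield payload


chunks = [b'data: {"event": "mess', b'age"}\r\n\n: keep-alive\ndata: [1]\ndata: {"event": "end"}']
print([event['event'] for event in iter_data_json(chunks)])  # ['message', 'end']
```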
+def _read_local_file_limited(path: Path) -> bytes:
+    if path.stat().st_size > _MAX_DIFY_UPLOAD_BYTES:
+        raise ValueError('Dify upload exceeds the size limit')
+    with path.open('rb') as handle:
+        body = handle.read(_MAX_DIFY_UPLOAD_BYTES + 1)
+    if len(body) > _MAX_DIFY_UPLOAD_BYTES:
+        raise ValueError('Dify upload exceeds the size limit')
+    return body
+

 class AsyncDifyServiceClient:
    """Dify Service API 客户端"""
@@ -22,6 +92,21 @@ class AsyncDifyServiceClient:
    ) -> None:
        self.api_key = api_key
        self.base_url = base_url
+        self._client: httpx.AsyncClient | None = None
+
+    def _get_client(self) -> httpx.AsyncClient:
+        if self._client is None:
+            self._client = httpx.AsyncClient(
+                base_url=self.base_url,
+                trust_env=True,
+            )
+        return self._client
+
+    async def aclose(self) -> None:
+        client = self._client
+        self._client = None
+        if client is not None:
+            await client.aclose()

    async def chat_messages(
        self,
@@ -38,37 +123,32 @@ class AsyncDifyServiceClient:
        if response_mode != 'streaming':
            raise DifyAPIError('当前仅支持 streaming 模式')

-        async with httpx.AsyncClient(
-            base_url=self.base_url,
-            trust_env=True,
-            timeout=timeout,
-        ) as client:
-            payload = {
-                'inputs': inputs,
-                'query': query,
-                'user': user,
-                'response_mode': response_mode,
-                'conversation_id': conversation_id,
-                'files': files,
-                'model_config': model_config or {},
-            }
+        client = self._get_client()
+        payload = {
+            'inputs': inputs,
+            'query': query,
+            'user': user,
+            'response_mode': response_mode,
+            'conversation_id': conversation_id,
+            'files': files,
+            'model_config': model_config or {},
+        }

-            async with client.stream(
-                'POST',
-                '/chat-messages',
-                headers={
-                    'Authorization': f'Bearer {self.api_key}',
-                    'Content-Type': 'application/json',
-                },
-                json=payload,
-            ) as r:
-                async for chunk in r.aiter_lines():
-                    if r.status_code != 200:
-                        raise DifyAPIError(f'{r.status_code} {chunk}')
-                    if chunk.strip() == '':
-                        continue
-                    if chunk.startswith('data:'):
-                        yield json.loads(chunk[5:])
+        async with client.stream(
+            'POST',
+            '/chat-messages',
+            headers={
+                'Authorization': f'Bearer {self.api_key}',
+                'Content-Type': 'application/json',
+            },
+            json=payload,
+            timeout=timeout,
+        ) as r:
+            if r.status_code != 200:
+                body = await _read_limited_response(r)
+                raise DifyAPIError(f'{r.status_code} {body.decode(errors="replace")}')
+            async for event in _iter_sse_json(r):
+                yield event

    async def workflow_run(
        self,
@@ -82,32 +162,27 @@ class AsyncDifyServiceClient:
        if response_mode != 'streaming':
            raise DifyAPIError('当前仅支持 streaming 模式')

-        async with httpx.AsyncClient(
-            base_url=self.base_url,
-            trust_env=True,
+        client = self._get_client()
+        async with client.stream(
+            'POST',
+            '/workflows/run',
+            headers={
+                'Authorization': f'Bearer {self.api_key}',
+                'Content-Type': 'application/json',
+            },
+            json={
+                'inputs': inputs,
+                'user': user,
+                'response_mode': response_mode,
+                'files': files,
+            },
            timeout=timeout,
-        ) as client:
-            async with client.stream(
-                'POST',
-                '/workflows/run',
-                headers={
-                    'Authorization': f'Bearer {self.api_key}',
-                    'Content-Type': 'application/json',
-                },
-                json={
-                    'inputs': inputs,
-                    'user': user,
-                    'response_mode': response_mode,
-                    'files': files,
-                },
-            ) as r:
-                async for chunk in r.aiter_lines():
-                    if r.status_code != 200:
-                        raise DifyAPIError(f'{r.status_code} {chunk}')
-                    if chunk.strip() == '':
-                        continue
-                    if chunk.startswith('data:'):
-                        yield json.loads(chunk[5:])
+        ) as r:
+            if r.status_code != 200:
+                body = await _read_limited_response(r)
+                raise DifyAPIError(f'{r.status_code} {body.decode(errors="replace")}')
+            async for event in _iter_sse_json(r):
+                yield event

    async def workflow_submit(
        self,
@@ -129,41 +204,38 @@ class AsyncDifyServiceClient:
            'Content-Type': 'application/json',
        }

-        async with httpx.AsyncClient(
-            base_url=self.base_url,
-            trust_env=True,
+        client = self._get_client()
+        # Step 1: Submit the form
+        payload: dict[str, typing.Any] = {
+            'inputs': inputs if isinstance(inputs, dict) else {},
+            'user': user,
+            'action': action,
+        }
+
+        async with client.stream(
+            'POST',
+            f'/form/human_input/{form_token}',
+            headers=headers,
+            json=payload,
            timeout=timeout,
-        ) as client:
-            # Step 1: Submit the form
-            payload: dict[str, typing.Any] = {
-                'inputs': inputs if isinstance(inputs, dict) else {},
-                'user': user,
-                'action': action,
-            }
-
-            submit_resp = await client.post(
-                f'/form/human_input/{form_token}',
-                headers=headers,
-                json=payload,
-            )
+        ) as submit_resp:
+            submit_body = await _read_limited_response(submit_resp)
            if submit_resp.status_code != 200:
-                raise DifyAPIError(f'{submit_resp.status_code} {submit_resp.text}')
+                raise DifyAPIError(f'{submit_resp.status_code} {submit_body.decode(errors="replace")}')

-            # Step 2: Stream resumed workflow events
-            async with client.stream(
-                'GET',
-                f'/workflow/{workflow_run_id}/events',
-                headers={'Authorization': f'Bearer {self.api_key}'},
-                params={'user': user},
-            ) as r:
-                if r.status_code != 200:
-                    body = (await r.aread()).decode(errors='replace')
-                    raise DifyAPIError(f'{r.status_code} {body}')
-                async for chunk in r.aiter_lines():
-                    if chunk.strip() == '':
-                        continue
-                    if chunk.startswith('data:'):
-                        yield json.loads(chunk[5:])
+        # Step 2: Stream resumed workflow events
+        async with client.stream(
+            'GET',
+            f'/workflow/{workflow_run_id}/events',
+            headers={'Authorization': f'Bearer {self.api_key}'},
+            params={'user': user},
+            timeout=timeout,
+        ) as r:
+            if r.status_code != 200:
+                body = await _read_limited_response(r)
+                raise DifyAPIError(f'{r.status_code} {body.decode(errors="replace")}')
+            async for event in _iter_sse_json(r):
+                yield event

    async def upload_file(
        self,
@@ -175,37 +247,30 @@ class AsyncDifyServiceClient:
        if isinstance(file, Path):
            if not file.exists():
                raise ValueError(f'File not found: {file}')
-            with open(file, 'rb') as f:
-                file = f.read()
+            file = await asyncio.to_thread(_read_local_file_limited, file)

        # Handle a file path string
        elif isinstance(file, str):
            if not os.path.isfile(file):
                raise ValueError(f'File not found: {file}')
-            with open(file, 'rb') as f:
-                file = f.read()
+            file = await asyncio.to_thread(_read_local_file_limited, Path(file))

        # Handle a file-like object
        elif hasattr(file, 'read'):
-            file = file.read()
-        async with httpx.AsyncClient(
-            base_url=self.base_url,
-            trust_env=True,
+            file = await asyncio.to_thread(file.read, _MAX_DIFY_UPLOAD_BYTES + 1)
+            if len(file) > _MAX_DIFY_UPLOAD_BYTES:
+                raise ValueError('Dify upload exceeds the size limit')
+        client = self._get_client()
+        # multipart/form-data
+        async with client.stream(
+            'POST',
+            '/files/upload',
+            headers={'Authorization': f'Bearer {self.api_key}'},
+            files={'file': file},
+            data={'user': user},
            timeout=timeout,
-        ) as client:
-            # multipart/form-data
-            response = await client.post(
-                '/files/upload',
-                headers={'Authorization': f'Bearer {self.api_key}'},
-                files={
-                    'file': file,
-                },
-                data={
-                    'user': user,
-                },
-            )
-
+        ) as response:
+            body = await _read_limited_response(response)
            if response.status_code != 201:
-                raise DifyAPIError(f'{response.status_code} {response.text}')
-
-            return response.json()
+                raise DifyAPIError(f'{response.status_code} {body.decode(errors="replace")}')
+        return json.loads(body)
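The file-object branch of `upload_file` bounds memory by reading at most one byte past the limit and rejecting anything longer. The following is a minimal standalone sketch of that pattern. `read_capped` and the 16-byte limit are illustrative assumptions, not part of the Dify client.

```python
import asyncio
import io
from typing import BinaryIO

# Illustrative limit; the diff uses _MAX_DIFY_UPLOAD_BYTES instead.
MAX_UPLOAD_BYTES = 16


async def read_capped(file_obj: BinaryIO, max_bytes: int = MAX_UPLOAD_BYTES) -> bytes:
    """Read a file-like object off the event loop, refusing input over max_bytes.

    Asking for max_bytes + 1 bytes is enough to tell "exactly at the limit" from
    "over the limit" without buffering the rest of the stream. A short read from
    a raw or interactive stream would truncate the data rather than bypass the limit.
    """
    data = await asyncio.to_thread(file_obj.read, max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError('upload exceeds the size limit')
    return data


if __name__ == '__main__':
    print(len(asyncio.run(read_capped(io.BytesIO(b'x' * 16)))))  # 16
```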
@@ -7,6 +7,7 @@ import time
 import typing
 import uuid
 import urllib.parse
+from contextlib import asynccontextmanager
 from typing import Awaitable, Callable, Optional
 import dingtalk_stream  # type: ignore
 import websockets
@@ -15,12 +16,42 @@ from .card_callback import DingTalkCardActionHandler
 from .dingtalkevent import DingTalkEvent
 import httpx
 import traceback
+from langbot.pkg.utils import httpclient


 _stdout_logger = logging.getLogger('langbot.dingtalk_api')


 DINGTALK_OPENAPI_BASE = 'https://api.dingtalk.com'
+_MAX_MEDIA_BYTES = 10 * 1024 * 1024
+_MAX_GATEWAY_MESSAGE_BYTES = 1024 * 1024
+
+
+def _read_local_media_limited(file_path: str) -> bytes:
+    if os.path.getsize(file_path) > _MAX_MEDIA_BYTES:
+        raise ValueError('DingTalk media exceeds the size limit')
+    with open(file_path, 'rb') as file:
+        body = file.read(_MAX_MEDIA_BYTES + 1)
+    if len(body) > _MAX_MEDIA_BYTES:
+        raise ValueError('DingTalk media exceeds the size limit')
+    return body
+
+
+async def _read_httpx_media_limited(response: httpx.Response) -> bytes:
+    content_length = response.headers.get('Content-Length')
+    if content_length is not None:
+        try:
+            declared_length = int(content_length)
+        except ValueError:
+            declared_length = None  # ignore a malformed header; the streaming check still applies
+        if declared_length is not None and declared_length > _MAX_MEDIA_BYTES:
+            raise ValueError('DingTalk media exceeds the size limit')
+    body = bytearray()
+    async for chunk in response.aiter_bytes():
+        body.extend(chunk)
+        if len(body) > _MAX_MEDIA_BYTES:
+            raise ValueError('DingTalk media exceeds the size limit')
+    return bytes(body)


 def _stringify_card_param_map(card_param_map: Optional[dict]) -> dict:
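`_read_httpx_media_limited` combines a cheap `Content-Length` pre-check with a running byte count over the streamed body. The sketch below applies the same two-stage check to any async iterator of chunks, so it runs without httpx. `read_stream_capped` and `chunks_of` are hypothetical names chosen for illustration.

```python
import asyncio
from typing import AsyncIterator, Optional


async def read_stream_capped(
    chunks: AsyncIterator[bytes],
    max_bytes: int,
    content_length: Optional[str] = None,
) -> bytes:
    """Collect an async byte stream, failing as soon as it grows past max_bytes."""
    # Fast path: reject up front when the server declares an oversize body.
    # A malformed header is ignored because the streaming check below still applies.
    if content_length is not None:
        try:
            declared = int(content_length)
        except ValueError:
            declared = None
        if declared is not None and declared > max_bytes:
            raise ValueError('body exceeds the size limit')

    # Authoritative check: count the bytes actually received, so a missing or
    # understated Content-Length cannot bypass the limit.
    body = bytearray()
    async for chunk in chunks:
        body.extend(chunk)
        if len(body) > max_bytes:
            raise ValueError('body exceeds the size limit')
    return bytes(body)


async def chunks_of(*parts: bytes) -> AsyncIterator[bytes]:
    """Tiny stand-in for httpx.Response.aiter_bytes() used in the examples."""
    for part in parts:
        yield part
```

For example, `asyncio.run(read_stream_capped(chunks_of(b'ab', b'cd'), max_bytes=4))` returns `b'abcd'`, while a fifth byte raises `ValueError` whether or not a header was sent.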
@@ -44,6 +75,8 @@ def _stringify_card_param_map(card_param_map: Optional[dict]) -> dict:


 class DingTalkClient:
+    _MAX_INBOUND_TASKS = 100
+
    def __init__(
        self,
        client_id: str,
@@ -86,6 +119,37 @@ class DingTalkClient:
        self.legacy_access_token = ''
        self.legacy_access_token_expiry_time: typing.Optional[float] = None
        self._stopped = False  # Flag to control the event loop
+        self._inbound_tasks: set[asyncio.Task] = set()
+        self._http_client: httpx.AsyncClient | None = None
+
+    @asynccontextmanager
+    async def _http_client_context(self):
+        """Yield a lazily created shared AsyncClient; keeps the ``async with`` shape but does not close it on exit."""
+
+        if self._http_client is None or self._http_client.is_closed:
+            self._http_client = httpx.AsyncClient(event_hooks=httpclient.httpx_response_limit_hooks())
+        yield self._http_client
+
+    def _start_inbound_task(self, coro: typing.Coroutine) -> bool:
+        """Schedule an inbound callback task; close ``coro`` and return False once the in-flight cap is reached."""
+
+        for task in tuple(self._inbound_tasks):
+            if task.done():
+                self._inbound_tasks.discard(task)
+        if len(self._inbound_tasks) >= self._MAX_INBOUND_TASKS:
+            coro.close()
+            return False
+
+        task = asyncio.create_task(coro)
+        self._inbound_tasks.add(task)
+
+        def done(done_task: asyncio.Task) -> None:
+            self._inbound_tasks.discard(done_task)
+            if not done_task.cancelled():
+                done_task.exception()  # retrieve it so asyncio does not warn "exception was never retrieved"
+
+        task.add_done_callback(done)
+        return True

    async def _on_card_action(self, payload: dict) -> None:
        """Dispatch a parsed card-action payload to the adapter callback."""
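`_start_inbound_task` and the shutdown code in this file follow a common asyncio pattern: keep strong references to fire-and-forget tasks, refuse new work past a cap, and cancel-then-gather on shutdown. Below is a self-contained sketch of that pattern. The `BoundedTaskSet` class is hypothetical and stands in for the client's `_inbound_tasks` bookkeeping.

```python
import asyncio
from typing import Coroutine, Set


class BoundedTaskSet:
    """Fire-and-forget tasks with a hard cap on how many run at once (hypothetical helper)."""

    def __init__(self, max_tasks: int) -> None:
        self._max_tasks = max_tasks
        self._tasks: Set[asyncio.Task] = set()

    def __len__(self) -> int:
        return len(self._tasks)

    def start(self, coro: Coroutine) -> bool:
        # Drop finished tasks whose done-callbacks have not run yet.
        for task in tuple(self._tasks):
            if task.done():
                self._tasks.discard(task)
        if len(self._tasks) >= self._max_tasks:
            # Close the never-scheduled coroutine so Python does not warn it was never awaited.
            coro.close()
            return False
        task = asyncio.create_task(coro)
        # The set holds a strong reference, so a running task cannot be garbage-collected.
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return True

    def _on_done(self, task: asyncio.Task) -> None:
        self._tasks.discard(task)
        if not task.cancelled():
            # Retrieving the exception silences "Task exception was never retrieved";
            # a real caller would likely log it here.
            task.exception()

    async def close(self) -> None:
        """Cancel everything still running and wait for it to unwind."""
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        self._tasks.clear()
```

Returning a boolean instead of raising lets the caller decide how to report dropped work, as the websocket loop does with its capacity warning.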
@@ -101,11 +165,11 @@ class DingTalkClient:
        url = 'https://api.dingtalk.com/v1.0/oauth2/accessToken'
        headers = {'Content-Type': 'application/json'}
        data = {'appKey': self.key, 'appSecret': self.secret}
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            try:
                response = await client.post(url, json=data, headers=headers)
                if response.status_code == 200:
-                    response_data = response.json()
+                    response_data = await httpclient.parse_json_response(response)
                    self.access_token = response_data.get('accessToken')
                    expires_in = int(response_data.get('expireIn', 7200))
                    self.access_token_expiry_time = time.time() + expires_in - 60
@@ -129,28 +193,28 @@ class DingTalkClient:
        url = 'https://api.dingtalk.com/v1.0/robot/messageFiles/download'
        params = {'downloadCode': download_code, 'robotCode': self.robot_code}
        headers = {'x-acs-dingtalk-access-token': self.access_token}
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, json=params)
            if response.status_code == 200:
-                result = response.json()
+                result = await httpclient.parse_json_response(response)
                download_url = result.get('downloadUrl')
            else:
-                await self.logger.error(f'failed to get download url: {response.json()}')
+                error_payload = await httpclient.parse_json_response(response)
+                await self.logger.error(f'failed to get download url: {error_payload}')

        if download_url:
            return await self.download_url_to_base64(download_url)

    async def download_url_to_base64(self, download_url):
-        async with httpx.AsyncClient() as client:
-            response = await client.get(download_url)
-
-            if response.status_code == 200:
-                file_bytes = response.content
-                mime_type = response.headers.get('Content-Type', 'application/octet-stream')
-                base64_str = base64.b64encode(file_bytes).decode('utf-8')
-                return f'data:{mime_type};base64,{base64_str}'
-            else:
-                await self.logger.error(f'failed to get files: {response.json()}')
+        async with self._http_client_context() as client:
+            async with client.stream('GET', download_url) as response:
+                if response.status_code == 200:
+                    file_bytes = await _read_httpx_media_limited(response)
+                    mime_type = response.headers.get('Content-Type', 'application/octet-stream')
+                    base64_str = (await asyncio.to_thread(base64.b64encode, file_bytes)).decode('utf-8')
+                    return f'data:{mime_type};base64,{base64_str}'
+                error_body = await _read_httpx_media_limited(response)
+                await self.logger.error(f'failed to get files: {error_body[:300]!r}')

    async def get_audio_url(self, download_code: str):
        if not await self.check_access_token():
@@ -158,17 +222,19 @@ class DingTalkClient:
        url = 'https://api.dingtalk.com/v1.0/robot/messageFiles/download'
        params = {'downloadCode': download_code, 'robotCode': self.robot_code}
        headers = {'x-acs-dingtalk-access-token': self.access_token}
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, json=params)
            if response.status_code == 200:
-                result = response.json()
+                result = await httpclient.parse_json_response(response)
                download_url = result.get('downloadUrl')
                if download_url:
                    return await self.download_url_to_base64(download_url)
                else:
-                    await self.logger.error(f'failed to get audio: {response.json()}')
+                    error_payload = await httpclient.parse_json_response(response)
+                    await self.logger.error(f'failed to get audio: {error_payload}')
            else:
-                raise Exception(f'Error: {response.status_code}, {response.text}')
+                body = await httpclient.response_text(response)
+                raise Exception(f'Error: {response.status_code}, {body}')

    async def get_file_url(self, download_code: str):
        if not await self.check_access_token():
@@ -176,17 +242,19 @@ class DingTalkClient:
        url = 'https://api.dingtalk.com/v1.0/robot/messageFiles/download'
        params = {'downloadCode': download_code, 'robotCode': self.robot_code}
        headers = {'x-acs-dingtalk-access-token': self.access_token}
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, json=params)
            if response.status_code == 200:
-                result = response.json()
+                result = await httpclient.parse_json_response(response)
                download_url = result.get('downloadUrl')
                if download_url:
                    return download_url
                else:
-                    await self.logger.error(f'failed to get file: {response.json()}')
+                    error_payload = await httpclient.parse_json_response(response)
+                    await self.logger.error(f'failed to get file: {error_payload}')
            else:
-                raise Exception(f'Error: {response.status_code}, {response.text}')
+                body = await httpclient.response_text(response)
+                raise Exception(f'Error: {response.status_code}, {body}')

    async def update_incoming_message(self, message):
        """Asynchronously update incoming_message on the DingTalkClient."""
@@ -503,12 +571,13 @@ class DingTalkClient:
            len(content),
        )
        try:
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.post(url, headers=headers, json=data)
+                response_body = await httpclient.response_text(response, max_chars=500)
                _stdout_logger.info(
                    'DingTalk send_proactive_message_to_one response: status=%d body=%s',
                    response.status_code,
-                    response.text[:500],
+                    response_body,
                )
                if response.status_code == 200:
                    return
@@ -535,7 +604,7 @@ class DingTalkClient:
            'msgParam': json.dumps({'content': content}),
        }
        try:
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.post(url, headers=headers, json=data)
                if response.status_code == 200:
                    return
@@ -667,22 +736,23 @@ class DingTalkClient:
                'DingTalk createAndDeliver request body: %s',
                json.dumps(body, ensure_ascii=False)[:1500],
            )
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.post(url, headers=headers, json=body, timeout=30.0)
+                response_body = await httpclient.response_text(response, max_chars=500)
                if response.status_code == 200:
                    _stdout_logger.info(
                        'DingTalk createAndDeliver response: %s',
-                        response.text[:500],
+                        response_body,
                    )
                    return True
                _stdout_logger.error(
                    'DingTalk createAndDeliver failed: status=%s body=%s',
                    response.status_code,
-                    response.text,
+                    response_body,
                )
                if self.logger:
                    await self.logger.error(
-                        f'DingTalk createAndDeliver failed: status={response.status_code} body={response.text}'
+                        f'DingTalk createAndDeliver failed: status={response.status_code} body={response_body}'
                    )
                return False
        except Exception:
@@ -725,13 +795,14 @@ class DingTalkClient:
            'Content-Type': 'application/json',
        }
        try:
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.put(url, headers=headers, json=body, timeout=30.0)
                if response.status_code == 200:
                    return True
                if self.logger:
+                    response_body = await httpclient.response_text(response)
                    await self.logger.error(
-                        f'DingTalk card streaming failed: status={response.status_code} body={response.text}'
+                        f'DingTalk card streaming failed: status={response.status_code} body={response_body}'
                    )
                return False
        except Exception:
@@ -768,18 +839,19 @@ class DingTalkClient:
                out_track_id,
                json.dumps(body, ensure_ascii=False)[:1500],
            )
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.put(url, headers=headers, json=body, timeout=30.0)
+                response_body = await httpclient.response_text(response, max_chars=300)
                _stdout_logger.info(
                    'DingTalk update_card_data response: status=%d body=%s',
                    response.status_code,
-                    response.text[:300],
+                    response_body,
                )
                if response.status_code == 200:
                    return True
                if self.logger:
                    await self.logger.error(
-                        f'DingTalk update card failed: status={response.status_code} body={response.text}'
+                        f'DingTalk update card failed: status={response.status_code} body={response_body}'
                    )
                return False
        except Exception:
@@ -808,17 +880,18 @@ class DingTalkClient:

        url = 'https://oapi.dingtalk.com/gettoken'
        try:
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.get(url, params={'appkey': self.key, 'appsecret': self.secret}, timeout=15.0)
-            data = response.json() if response.status_code == 200 else {}
+            data = await httpclient.parse_json_response(response) if response.status_code == 200 else {}
            if data.get('errcode') == 0 and data.get('access_token'):
                self.legacy_access_token = data['access_token']
                expires_in = int(data.get('expires_in', 7200))
                self.legacy_access_token_expiry_time = now + expires_in - 60
                return self.legacy_access_token
            if self.logger:
+                response_body = await httpclient.response_text(response, max_chars=200)
                await self.logger.error(
-                    f'DingTalk legacy gettoken failed: status={response.status_code} body={response.text[:200]}'
+                    f'DingTalk legacy gettoken failed: status={response.status_code} body={response_body}'
                )
        except Exception:
            _stdout_logger.exception('DingTalk legacy gettoken error')
@@ -848,8 +921,7 @@ class DingTalkClient:

        url = 'https://oapi.dingtalk.com/media/upload'
        try:
-            with open(file_path, 'rb') as f:
-                file_bytes = f.read()
+            file_bytes = await asyncio.to_thread(_read_local_media_limited, file_path)
            file_name = os.path.basename(file_path)
            # Best-effort content-type guess; DingTalk accepts the major image
            # mime types and otherwise infers from the bytes.
@@ -857,20 +929,21 @@ class DingTalkClient:
            mime = {'png': 'image/png', 'jpg': 'image/jpeg', 'jpeg': 'image/jpeg', 'gif': 'image/gif'}.get(
                ext, 'application/octet-stream'
            )
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                response = await client.post(
                    url,
                    params={'access_token': token, 'type': 'image'},
                    files={'media': (file_name, file_bytes, mime)},
                    timeout=30.0,
                )
-            data = response.json() if response.status_code == 200 else {}
+            data = await httpclient.parse_json_response(response) if response.status_code == 200 else {}
            if data.get('errcode') == 0 and data.get('media_id'):
                _stdout_logger.info('DingTalk upload_image_media OK: media_id=%s', data['media_id'])
                return data['media_id']
            if self.logger:
+                response_body = await httpclient.response_text(response, max_chars=300)
                await self.logger.error(
-                    f'DingTalk upload_image_media failed: status={response.status_code} body={response.text[:300]}'
+                    f'DingTalk upload_image_media failed: status={response.status_code} body={response_body}'
                )
        except Exception:
            _stdout_logger.exception('DingTalk upload_image_media error')
@@ -885,7 +958,10 @@ class DingTalkClient:

        while not self._stopped:
            try:
-                connection = self.client.open_connection()
+                # open_connection performs blocking network I/O in the DingTalk SDK.
+                # Run it off the event loop so connection stalls do not block the
+                # LangBot HTTP server and other async tasks.
+                connection = await asyncio.to_thread(self.client.open_connection)

                if not connection:
                    if self.logger:
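The hunk above moves `self.client.open_connection` onto a worker thread. Its effect can be shown in isolation: while a blocking call runs under `asyncio.to_thread`, other coroutines keep running. `blocking_open_connection` is a hypothetical stand-in for the SDK call (it only sleeps). The tick count is a rough responsiveness signal, not a benchmark.

```python
import asyncio
import time
from typing import List, Tuple


def blocking_open_connection() -> dict:
    """Hypothetical stand-in for a synchronous SDK call that blocks on network I/O."""
    time.sleep(0.2)
    return {'endpoint': 'stub-endpoint', 'ticket': 'stub-ticket'}


async def heartbeat(ticks: List[float], interval: float = 0.02) -> None:
    """Record a timestamp every interval; missing ticks would mean the loop was blocked."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def open_connection_off_loop() -> Tuple[dict, int]:
    ticks: List[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    # asyncio.to_thread runs the call in the default thread pool, so the event
    # loop keeps scheduling the heartbeat (and, in LangBot, the HTTP server).
    connection = await asyncio.to_thread(blocking_open_connection)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return connection, len(ticks)


if __name__ == '__main__':
    connection, tick_count = asyncio.run(open_connection_off_loop())
    print(connection['endpoint'], tick_count)  # several ticks arrive during the 0.2 s call
```

If `blocking_open_connection()` were called directly inside the coroutine, the heartbeat task would not get a turn until the call returned.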
@@ -894,15 +970,19 @@ class DingTalkClient:
                    continue

                uri = '%s?ticket=%s' % (connection['endpoint'], urllib.parse.quote_plus(connection['ticket']))
-                async with websockets.connect(uri) as websocket:
+                async with websockets.connect(uri, max_size=_MAX_GATEWAY_MESSAGE_BYTES) as websocket:
                    self.client.websocket = websocket
                    keepalive_task = asyncio.create_task(self._keepalive(websocket))
                    try:
                        async for raw_message in websocket:
                            if self._stopped:
                                break
-                            json_message = json.loads(raw_message)
-                            asyncio.create_task(self.client.background_task(json_message))
+                            json_message = await asyncio.to_thread(json.loads, raw_message)
+                            if not self._start_inbound_task(self.client.background_task(json_message)):
+                                if self.logger:
+                                    await self.logger.warning(
+                                        'DingTalk inbound task capacity reached; dropping message'
+                                    )
                    finally:
                        keepalive_task.cancel()
                        try:
@@ -945,5 +1025,15 @@ class DingTalkClient:
                await self.client.websocket.close()
            except Exception:
                pass
+        inbound_tasks = list(self._inbound_tasks)
+        for task in inbound_tasks:
+            if not task.done():
+                task.cancel()
+        if inbound_tasks:
+            await asyncio.gather(*inbound_tasks, return_exceptions=True)
+        self._inbound_tasks.clear()
        # Clear message handlers to prevent stale callbacks
        self._message_handlers = {'example': []}
+        if self._http_client is not None:
+            await self._http_client.aclose()
+            self._http_client = None
@@ -21,8 +21,14 @@ xml_template = """
 </xml>
 """

+_MAX_CALLBACK_BODY_BYTES = 1024 * 1024
+

 class OAClient:
+    _STATE_TTL_SECONDS = 600
+    _STATE_MAX = 4096
+    _MAX_CONTENT_CHARS = 200000
+
    def __init__(
        self,
        token: str,
@@ -41,6 +47,7 @@ class OAClient:
        self.access_token = ''
        self.unified_mode = unified_mode
        self.app = Quart(__name__)
+        self.app.config['MAX_CONTENT_LENGTH'] = _MAX_CALLBACK_BODY_BYTES

        # Register standalone routes only when not in unified mode
        if not self.unified_mode:
@@ -57,8 +64,38 @@ class OAClient:
        self.access_token_expiry_time = None
        self.msg_id_map = {}
        self.generated_content = {}
+        self._msg_seen_at = {}
+        self._generated_at = {}
+        self._last_state_prune = 0.0
        self.logger = logger

+    def _prune_state(self) -> None:
+        now = time.monotonic()
+        if now - self._last_state_prune >= 60:
+            self._last_state_prune = now
+            for message_id, seen_at in tuple(self._msg_seen_at.items()):
+                if now - seen_at > self._STATE_TTL_SECONDS:
+                    self._msg_seen_at.pop(message_id, None)
+                    self.msg_id_map.pop(message_id, None)
+            for message_id, generated_at in tuple(self._generated_at.items()):
+                if now - generated_at > self._STATE_TTL_SECONDS:
+                    self._generated_at.pop(message_id, None)
+                    self.generated_content.pop(message_id, None)
+        while len(self.msg_id_map) > self._STATE_MAX:
+            message_id = next(iter(self.msg_id_map))
+            self.msg_id_map.pop(message_id, None)
+            self._msg_seen_at.pop(message_id, None)
+        while len(self.generated_content) > self._STATE_MAX:
+            message_id = next(iter(self.generated_content))
+            self.generated_content.pop(message_id, None)
+            self._generated_at.pop(message_id, None)
+
+    def clear(self) -> None:
+        self.msg_id_map.clear()
+        self.generated_content.clear()
+        self._msg_seen_at.clear()
+        self._generated_at.clear()
+
    async def handle_callback_request(self):
        """Handle a callback request (standalone-port mode, using the global request)."""
        return await self._handle_callback_internal(request)
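`_prune_state` bounds the two per-message maps in two ways. A time-to-live sweep runs at most once a minute, and an insertion-order size cap applies on every call. The sketch below packages both rules into one hypothetical `ExpiringDict` with an injectable clock, which makes the behaviour easy to test. The names and defaults are illustrative, not LangBot APIs.

```python
import time
from typing import Any, Callable, Dict, Hashable


class ExpiringDict:
    """Dict with per-key timestamps, a throttled TTL sweep, and a size cap (hypothetical helper)."""

    def __init__(
        self,
        ttl: float,
        max_items: int,
        clock: Callable[[], float] = time.monotonic,
        sweep_interval: float = 60.0,
    ) -> None:
        self._ttl = ttl
        self._max_items = max_items
        self._clock = clock
        self._sweep_interval = sweep_interval
        self._values: Dict[Hashable, Any] = {}
        self._stamped_at: Dict[Hashable, float] = {}
        self._last_sweep = float('-inf')

    def __contains__(self, key: Hashable) -> bool:
        return key in self._values

    def __len__(self) -> int:
        return len(self._values)

    def set(self, key: Hashable, value: Any) -> None:
        self._values[key] = value
        self._stamped_at[key] = self._clock()
        self.prune()

    def pop(self, key: Hashable, default: Any = None) -> Any:
        self._stamped_at.pop(key, None)
        return self._values.pop(key, default)

    def prune(self) -> None:
        now = self._clock()
        # TTL sweep, throttled so a hot path does not rescan every entry on each call.
        if now - self._last_sweep >= self._sweep_interval:
            self._last_sweep = now
            for key, stamped_at in tuple(self._stamped_at.items()):
                if now - stamped_at > self._ttl:
                    self.pop(key)
        # Size cap: dicts keep insertion order, so the first key is the oldest insert
        # (re-setting an existing key refreshes its timestamp but not its position).
        while len(self._values) > self._max_items:
            self.pop(next(iter(self._values)))
```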
@@ -104,8 +141,16 @@ class OAClient:
                    raise Exception('Request rejected')
            elif req.method == 'POST':
                encryt_msg = await req.data
+                if len(encryt_msg) > _MAX_CALLBACK_BODY_BYTES:
+                    raise ValueError('Official Account callback body exceeds the size limit')
                wxcpt = WXBizMsgCrypt(self.token, self.aes, self.appid)
-                ret, xml_msg = wxcpt.DecryptMsg(encryt_msg, msg_signature, timestamp, nonce)
+                ret, xml_msg = await asyncio.to_thread(
+                    wxcpt.DecryptMsg,
+                    encryt_msg,
+                    msg_signature,
+                    timestamp,
+                    nonce,
+                )
                xml_msg = xml_msg.decode('utf-8')

                if ret != 0:
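The callback handlers reject oversize bodies before decryption and then run the XML parse through `asyncio.to_thread`. Below is a stdlib-only sketch of that size-check-then-parse step on plain XML. `parse_callback_xml` is a hypothetical helper, and it skips the `WXBizMsgCrypt` decryption entirely. The size cap bounds the parsing cost, but it does not make `xml.etree` safe for hostile input. For untrusted XML, the Python documentation points to hardened parsers such as defusedxml.

```python
import asyncio
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Same 1 MiB cap as _MAX_CALLBACK_BODY_BYTES in the diff.
MAX_CALLBACK_BODY_BYTES = 1024 * 1024


async def parse_callback_xml(body: bytes) -> Dict[str, Optional[str]]:
    """Reject oversize bodies, then parse the XML on a worker thread."""
    if len(body) > MAX_CALLBACK_BODY_BYTES:
        raise ValueError('callback body exceeds the size limit')
    # Offloaded as in the diff; the size cap above is what bounds the worst-case parse cost.
    root = await asyncio.to_thread(ET.fromstring, body)
    # Flatten the direct children, e.g. <MsgId>42</MsgId> -> {'MsgId': '42'}.
    return {child.tag: child.text for child in root}
```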
@@ -118,7 +163,7 @@ class OAClient:
                    if event:
                        await self._handle_message(event)

-                root = ET.fromstring(xml_msg)
+                root = await asyncio.to_thread(ET.fromstring, xml_msg)
                from_user = root.find('FromUserName').text  # sender
                to_user = root.find('ToUserName').text  # bot

@@ -126,6 +171,7 @@ class OAClient:
                interval = 0.1
                while True:
                    content = self.generated_content.pop(message_data['MsgId'], None)
+                    self._generated_at.pop(message_data['MsgId'], None)
                    if content:
                        response_xml = xml_template.format(
                            to_user=from_user,
@@ -156,7 +202,7 @@ class OAClient:
            traceback.print_exc()

    async def get_message(self, xml_msg: str):
-        root = ET.fromstring(xml_msg)
+        root = await asyncio.to_thread(ET.fromstring, xml_msg)

        message_data = {
            'ToUserName': root.find('ToUserName').text,
@@ -193,21 +239,30 @@ class OAClient:
        Handle a message event.
        """
        message_id = event.message_id
+        self._prune_state()
        if message_id in self.msg_id_map.keys():
            self.msg_id_map[message_id] += 1
+            self._msg_seen_at[message_id] = time.monotonic()
            return

        self.msg_id_map[message_id] = 1
+        self._msg_seen_at[message_id] = time.monotonic()
        msg_type = event.type
        if msg_type in self._message_handlers:
            for handler in self._message_handlers[msg_type]:
                await handler(event)

    async def set_message(self, msg_id: int, content: str):
-        self.generated_content[msg_id] = content
+        self.generated_content[msg_id] = str(content)[: self._MAX_CONTENT_CHARS]
+        self._generated_at[msg_id] = time.monotonic()
+        self._prune_state()


 class OAClientForLongerResponse:
+    _MAX_USERS = 4096
+    _MAX_MESSAGES_PER_USER = 20
+    _MAX_CONTENT_CHARS = 200000
+
    def __init__(
        self,
        token: str,
@@ -227,6 +282,7 @@ class OAClientForLongerResponse:
        self.access_token = ''
        self.unified_mode = unified_mode
        self.app = Quart(__name__)
+        self.app.config['MAX_CONTENT_LENGTH'] = _MAX_CALLBACK_BODY_BYTES

        # Register standalone routes only when not in unified mode
        if not self.unified_mode:
@@ -244,8 +300,28 @@ class OAClientForLongerResponse:
        self.loading_message = LoadingMessage
        self.msg_queue = {}
        self.user_msg_queue = {}
+        self._last_queue_cleanup = 0.0
        self.logger = logger

+    def _prune_queues(self) -> None:
+        now = time.monotonic()
+        if now - self._last_queue_cleanup >= 60:
+            self._last_queue_cleanup = now
+            for user_id, queue in tuple(self.msg_queue.items()):
+                if not queue:
+                    self.msg_queue.pop(user_id, None)
+            for user_id, queue in tuple(self.user_msg_queue.items()):
+                if not queue:
+                    self.user_msg_queue.pop(user_id, None)
+        while len(self.msg_queue) > self._MAX_USERS:
+            self.msg_queue.pop(next(iter(self.msg_queue)), None)
+        while len(self.user_msg_queue) > self._MAX_USERS:
+            self.user_msg_queue.pop(next(iter(self.user_msg_queue)), None)
+
+    def clear(self) -> None:
+        self.msg_queue.clear()
+        self.user_msg_queue.clear()
+
    async def handle_callback_request(self):
        """Handle a callback request (standalone-port mode, using the global request)."""
        return await self._handle_callback_internal(request)
@@ -285,8 +361,16 @@ class OAClientForLongerResponse:

            elif req.method == 'POST':
                encryt_msg = await req.data
+                if len(encryt_msg) > _MAX_CALLBACK_BODY_BYTES:
+                    raise ValueError('Official Account callback body exceeds the size limit')
                wxcpt = WXBizMsgCrypt(self.token, self.aes, self.appid)
-                ret, xml_msg = wxcpt.DecryptMsg(encryt_msg, msg_signature, timestamp, nonce)
+                ret, xml_msg = await asyncio.to_thread(
+                    wxcpt.DecryptMsg,
+                    encryt_msg,
+                    msg_signature,
+                    timestamp,
+                    nonce,
+                )
                xml_msg = xml_msg.decode('utf-8')

                if ret != 0:
@@ -294,7 +378,7 @@ class OAClientForLongerResponse:
                    raise Exception('Message decryption failed')

                # Parse the XML
-                root = ET.fromstring(xml_msg)
+                root = await asyncio.to_thread(ET.fromstring, xml_msg)
                from_user = root.find('FromUserName').text
                to_user = root.find('ToUserName').text

@@ -305,6 +389,7 @@ class OAClientForLongerResponse:
                    # Pop the user's oldest pending message
                    if self.user_msg_queue.get(from_user) and self.user_msg_queue[from_user]:
                        self.user_msg_queue[from_user].pop(0)
+                    self._prune_queues()

                    response_xml = xml_template.format(
                        to_user=from_user,
@@ -332,9 +417,13 @@ class OAClientForLongerResponse:
                            if event:
                                self.user_msg_queue.setdefault(from_user, []).append(
                                    {
-                                        'content': event.message,
+                                        'content': str(event.message)[: self._MAX_CONTENT_CHARS],
                                    }
                                )
+                                self.user_msg_queue[from_user] = self.user_msg_queue[from_user][
+                                    -self._MAX_MESSAGES_PER_USER :
+                                ]
+                                self._prune_queues()
                                await self._handle_message(event)

                        return response_xml
@@ -344,7 +433,7 @@ class OAClientForLongerResponse:
            traceback.print_exc()

    async def get_message(self, xml_msg: str):
-        root = ET.fromstring(xml_msg)
+        root = await asyncio.to_thread(ET.fromstring, xml_msg)

        message_data = {
            'ToUserName': root.find('ToUserName').text,
@@ -393,6 +482,8 @@ class OAClientForLongerResponse:
        self.msg_queue[from_user].append(
            {
                'msg_id': message_id,
-                'content': content,
+                'content': str(content)[: self._MAX_CONTENT_CHARS],
            }
        )
+        self.msg_queue[from_user] = self.msg_queue[from_user][-self._MAX_MESSAGES_PER_USER :]
+        self._prune_queues()
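`OAClientForLongerResponse` now caps memory on three axes: content length per item, items per user (by slicing to the newest `_MAX_MESSAGES_PER_USER`), and the number of users tracked. The hypothetical `PerUserQueues` class below applies the same three limits. It swaps the list slicing for `collections.deque(maxlen=...)`, which also keeps only the newest items. It also removes empty queues eagerly rather than in a periodic sweep.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class PerUserQueues:
    """Per-user FIFO queues with caps on users, items per user, and item length (hypothetical)."""

    def __init__(self, max_users: int, max_per_user: int, max_chars: int) -> None:
        self._max_users = max_users
        self._max_per_user = max_per_user
        self._max_chars = max_chars
        self._queues: Dict[str, Deque[str]] = {}

    def push(self, user: str, content: object) -> None:
        # deque(maxlen=...) drops the oldest item once full, matching the effect
        # of the diff's `queue[-N:]` slice without copying the list.
        queue = self._queues.setdefault(user, deque(maxlen=self._max_per_user))
        queue.append(str(content)[: self._max_chars])
        # Evict whole users in insertion order once there are too many.
        while len(self._queues) > self._max_users:
            self._queues.pop(next(iter(self._queues)))

    def pop_oldest(self, user: str) -> Optional[str]:
        queue = self._queues.get(user)
        if not queue:
            return None
        item = queue.popleft()
        if not queue:
            # Drop empty queues right away instead of waiting for a periodic sweep.
            del self._queues[user]
        return item

    def pending(self, user: str) -> List[str]:
        return list(self._queues.get(user, ()))
```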
@@ -10,7 +10,9 @@ from __future__ import annotations

 import asyncio
 import base64
+import hashlib
 import io
+import json
 import logging
 import os
 import struct
@@ -21,6 +23,8 @@ from urllib.parse import quote

 import aiohttp

+from langbot.pkg.utils import httpclient
+
 from .types import (
    ApiError,
    CDNMedia,
@@ -58,6 +62,51 @@ DEFAULT_BOT_TYPE = '3'

 # Maximum text length per message chunk (WeChat limit)
 MAX_TEXT_CHUNK_SIZE = 2000
+MAX_CDN_MEDIA_BYTES = 10 * 1024 * 1024
+
+
+async def _response_text(response: aiohttp.ClientResponse) -> str:
+    body = await httpclient.read_limited(
+        response,
+        max_bytes=MAX_CDN_MEDIA_BYTES,
+    )
+    return body.decode('utf-8', errors='replace')
+
+
+async def _response_json(response: aiohttp.ClientResponse) -> dict:
+    payload = json.loads(await _response_text(response))
+    if not isinstance(payload, dict):
+        raise ApiError('OpenClaw API returned a non-object response', status=response.status)
+    return payload
+
+
+def _decrypt_cdn_payload(encrypted: bytes, aes_key: bytes) -> bytes:
+    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
+    from cryptography.hazmat.primitives.padding import PKCS7
+
+    cipher = Cipher(algorithms.AES(aes_key), modes.ECB())
+    decryptor = cipher.decryptor()
+    padded = decryptor.update(encrypted) + decryptor.finalize()
+    unpadder = PKCS7(128).unpadder()
+    return unpadder.update(padded) + unpadder.finalize()
+
+
+def _encrypt_cdn_payload(
+    file_bytes: bytes,
+) -> tuple[str, str, bytes, str]:
+    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
+    from cryptography.hazmat.primitives.padding import PKCS7
+
+    raw_key = os.urandom(16)
+    aes_key_hex = raw_key.hex()
+    encoded_key = base64.b64encode(aes_key_hex.encode('utf-8')).decode('utf-8')
+    padder = PKCS7(128).padder()
+    padded = padder.update(file_bytes) + padder.finalize()
+    cipher = Cipher(algorithms.AES(raw_key), modes.ECB())
+    encryptor = cipher.encryptor()
+    encrypted = encryptor.update(padded) + encryptor.finalize()
+    raw_md5 = hashlib.md5(file_bytes).hexdigest()
+    return aes_key_hex, encoded_key, encrypted, raw_md5


 def _random_wechat_uin() -> str:
@@ -125,12 +174,12 @@ class OpenClawWeixinClient:
            url, json=payload, headers=headers, timeout=aiohttp.ClientTimeout(total=timeout)
        ) as resp:
            if resp.status != 200:
-                text = await resp.text()
+                text = await _response_text(resp)
                raise ApiError(
                    f'OpenClaw API error {resp.status}: {text}',
                    status=resp.status,
                )
-            data = await resp.json(content_type=None)
+            data = await _response_json(resp)

        # Check for application-level errors in the response body
        errcode = data.get('errcode') or data.get('ret')
@@ -170,12 +219,12 @@ class OpenClawWeixinClient:
                timeout=aiohttp.ClientTimeout(total=timeout),
            ) as resp:
                if resp.status != 200:
-                    text = await resp.text()
+                    text = await _response_text(resp)
                    raise ApiError(
                        f'OpenClaw API error {resp.status}: {text}',
                        status=resp.status,
                    )
-                data = await resp.json(content_type=None)
+                data = await _response_json(resp)

        except (asyncio.TimeoutError, aiohttp.ServerTimeoutError):
            return GetUpdatesResponse(ret=0, msgs=[], get_updates_buf=get_updates_buf)
@@ -258,9 +307,6 @@ class OpenClawWeixinClient:
        Returns:
            Decrypted file bytes.
        """
-        from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
-        from cryptography.hazmat.primitives.padding import PKCS7
-
        if not media.encrypt_query_param:
            raise ApiError('CDN media has no encrypt_query_param', status=0)
        if not media.aes_key:
@@ -285,17 +331,14 @@ class OpenClawWeixinClient:

        async with session.get(cdn_url, timeout=aiohttp.ClientTimeout(total=120)) as resp:
            if resp.status != 200:
-                text = await resp.text()
+                text = await _response_text(resp)
                raise ApiError(f'CDN download failed: {resp.status} {text}', status=resp.status)
-            encrypted = await resp.read()
+            encrypted = await httpclient.read_limited(
+                resp,
+                max_bytes=MAX_CDN_MEDIA_BYTES,
+            )

-        # Decrypt AES-128-ECB with PKCS7 padding
-        cipher = Cipher(algorithms.AES(aes_key), modes.ECB())
-        decryptor = cipher.decryptor()
-        padded = decryptor.update(encrypted) + decryptor.finalize()
-
-        unpadder = PKCS7(128).unpadder()
-        return unpadder.update(padded) + unpadder.finalize()
+        return await asyncio.to_thread(_decrypt_cdn_payload, encrypted, aes_key)

    async def upload_media(
        self,
@@ -313,28 +356,13 @@ class OpenClawWeixinClient:
        Returns:
            CDNMedia with encrypt_query_param and aes_key for use in sendMessage.
        """
-        import hashlib
+        if len(file_bytes) > MAX_CDN_MEDIA_BYTES:
+            raise ApiError('CDN media exceeds the size limit', status=0)

-        from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
-        from cryptography.hazmat.primitives.padding import PKCS7
-
-        # 1. Generate random 16-byte AES key
-        raw_key = os.urandom(16)
-        aes_key_hex = raw_key.hex()  # 32-char hex string
-
-        # 2. Encode key for CDNMedia: base64(hex_string) — same for all media types
-        # Matches official SDK: Buffer.from(aeskey_hex).toString("base64")
-        encoded_key = base64.b64encode(aes_key_hex.encode('utf-8')).decode('utf-8')
-
-        # 3. Encrypt file with AES-128-ECB + PKCS7
-        padder = PKCS7(128).padder()
-        padded = padder.update(file_bytes) + padder.finalize()
-        cipher = Cipher(algorithms.AES(raw_key), modes.ECB())
-        encryptor = cipher.encryptor()
-        encrypted = encryptor.update(padded) + encryptor.finalize()
-
-        # 4. Get upload URL
-        raw_md5 = hashlib.md5(file_bytes).hexdigest()
+        aes_key_hex, encoded_key, encrypted, raw_md5 = await asyncio.to_thread(
+            _encrypt_cdn_payload,
+            file_bytes,
+        )
        filekey = os.urandom(16).hex()  # 32-char hex, matches official SDK

        upload_resp = await self.get_upload_url(
@@ -370,7 +398,7 @@ class OpenClawWeixinClient:
            timeout=aiohttp.ClientTimeout(total=120),
        ) as resp:
            if resp.status != 200:
-                text = await resp.text()
+                text = await _response_text(resp)
                logger.error('CDN upload failed: status=%d url=%s body=%s', resp.status, cdn_url, text[:500])
                raise ApiError(f'CDN upload failed: {resp.status} {text}', status=resp.status)
            download_param = resp.headers.get('x-encrypted-param', '')
@@ -491,12 +519,12 @@ class OpenClawWeixinClient:

        async with session.get(url, timeout=aiohttp.ClientTimeout(total=DEFAULT_API_TIMEOUT)) as resp:
            if resp.status != 200:
-                text = await resp.text()
+                text = await _response_text(resp)
                raise ApiError(
                    f'Failed to fetch QR code: {resp.status} {text}',
                    status=resp.status,
                )
-            data = await resp.json(content_type=None)
+            data = await _response_json(resp)

        logger.debug(
            'fetch_qrcode response: qrcode=%s, img=%s', data.get('qrcode'), bool(data.get('qrcode_img_content'))
@@ -536,12 +564,12 @@ class OpenClawWeixinClient:
                url, headers=headers, timeout=aiohttp.ClientTimeout(total=DEFAULT_QR_POLL_TIMEOUT)
            ) as resp:
                if resp.status != 200:
-                    text = await resp.text()
+                    text = await _response_text(resp)
                    raise ApiError(
                        f'Failed to poll QR status: {resp.status} {text}',
                        status=resp.status,
                    )
-                data = await resp.json(content_type=None)
+                data = await _response_json(resp)
                logger.debug('QR status poll response: %s', data)
        except (asyncio.TimeoutError, aiohttp.ServerTimeoutError):
            return QRStatusResponse(status='wait')
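`_encrypt_cdn_payload` does not base64-encode the raw 16 key bytes for `CDNMedia`. It base64-encodes the 32-character hex string instead. The stdlib-only sketch below reproduces just that encoding step and its inverse. The AES-128-ECB encryption itself needs the `cryptography` package and is left out. `encode_cdn_key` and `decode_cdn_key` are hypothetical helper names used only for illustration.

```python
import base64
import os


def encode_cdn_key(raw_key: bytes) -> tuple[str, str]:
    """Return (hex_key, encoded_key) where encoded_key = base64(hex_key)."""
    aes_key_hex = raw_key.hex()  # 16 raw bytes -> 32 hex characters
    encoded_key = base64.b64encode(aes_key_hex.encode('utf-8')).decode('utf-8')
    return aes_key_hex, encoded_key


def decode_cdn_key(encoded_key: str) -> bytes:
    """Invert encode_cdn_key: base64 -> hex string -> raw key bytes."""
    return bytes.fromhex(base64.b64decode(encoded_key).decode('utf-8'))


raw = os.urandom(16)
hex_key, encoded = encode_cdn_key(raw)
```

Because the hex string is 32 ASCII bytes, the encoded key is always 44 characters long, compared with 24 characters for base64 of the raw key.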
@@ -9,10 +9,13 @@ import langbot_plugin.api.entities.builtin.platform.events as platform_events
 from .qqofficialevent import QQOfficialEvent
 import json
 import traceback
+from contextlib import asynccontextmanager
 from cryptography.hazmat.primitives.asymmetric import ed25519
+from langbot.pkg.utils import httpclient


 QQ_SELECT_ACTION_PREFIX = '__langbot_select__:'
+_MAX_CALLBACK_BODY_BYTES = 1024 * 1024


 def get_select_field_options(form_data: dict) -> tuple[str, list[str]]:
@@ -152,6 +155,7 @@ class QQOfficialClient:
    def __init__(self, secret: str, token: str, app_id: str, logger: None, unified_mode: bool = False):
        self.unified_mode = unified_mode
        self.app = Quart(__name__)
+        self.app.config['MAX_CONTENT_LENGTH'] = _MAX_CALLBACK_BODY_BYTES

        # Register standalone routes only when not in unified mode
        if not self.unified_mode:
@@ -176,6 +180,32 @@ class QQOfficialClient:
        self.logger = logger
        self._msg_seq_counter = 0
        self._token_refresh_task: Optional[asyncio.Task] = None
+        self._http_clients: dict[float | None, httpx.AsyncClient] = {}
+
+    @asynccontextmanager
+    async def _http_client_context(self, timeout: float | None = None):
+        client = self._http_clients.get(timeout)
+        if client is None or client.is_closed:
+            response_hooks = httpclient.httpx_response_limit_hooks()
+            client = (
+                httpx.AsyncClient(event_hooks=response_hooks)
+                if timeout is None
+                else httpx.AsyncClient(timeout=timeout, event_hooks=response_hooks)
+            )
+            self._http_clients[timeout] = client
+        yield client
+
+    async def close(self) -> None:
+        """Stop client-owned background work and close cached HTTP clients."""
+
+        if self._token_refresh_task and not self._token_refresh_task.done():
+            self._token_refresh_task.cancel()
+            await asyncio.gather(self._token_refresh_task, return_exceptions=True)
+        self._token_refresh_task = None
+        clients = list(self._http_clients.values())
+        self._http_clients.clear()
+        if clients:
+            await asyncio.gather(*(client.aclose() for client in clients), return_exceptions=True)

    async def check_access_token(self):
        """Check whether access_token exists."""
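Previously, each request opened and closed its own `httpx.AsyncClient`. The new `_http_client_context` instead reuses one client per timeout value, and `close()` shuts all of them down. Below is a stdlib-only sketch of that caching pattern. `FakeClient` stands in for `httpx.AsyncClient` and models only `is_closed` and `aclose()`. `ClientCache` is a hypothetical name.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeClient:
    """Stand-in for httpx.AsyncClient: tracks only the closed state."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.is_closed = False

    async def aclose(self) -> None:
        self.is_closed = True


class ClientCache:
    def __init__(self) -> None:
        self._clients: dict = {}

    @asynccontextmanager
    async def client(self, timeout=None):
        # Reuse one client per timeout value; rebuild it if it was closed.
        client = self._clients.get(timeout)
        if client is None or client.is_closed:
            client = FakeClient(timeout)
            self._clients[timeout] = client
        # Leaving the context does NOT close the client, so it can be reused.
        yield client

    async def close(self) -> None:
        clients = list(self._clients.values())
        self._clients.clear()
        await asyncio.gather(*(c.aclose() for c in clients), return_exceptions=True)


async def demo():
    cache = ClientCache()
    async with cache.client() as a:
        pass
    async with cache.client() as b:
        pass
    async with cache.client(timeout=120) as c:
        pass
    await cache.close()
    return a, b, c


a, b, c = asyncio.run(demo())
```

Exiting the context manager does not close the client. Whoever owns a `QQOfficialClient` is therefore responsible for awaiting `close()`; otherwise the pooled connections stay open.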
@@ -186,7 +216,7 @@ class QQOfficialClient:
    async def get_access_token(self):
        """Obtain an access_token."""
        url = 'https://bots.qq.com/app/getAppAccessToken'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'appId': self.app_id,
                'clientSecret': self.secret,
@@ -196,8 +226,9 @@ class QQOfficialClient:
            }
            response = await client.post(url, json=params, headers=headers)
            if response.status_code != 200:
-                raise Exception(f'Failed to get access_token: HTTP {response.status_code} {response.text}')
-            response_data = response.json()
+                body = await httpclient.response_text(response)
+                raise Exception(f'Failed to get access_token: HTTP {response.status_code} {body}')
+            response_data = await httpclient.parse_json_response(response)
            access_token = response_data.get('access_token')
            expires_in = int(response_data.get('expires_in', 7200))
            self.access_token_expiry_time = time.time() + expires_in - 60
@@ -236,8 +267,10 @@ class QQOfficialClient:
            if not body or len(body) == 0:
                await self.logger.info('Received empty body, might be health check or GET request')
                return {'code': 0, 'message': 'ok'}, 200
+            if len(body) > _MAX_CALLBACK_BODY_BYTES:
+                return {'error': 'callback body exceeds the size limit'}, 413

-            payload = json.loads(body)
+            payload = await asyncio.to_thread(json.loads, body)

            if payload.get('op') == 13:
                validation_data = payload.get('d')
@@ -367,7 +400,7 @@ class QQOfficialClient:
            await self.get_access_token()

        url = self.base_url + '/v2/users/' + user_openid + '/messages'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
@@ -382,7 +415,7 @@ class QQOfficialClient:
            if event_id:
                data['event_id'] = event_id
            response = await client.post(url, headers=headers, json=data)
-            response_data = response.json()
+            response_data = await httpclient.parse_json_response(response)
            if response.status_code == 200:
                return
            else:
@@ -406,7 +439,7 @@ class QQOfficialClient:
            await self.get_access_token()

        url = self.base_url + '/v2/groups/' + group_openid + '/messages'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
@@ -424,8 +457,9 @@ class QQOfficialClient:
            if response.status_code == 200:
                return
            else:
-                await self.logger.error(f'Failed to send group message: {response.json()}')
-                raise Exception(response.read().decode())
+                error_payload = await httpclient.parse_json_response(response)
+                await self.logger.error(f'Failed to send group message: {error_payload}')
+                raise Exception(str(error_payload))

    async def send_channle_group_text_msg(self, channel_id: str, content: str, msg_id: str):
        """Send a channel group message."""
@@ -433,7 +467,7 @@ class QQOfficialClient:
            await self.get_access_token()

        url = self.base_url + '/channels/' + channel_id + '/messages'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
@@ -447,7 +481,8 @@ class QQOfficialClient:
            if response.status_code == 200:
                return True
            else:
-                await self.logger.error(f'Failed to send channel group message: {response.json()}')
+                error_payload = await httpclient.parse_json_response(response)
+                await self.logger.error(f'Failed to send channel group message: {error_payload}')
                raise Exception(response)

    async def send_channle_private_text_msg(self, guild_id: str, content: str, msg_id: str):
@@ -456,7 +491,7 @@ class QQOfficialClient:
            await self.get_access_token()

        url = self.base_url + '/dms/' + guild_id + '/messages'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
@@ -470,7 +505,8 @@ class QQOfficialClient:
            if response.status_code == 200:
                return True
            else:
-                await self.logger.error(f'Failed to send channel private message: {response.json()}')
+                error_payload = await httpclient.parse_json_response(response)
+                await self.logger.error(f'Failed to send channel private message: {error_payload}')
                raise Exception(response)

    # ---- Rich media messages ----
@@ -532,20 +568,21 @@ class QQOfficialClient:
        if file_type == self.MEDIA_TYPE_FILE and file_name:
            body['file_name'] = file_name

-        async with httpx.AsyncClient(timeout=120) as client:
+        async with self._http_client_context(timeout=120) as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
            }
            response = await client.post(url, headers=headers, json=body)
            if response.status_code == 200:
-                data = response.json()
+                data = await httpclient.parse_json_response(response)
                file_info = data.get('file_info', '')
                preview = file_info[:80] + '...' if len(file_info) > 80 else file_info
                await self.logger.info(f'Upload media success, file_info={preview}')
                return file_info
            else:
-                raise Exception(f'Failed to upload media: HTTP {response.status_code} {response.text}')
+                body = await httpclient.response_text(response)
+                raise Exception(f'Failed to upload media: HTTP {response.status_code} {body}')

    async def _send_media_msg(
        self,
@@ -578,7 +615,7 @@ class QQOfficialClient:
        if msg_id:
            body['msg_id'] = msg_id

-        async with httpx.AsyncClient(timeout=120) as client:
+        async with self._http_client_context(timeout=120) as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
@@ -586,7 +623,8 @@ class QQOfficialClient:
            await self.logger.info(f'Sending rich media: {json.dumps(body, ensure_ascii=False)[:200]}')
            response = await client.post(url, headers=headers, json=body)
            if response.status_code != 200:
-                raise Exception(f'Failed to send rich media message: HTTP {response.status_code} {response.text}')
+                response_body = await httpclient.response_text(response)
+                raise Exception(f'Failed to send rich media message: HTTP {response.status_code} {response_body}')

    async def send_image_msg(
        self,
@@ -678,15 +716,16 @@ class QQOfficialClient:
        if stream_msg_id:
            body['stream_msg_id'] = stream_msg_id

-        async with httpx.AsyncClient(timeout=120) as client:
+        async with self._http_client_context(timeout=120) as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
            }
            response = await client.post(url, headers=headers, json=body)
            if response.status_code != 200:
-                raise Exception(f'Failed to send stream message: HTTP {response.status_code} {response.text}')
-            return response.json()
+                response_body = await httpclient.response_text(response)
+                raise Exception(f'Failed to send stream message: HTTP {response.status_code} {response_body}')
+            return await httpclient.parse_json_response(response)

    async def send_markdown_keyboard(
        self,
@@ -743,18 +782,19 @@ class QQOfficialClient:
        if event_id:
            body['event_id'] = event_id

-        async with httpx.AsyncClient(timeout=30) as client:
+        async with self._http_client_context(timeout=30) as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
            }
            response = await client.post(url, headers=headers, json=body)
            if response.status_code != 200:
+                response_body = await httpclient.response_text(response)
                await self.logger.error(
-                    f'Failed to send markdown+keyboard: HTTP {response.status_code} {response.text}'
+                    f'Failed to send markdown+keyboard: HTTP {response.status_code} {response_body}'
                )
-                raise Exception(f'Failed to send markdown+keyboard: HTTP {response.status_code} {response.text}')
-            return response.json()
+                raise Exception(f'Failed to send markdown+keyboard: HTTP {response.status_code} {response_body}')
+            return await httpclient.parse_json_response(response)

    async def ack_interaction(self, interaction_id: str, code: int = 0) -> None:
        """Acknowledge a button-click INTERACTION_CREATE event.
@@ -775,7 +815,7 @@ class QQOfficialClient:
            await self.get_access_token()

        url = f'{self.base_url}/interactions/{interaction_id}'
-        async with httpx.AsyncClient(timeout=10) as client:
+        async with self._http_client_context(timeout=10) as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
                'Content-Type': 'application/json',
@@ -783,8 +823,9 @@ class QQOfficialClient:
            try:
                response = await client.put(url, headers=headers, json={'code': code})
                if response.status_code >= 400:
+                    response_body = await httpclient.response_text(response)
                    await self.logger.warning(
-                        f'ack_interaction non-success: HTTP {response.status_code} {response.text}'
+                        f'ack_interaction non-success: HTTP {response.status_code} {response_body}'
                    )
            except Exception as e:
                await self.logger.warning(f'ack_interaction error (non-fatal): {e}')
@@ -796,10 +837,11 @@ class QQOfficialClient:
        return time.time() > self.access_token_expiry_time

    async def repeat_seed(self, bot_secret: str, target_size: int = 32) -> bytes:
-        seed = bot_secret
-        while len(seed) < target_size:
-            seed *= 2
-        return seed[:target_size].encode('utf-8')
+        if not bot_secret:
+            raise ValueError('QQ bot secret must not be empty')
+        target_size = max(int(target_size), 1)
+        repeats = (target_size + len(bot_secret) - 1) // len(bot_secret)
+        return (bot_secret * repeats)[:target_size].encode('utf-8')

    async def verify(self, validation_payload: dict):
        seed = await self.repeat_seed(self.secret)
@@ -843,19 +885,20 @@ class QQOfficialClient:
            await self.get_access_token()

        url = f'{self.base_url}/gateway'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            headers = {
                'Authorization': f'QQBot {self.access_token}',
            }
            response = await client.get(url, headers=headers)
            if response.status_code == 200:
-                data = response.json()
+                data = await httpclient.parse_json_response(response)
                ws_url = data.get('url', '')
                if not ws_url:
                    raise Exception('Gateway URL is empty')
                return ws_url
            else:
-                raise Exception(f'Failed to get Gateway URL: HTTP {response.status_code} {response.text}')
+                body = await httpclient.response_text(response)
+                raise Exception(f'Failed to get Gateway URL: HTTP {response.status_code} {body}')

    async def _background_token_refresh(self):
        """Proactively refresh the token before it expires."""
@@ -935,7 +978,7 @@ class QQOfficialClient:

            try:
                await self.logger.info('Connecting to WebSocket gateway...')
-                ws = await websockets.connect(ws_url)
+                ws = await websockets.connect(ws_url, max_size=_MAX_CALLBACK_BODY_BYTES)
                await self.logger.info('WebSocket connected')
            except Exception as e:
                await self.logger.error(f'WebSocket connection failed: {e}')
@@ -948,7 +991,7 @@ class QQOfficialClient:
            try:
                async for raw_msg in ws:
                    try:
-                        payload = json.loads(raw_msg)
+                        payload = await asyncio.to_thread(json.loads, raw_msg)
                    except json.JSONDecodeError:
                        await self.logger.error(f'Failed to parse message: {raw_msg}')
                        continue
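The rewritten `repeat_seed` replaces the doubling loop, which never terminated for an empty secret because `'' * 2` is still `''`. It uses ceiling division instead: the secret is repeated just enough times to cover `target_size` characters, and the result is then truncated. Below is a synchronous sketch of the same arithmetic.

```python
def repeat_seed(bot_secret: str, target_size: int = 32) -> bytes:
    if not bot_secret:
        raise ValueError('QQ bot secret must not be empty')
    target_size = max(int(target_size), 1)
    # ceil(target_size / len(bot_secret)) using integer arithmetic only
    repeats = (target_size + len(bot_secret) - 1) // len(bot_secret)
    return (bot_secret * repeats)[:target_size].encode('utf-8')
```

Truncation counts characters, not bytes. A secret containing non-ASCII characters would therefore produce more than `target_size` bytes after UTF-8 encoding.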
@@ -1,3 +1,4 @@
+import asyncio
 import json
 import traceback
 from quart import Quart, jsonify, request
@@ -6,6 +7,8 @@ from .slackevent import SlackEvent
 from typing import Callable
 import langbot_plugin.api.entities.builtin.platform.events as platform_events

+_MAX_CALLBACK_BODY_BYTES = 1024 * 1024
+

 class SlackClient:
    def __init__(self, bot_token: str, signing_secret: str, logger: None, unified_mode: bool = False):
@@ -13,6 +16,7 @@ class SlackClient:
        self.signing_secret = signing_secret
        self.unified_mode = unified_mode
        self.app = Quart(__name__)
+        self.app.config['MAX_CONTENT_LENGTH'] = _MAX_CALLBACK_BODY_BYTES
        self.client = AsyncWebClient(self.bot_token)

        # Register standalone routes only when not in unified mode
@@ -50,7 +54,9 @@ class SlackClient:
        """
        try:
            body = await req.get_data()
-            data = json.loads(body)
+            if len(body) > _MAX_CALLBACK_BODY_BYTES:
+                raise ValueError('Slack callback body exceeds the size limit')
+            data = await asyncio.to_thread(json.loads, body)
            if 'type' in data:
                if data['type'] == 'url_verification':
                    return data['challenge']
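The Slack handler above and the QQ callback path now reject an oversized body before parsing it. They also run `json.loads` in a worker thread through `asyncio.to_thread`, so parsing a large payload does not block the event loop. Below is a stdlib sketch of that pattern. It uses a hypothetical `parse_callback_body` helper and the same 1 MiB limit. It also adds an object-type check modeled on `_response_json`, which the Slack handler itself does not perform.

```python
import asyncio
import json

MAX_CALLBACK_BODY_BYTES = 1024 * 1024


async def parse_callback_body(body: bytes) -> dict:
    # Check the size first so oversized input never reaches the parser.
    if len(body) > MAX_CALLBACK_BODY_BYTES:
        raise ValueError('callback body exceeds the size limit')
    # json.loads accepts bytes directly; run it off the event loop thread.
    data = await asyncio.to_thread(json.loads, body)
    if not isinstance(data, dict):
        raise ValueError('callback body is not a JSON object')
    return data
```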
@@ -1,7 +1,32 @@
-from langbot.libs.wechatpad_api.util.http_util import post_json
-import httpx
+import asyncio
 import base64

+import httpx
+
+from langbot.libs.wechatpad_api.util.http_util import post_json
+from langbot.pkg.utils import httpclient
+
+
+_MAX_WECHATPAD_MEDIA_BYTES = 16 * 1024 * 1024
+
+
+async def _read_media_limited(response: httpx.Response) -> bytes:
+    content_length = response.headers.get('content-length')
+    if content_length is not None:
+        try:
+            declared_size = int(content_length)
+        except ValueError:
+            declared_size = None
+        if declared_size is not None and declared_size > _MAX_WECHATPAD_MEDIA_BYTES:
+            raise RuntimeError('WeChatPad media exceeds the runtime limit')
+
+    body = bytearray()
+    async for chunk in response.aiter_bytes(chunk_size=64 * 1024):
+        body.extend(chunk)
+        if len(body) > _MAX_WECHATPAD_MEDIA_BYTES:
+            raise RuntimeError('WeChatPad media exceeds the runtime limit')
+    return bytes(body)
+

 class DownloadApi:
    def __init__(self, base_url, token):
@@ -19,12 +44,13 @@ class DownloadApi:
        return post_json(url, token=self.token, data=json_data)

    async def download_url_to_base64(self, download_url):
-        async with httpx.AsyncClient() as client:
-            response = await client.get(download_url)
-
-            if response.status_code == 200:
-                file_bytes = response.content
-                base64_str = base64.b64encode(file_bytes).decode('utf-8')  # return as a string
-                return base64_str
-            else:
-                raise Exception('Failed to fetch file')
+        async with httpx.AsyncClient(
+            timeout=30,
+            event_hooks=httpclient.httpx_response_limit_hooks(_MAX_WECHATPAD_MEDIA_BYTES),
+        ) as client:
+            async with client.stream('GET', download_url) as response:
+                if response.status_code != 200:
+                    raise RuntimeError('Failed to fetch file')
+                file_bytes = await _read_media_limited(response)
+        encoded = await asyncio.to_thread(base64.b64encode, file_bytes)
+        return encoded.decode('utf-8')
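`_read_media_limited` enforces the limit in two stages. First, it rejects a declared `Content-Length` above the cap before reading anything. Second, it counts bytes as chunks arrive, because the header can be missing or wrong. The sketch below applies the same logic to any async iterator of byte chunks, so it runs without `httpx`. `read_limited_chunks` and `fake_stream` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Optional


async def read_limited_chunks(
    chunks: AsyncIterator[bytes],
    max_bytes: int,
    content_length: Optional[str] = None,
) -> bytes:
    # Fast path: a well-formed declared length lets us reject early.
    if content_length is not None:
        try:
            declared = int(content_length)
        except ValueError:
            declared = None
        if declared is not None and declared > max_bytes:
            raise RuntimeError('response exceeds the limit')
    # Slow path: count actual bytes, because the header may be absent or lie.
    body = bytearray()
    async for chunk in chunks:
        body.extend(chunk)
        if len(body) > max_bytes:
            raise RuntimeError('response exceeds the limit')
    return bytes(body)


async def fake_stream(parts: list[bytes]) -> AsyncIterator[bytes]:
    for part in parts:
        yield part
```

An unparsable `Content-Length` is ignored rather than treated as an error. The byte count during streaming still bounds the read.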
@@ -1,6 +1,29 @@
+import json as json_module
+
 import requests
 from langbot.pkg.utils import httpclient

+_MAX_WECHATPAD_RESPONSE_BYTES = 16 * 1024 * 1024
+
+
+def _read_requests_response_limited(response: requests.Response) -> dict:
+    content_length = response.headers.get('Content-Length')
+    if content_length is not None:
+        try:
+            if int(content_length) > _MAX_WECHATPAD_RESPONSE_BYTES:
+                raise RuntimeError('WeChatPad response exceeds the runtime limit')
+        except (TypeError, ValueError):
+            pass
+    body = bytearray()
+    for chunk in response.iter_content(chunk_size=64 * 1024):
+        body.extend(chunk)
+        if len(body) > _MAX_WECHATPAD_RESPONSE_BYTES:
+            raise RuntimeError('WeChatPad response exceeds the runtime limit')
+    result = json_module.loads(body)
+    if not isinstance(result, dict):
+        raise RuntimeError('WeChatPad returned a non-object response')
+    return result
+

 def post_json(base_url, token, data=None):
    headers = {'Content-Type': 'application/json'}
@@ -8,16 +31,21 @@ def post_json(base_url, token, data=None):
    url = base_url + f'?key={token}'

    try:
-        response = requests.post(url, json=data, headers=headers, timeout=60)
-        response.raise_for_status()
-        result = response.json()
+        with requests.post(
+            url,
+            json=data,
+            headers=headers,
+            timeout=60,
+            stream=True,
+        ) as response:
+            response.raise_for_status()
+            result = _read_requests_response_limited(response)

        if result:
            return result
        else:
-            raise RuntimeError(response.text)
+            raise RuntimeError('WeChatPad returned an empty response')
    except Exception as e:
-        print(f'HTTP request failed, url={url}, exception={e}')
        raise RuntimeError(str(e))


@@ -27,16 +55,20 @@ def get_json(base_url, token):
    url = base_url + f'?key={token}'

    try:
-        response = requests.get(url, headers=headers, timeout=60)
-        response.raise_for_status()
-        result = response.json()
+        with requests.get(
+            url,
+            headers=headers,
+            timeout=60,
+            stream=True,
+        ) as response:
+            response.raise_for_status()
+            result = _read_requests_response_limited(response)

        if result:
            return result
        else:
-            raise RuntimeError(response.text)
+            raise RuntimeError('WeChatPad returned an empty response')
    except Exception as e:
-        print(f'HTTP request failed, url={url}, exception={e}')
        raise RuntimeError(str(e))


@@ -68,7 +100,12 @@ async def async_request(
        method=method, url=url, params=params, headers=headers, data=data, json=json
    ) as response:
        response.raise_for_status()  # raise for 4xx/5xx status codes
-        result = await response.json()
+        result = json_module.loads(
+            await httpclient.read_limited(
+                response,
+                max_bytes=_MAX_WECHATPAD_RESPONSE_BYTES,
+            )
+        )
        # print(result)
        return result
        # if result.get('Code') == 200:
@@ -10,13 +10,16 @@ import re
 from typing import Any, Callable, Optional, Tuple
 from urllib.parse import unquote

-import httpx
 from Crypto.Cipher import AES
 from quart import Quart, request, Response, jsonify

 from langbot.libs.wecom_ai_bot_api import wecombotevent
 from langbot.libs.wecom_ai_bot_api.WXBizMsgCrypt3 import WXBizMsgCrypt
 from langbot.pkg.platform.logger import EventLogger
+from langbot.pkg.utils import httpclient
+
+_CLIENT_TRANSIENT_CACHE_MAX = 4096
+_MAX_STREAM_CONTENT_CHARS = 200000


@dataclass
@@ -56,7 +59,7 @@ class StreamSession:
    last_access: float = field(default_factory=time.time)

    # 将流水线增量结果缓存到队列，刷新请求逐条消费
-    queue: asyncio.Queue = field(default_factory=asyncio.Queue)
+    queue: asyncio.Queue = field(default_factory=lambda: asyncio.Queue(maxsize=1))

    # 是否已经完成（收到最终片段）
    finished: bool = False
@@ -85,6 +88,7 @@ class StreamSessionManager:
    # full like → cancel → dislike feedback flow. Must align with the adapter's
    # _stream_to_monitoring_msg TTL (wecombot.py).
    _FEEDBACK_SESSION_TTL = 600  # 10 minutes
+    _MAX_SESSIONS = 4096

    def __init__(self, logger: EventLogger, ttl: int = 60) -> None:
        self.logger = logger
@@ -165,6 +169,26 @@ class StreamSessionManager:
        if task_id:
            self._task_index.pop(task_id, None)

+    def clear(self) -> None:
+        """Release every retained stream and reverse index."""
+
+        self._sessions.clear()
+        self._msg_index.clear()
+        self._feedback_index.clear()
+        self._task_index.clear()
+
+    def _drop_session(self, stream_id: str) -> StreamSession | None:
+        session = self._sessions.pop(stream_id, None)
+        if session is None:
+            return None
+        if session.msg_id and self._msg_index.get(session.msg_id) == stream_id:
+            self._msg_index.pop(session.msg_id, None)
+        if session.feedback_id:
+            self._feedback_index.pop(session.feedback_id, None)
+        if session.pending_form_task_id:
+            self._task_index.pop(session.pending_form_task_id, None)
+        return session
+
    def create_or_get(self, msg_json: dict[str, Any]) -> tuple[StreamSession, bool]:
        """Create or fetch a session for a WeCom callback.

@@ -185,6 +209,14 @@ class StreamSessionManager:
                session.last_access = time.time()
                return session, False

+        self.cleanup()
+        while len(self._sessions) >= self._MAX_SESSIONS:
+            oldest_stream_id = min(
+                self._sessions,
+                key=lambda candidate: self._sessions[candidate].last_access,
+            )
+            self._drop_session(oldest_stream_id)
+
        stream_id = str(uuid.uuid4())
        session = StreamSession(
            stream_id=stream_id,
@@ -221,8 +253,13 @@ class StreamSessionManager:
        try:
            session.queue.put_nowait(chunk)
        except asyncio.QueueFull:
-            # The queue is unbounded by default; this is a defensive fallback
-            await session.queue.put(chunk)
+            # Each chunk is a complete snapshot. Coalesce a slow consumer to
+            # the newest value instead of retaining every intermediate body.
+            try:
+                session.queue.get_nowait()
+            except asyncio.QueueEmpty:
+                pass
+            session.queue.put_nowait(chunk)

        if chunk.is_final:
            session.finished = True
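With `maxsize=1`, the stream queue behaves as a latest-value mailbox: a producer that outruns the refresh requests replaces the pending snapshot instead of growing the queue. The following is a minimal standard-library sketch of that pattern. `LatestValueMailbox` is a hypothetical name, not a class in this codebase, and the sketch assumes (as the comment in the hunk states) that every chunk is a complete snapshot.

```python
import asyncio


class LatestValueMailbox:
    """Hypothetical helper: holds at most one pending snapshot; the newest wins."""

    def __init__(self) -> None:
        # maxsize=1 mirrors the StreamSession.queue change in this diff.
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=1)

    def publish(self, snapshot: str) -> None:
        try:
            self._queue.put_nowait(snapshot)
        except asyncio.QueueFull:
            # Discard the stale snapshot, then store the newer one.
            try:
                self._queue.get_nowait()
            except asyncio.QueueEmpty:
                pass
            self._queue.put_nowait(snapshot)

    async def next_snapshot(self) -> str:
        # Waits until a snapshot is available.
        return await self._queue.get()


async def demo() -> str:
    mailbox = LatestValueMailbox()
    for body in ('Hel', 'Hello', 'Hello, wor', 'Hello, world'):
        mailbox.publish(body)
    return await mailbox.next_snapshot()


if __name__ == '__main__':
    print(asyncio.run(demo()))  # Hello, world
```

Because each value is a full body, dropping intermediate snapshots loses nothing the client would display. The same coalescing would be wrong for true deltas, which must all be delivered.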
@@ -265,7 +302,7 @@ class StreamSessionManager:
            session.finished = True
            session.last_access = time.time()

-    def cleanup(self) -> None:
+    def cleanup(self) -> list[str]:
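        """Purge expired sessions (so queues and mappings cannot grow without bound) and return their msg_ids.

        Sessions that have registered a feedback_id use a longer TTL to ensure that, during the like/cancel/dislike flow,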
@@ -279,16 +316,14 @@ class StreamSessionManager:
            if now - session.last_access > effective_ttl:
                expired.append(stream_id)

+        removed_msg_ids: list[str] = []
        for stream_id in expired:
-            session = self._sessions.pop(stream_id, None)
+            session = self._drop_session(stream_id)
            if not session:
                continue
-            msg_id = session.msg_id
-            if msg_id and self._msg_index.get(msg_id) == stream_id:
-                self._msg_index.pop(msg_id, None)
-            # Clean up feedback index for expired sessions
-            if session.feedback_id:
-                self._feedback_index.pop(session.feedback_id, None)
+            if session.msg_id:
+                removed_msg_ids.append(session.msg_id)
+        return removed_msg_ids


 def _decrypt_file(encrypted_data: bytes, aes_key_str: str) -> bytes:
@@ -405,19 +440,19 @@ async def download_encrypted_file(

    filename: Optional[str] = None
    try:
-        async with httpx.AsyncClient(timeout=30.0) as client:
-            response = await client.get(download_url)
-            if response.status_code != 200:
-                await logger.error(f'Failed to download file (HTTP {response.status_code}): {response.text[:200]}')
+        client = httpclient.get_session()
+        async with client.get(download_url, timeout=30.0) as response:
+            if response.status != 200:
+                await logger.error(f'Failed to download file (HTTP {response.status})')
                return None, None
-            encrypted_bytes = response.content
+            encrypted_bytes = await httpclient.read_limited(response)
            filename = _extract_filename(response.headers.get('content-disposition', ''))
    except Exception:
        await logger.error(f'Failed to download file: {traceback.format_exc()}')
        return None, None

    try:
-        decrypted = _decrypt_file(encrypted_bytes, aes_key)
+        decrypted = await asyncio.to_thread(_decrypt_file, encrypted_bytes, aes_key)
        return decrypted, filename
    except Exception:
        await logger.error(f'Failed to decrypt file: {traceback.format_exc()}')
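`httpclient.read_limited` is a project helper whose body is not part of this diff. The general technique is to count bytes while streaming and abort once a cap is exceeded, rather than buffering the whole response first. The sketch below assumes the body is exposed as an async iterator of byte chunks (for example aiohttp's `response.content.iter_chunked(n)` or httpx's `response.aiter_bytes()`). `read_capped` and `fake_body` are illustrative stand-ins, not the project's API, and the 16-byte cap exists only for the demo.

```python
import asyncio
from typing import AsyncIterator


async def read_capped(chunks: AsyncIterator[bytes], max_bytes: int) -> bytes:
    """Collect streamed chunks, failing as soon as the total exceeds max_bytes."""
    buffer = bytearray()
    async for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            # Abort early instead of buffering an arbitrarily large body.
            raise ValueError(f'response body exceeds {max_bytes} bytes')
    return bytes(buffer)


async def fake_body(*parts: bytes) -> AsyncIterator[bytes]:
    # Illustrative stand-in for a streamed HTTP response body.
    for part in parts:
        yield part


if __name__ == '__main__':
    print(asyncio.run(read_capped(fake_body(b'abc', b'def'), max_bytes=16)))  # b'abcdef'
    try:
        asyncio.run(read_capped(fake_body(b'x' * 10, b'y' * 10), max_bytes=16))
    except ValueError as exc:
        print(exc)  # response body exceeds 16 bytes
```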
@@ -466,7 +501,7 @@ async def parse_wecom_bot_message(
        """Download, decrypt, and convert to data URI for backward compatibility."""
        data, _filename = await _safe_download(url, per_msg_aeskey)
        if data:
-            return _bytes_to_data_uri(data)
+            return await asyncio.to_thread(_bytes_to_data_uri, data)
        return None

    if msg_type == 'text':
@@ -579,7 +614,10 @@ async def parse_wecom_bot_message(
                if (file_data.get('filesize') or 0) <= max_inline_file_size:
                    file_bytes, dl_filename = await _safe_download(download_url, item_aeskey)
                    if file_bytes:
-                        file_data['base64'] = _bytes_to_data_uri(file_bytes)
+                        file_data['base64'] = await asyncio.to_thread(
+                            _bytes_to_data_uri,
+                            file_bytes,
+                        )
                        if dl_filename and not file_data.get('filename'):
                            file_data['filename'] = dl_filename
                files.append(file_data)
@@ -1567,6 +1605,8 @@ def build_multiple_interaction_update_card(


 class WecomBotClient:
+    _MAX_DISPATCH_TASKS = 100
+
    def __init__(
        self,
        Token: str,
@@ -1613,6 +1653,7 @@ class WecomBotClient:
        self._feedback_callback: Optional[Callable] = None
        self._card_action_callback: Optional[Callable] = None
        self._stream_last_content: dict[str, str] = {}
+        self._dispatch_tasks: set[asyncio.Task] = set()
        # Optional `source` block injected into every interactive template_card
        # the client builds. Set via `set_card_source` from the adapter after
        # reading config. Format: {icon_url, desc, desc_color}.
@@ -1695,7 +1736,12 @@ class WecomBotClient:
        """
        reply_plain_str = json.dumps(payload, ensure_ascii=False)
        reply_timestamp = str(int(time.time()))
-        ret, encrypt_text = self.wxcpt.EncryptMsg(reply_plain_str, nonce, reply_timestamp)
+        ret, encrypt_text = await asyncio.to_thread(
+            self.wxcpt.EncryptMsg,
+            reply_plain_str,
+            nonce,
+            reply_timestamp,
+        )
        if ret != 0:
            await self.logger.error(f'Encryption failed: {ret}')
            return jsonify({'error': 'encrypt_failed'}), 500
@@ -1718,6 +1764,41 @@ class WecomBotClient:
        except Exception:
            await self.logger.error(traceback.format_exc())

+    def _start_dispatch_task(self, event: wecombotevent.WecomBotEvent) -> bool:
+        """Start a pipeline dispatch task; return False if the concurrency cap is reached."""
+
+        for task in tuple(self._dispatch_tasks):
+            if task.done():
+                self._dispatch_tasks.discard(task)
+        if len(self._dispatch_tasks) >= self._MAX_DISPATCH_TASKS:
+            return False
+
+        task = asyncio.create_task(self._dispatch_event(event))
+        self._dispatch_tasks.add(task)
+
+        def done(done_task: asyncio.Task) -> None:
+            self._dispatch_tasks.discard(done_task)
+            if not done_task.cancelled():
+                # Retrieve any exception so asyncio does not log
+                # 'Task exception was never retrieved'.
+                done_task.exception()
+
+        task.add_done_callback(done)
+        return True
+
+    async def close(self) -> None:
+        """Cancel in-flight dispatch tasks and release retained webhook state."""
+
+        dispatch_tasks = list(self._dispatch_tasks)
+        for task in dispatch_tasks:
+            if not task.done():
+                task.cancel()
+        if dispatch_tasks:
+            await asyncio.gather(*dispatch_tasks, return_exceptions=True)
+        self._dispatch_tasks.clear()
+        self.generated_content.clear()
+        self.msg_id_map.clear()
+        self._stream_last_content.clear()
+        self.stream_sessions.clear()
+
    async def _handle_post_initial_response(self, msg_json: dict[str, Any], nonce: str) -> tuple[Response, int]:
        """Handle the first message WeCom pushes: return the stream_id and start the pipeline.

@@ -1747,7 +1828,8 @@ class WecomBotClient:
                await self.logger.error(traceback.format_exc())
            else:
                if is_new:
-                    asyncio.create_task(self._dispatch_event(event))
+                    if not self._start_dispatch_task(event):
+                        await self.logger.warning('WeCom webhook dispatch capacity reached; dropping message')

        payload = self._build_stream_payload(session.stream_id, '', False, feedback_id)
        return await self._encrypt_and_reply(payload, nonce)
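`_start_dispatch_task` (and `_start_callback_task` in the WebSocket client) combine three asyncio idioms: keep strong references to running tasks, refuse new work past a fixed cap, and retrieve exceptions in a done callback. Below is a self-contained sketch of the same idea. `BoundedTaskSet` is a hypothetical helper, and closing a rejected coroutine mirrors what `_start_callback_task` does with `coro.close()`.

```python
import asyncio
from typing import Any, Coroutine


class BoundedTaskSet:
    """Hypothetical helper: runs at most `limit` background coroutines at once."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._tasks: set[asyncio.Task] = set()

    def start(self, coro: Coroutine[Any, Any, Any]) -> bool:
        # Done callbacks run on a later loop iteration, so prune finished tasks first.
        for task in tuple(self._tasks):
            if task.done():
                self._tasks.discard(task)
        if len(self._tasks) >= self._limit:
            # Close the unscheduled coroutine to avoid a "never awaited" warning.
            coro.close()
            return False
        task = asyncio.create_task(coro)
        # The event loop keeps only weak references to tasks; hold a strong one.
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return True

    def _on_done(self, task: asyncio.Task) -> None:
        self._tasks.discard(task)
        if not task.cancelled():
            # Mark any exception as retrieved so asyncio does not log
            # "Task exception was never retrieved".
            task.exception()

    def __len__(self) -> int:
        return len(self._tasks)

    async def close(self) -> None:
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        self._tasks.clear()


async def demo() -> tuple[list[bool], int]:
    pool = BoundedTaskSet(limit=2)
    accepted = [pool.start(asyncio.sleep(10)) for _ in range(3)]
    running = len(pool)
    await pool.close()
    return accepted, running


if __name__ == '__main__':
    print(asyncio.run(demo()))  # ([True, True, False], 2)
```

Rejecting work outright keeps memory bounded under a burst of callbacks. The trade-off is that messages over the cap are dropped, so the warning log is the only trace of them.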
@@ -1870,7 +1952,10 @@ class WecomBotClient:
    async def _handle_post_callback(self, req) -> tuple[Response, int] | Response:
        """Handle a WeCom POST callback request."""

-        self.stream_sessions.cleanup()
+        for expired_msg_id in self.stream_sessions.cleanup():
+            self.generated_content.pop(expired_msg_id, None)
+            self._stream_last_content.pop(expired_msg_id, None)
+            self.msg_id_map.pop(expired_msg_id, None)

        msg_signature = unquote(req.args.get('msg_signature', ''))
        timestamp = unquote(req.args.get('timestamp', ''))
@@ -1883,12 +1968,18 @@ class WecomBotClient:
            return Response('Bad Request', status=400)

        xml_post_data = f'<xml><Encrypt><![CDATA[{encrypted_msg}]]></Encrypt></xml>'
-        ret, decrypted_xml = self.wxcpt.DecryptMsg(xml_post_data, msg_signature, timestamp, nonce)
+        ret, decrypted_xml = await asyncio.to_thread(
+            self.wxcpt.DecryptMsg,
+            xml_post_data,
+            msg_signature,
+            timestamp,
+            nonce,
+        )
        if ret != 0:
            await self.logger.error('Decryption failed')
            return Response('Decryption failed', status=400)

-        msg_json = json.loads(decrypted_xml)
+        msg_json = await asyncio.to_thread(json.loads, decrypted_xml)

        event_type = extract_wecom_event_type(msg_json)

@@ -2014,6 +2105,8 @@ class WecomBotClient:
                self.msg_id_map[message_id] += 1
                return
            self.msg_id_map[message_id] = 1
+            while len(self.msg_id_map) > _CLIENT_TRANSIENT_CACHE_MAX:
+                self.msg_id_map.pop(next(iter(self.msg_id_map)), None)
            msg_type = event.type
            if msg_type in self._message_handlers:
                for handler in self._message_handlers[msg_type]:
@@ -2047,6 +2140,8 @@ class WecomBotClient:
            next_content = previous_content
        else:
            next_content = previous_content + content if previous_content else content
+        if len(next_content) > _MAX_STREAM_CONTENT_CHARS:
+            next_content = next_content[-_MAX_STREAM_CONTENT_CHARS:]

        if not is_final and next_content == previous_content:
            return True
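The caches in this file (`msg_id_map`, `generated_content`) are capped by popping the first key in iteration order, which for a `dict` is the oldest insertion. The WebSocket client below adds a monotonic-clock TTL for pending forms. Here is a minimal sketch of both policies, assuming entries store a `time.monotonic()` value under `'created_at'` as the pending-form hunk does. `cap_mapping` and `prune_expired` are illustrative names.

```python
import time
from typing import Any, Callable


def cap_mapping(mapping: dict, max_entries: int) -> None:
    """Evict the oldest-inserted keys until at most max_entries remain."""
    while len(mapping) > max_entries:
        # dicts iterate in insertion order, so the first key is the oldest.
        mapping.pop(next(iter(mapping)))


def prune_expired(
    mapping: dict[str, dict[str, Any]],
    ttl_seconds: float,
    now: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Remove entries whose 'created_at' is at or before now() - ttl_seconds."""
    cutoff = now() - ttl_seconds
    expired = [key for key, entry in mapping.items() if entry.get('created_at', 0.0) <= cutoff]
    for key in expired:
        del mapping[key]
    return expired


if __name__ == '__main__':
    seen: dict[str, int] = {}
    for message_id in ('m1', 'm2', 'm3', 'm4'):
        seen[message_id] = 1
        cap_mapping(seen, max_entries=3)
    print(list(seen))  # ['m2', 'm3', 'm4']

    forms = {'t1': {'created_at': 0.0}, 't2': {'created_at': 95.0}}
    # A fixed clock keeps the demo deterministic.
    print(prune_expired(forms, ttl_seconds=10, now=lambda: 100.0))  # ['t1']
```

This is FIFO, not LRU. Incrementing an existing key (as `msg_id_map[message_id] += 1` does) does not move it to the end, so a frequently repeated ID can still be evicted first.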
@@ -2096,7 +2191,9 @@ class WecomBotClient:
        """
        handled = await self.push_stream_chunk(msg_id, content, is_final=True)
        if not handled:
-            self.generated_content[msg_id] = content
+            self.generated_content[msg_id] = content[-_MAX_STREAM_CONTENT_CHARS:]
+            while len(self.generated_content) > _CLIENT_TRANSIENT_CACHE_MAX:
+                self.generated_content.pop(next(iter(self.generated_content)), None)

    def on_message(self, msg_type: str):
        def decorator(func: Callable[[wecombotevent.WecomBotEvent], None]):
@@ -2119,7 +2216,7 @@ class WecomBotClient:
    async def download_url_to_base64(self, download_url, encoding_aes_key):
        data, _filename = await download_encrypted_file(download_url, encoding_aes_key, self.logger)
        if data:
-            return _bytes_to_data_uri(data)
+            return await asyncio.to_thread(_bytes_to_data_uri, data)
        return None

    async def run_task(self, host: str, port: int, *args, **kwargs):
@@ -47,6 +47,17 @@ CMD_RESPOND_WELCOME = 'aibot_respond_welcome_msg'
 CMD_RESPOND_UPDATE = 'aibot_respond_update_msg'
 CMD_SEND_MSG = 'aibot_send_msg'

+_DEDUP_CACHE_MAX = 4096
+_STREAM_CACHE_MAX = 1024
+_FEEDBACK_CACHE_MAX = 4096
+_PENDING_FORM_MAX = 1024
+_PENDING_FORM_TTL_SECONDS = 1800
+_MAX_STREAM_CONTENT_CHARS = 200000
+_MAX_CALLBACK_TASKS = 100
+_MAX_REPLY_WORKERS = 100
+_MAX_REPLY_QUEUE_SIZE = 100
+_MAX_PENDING_ACKS = 256
+

 def _generate_req_id(prefix: str) -> str:
    """Generate a unique request ID in the format: {prefix}_{timestamp}_{random}."""
@@ -106,6 +117,7 @@ class WecomBotWsClient:
        # Per-req_id serial reply queues
        self._reply_queues: dict[str, asyncio.Queue] = {}
        self._reply_workers: dict[str, asyncio.Task] = {}
+        self._callback_tasks: set[asyncio.Task] = set()
        self._reply_ack_timeout = 5.0

        # Stream ID tracking for WebSocket mode
@@ -135,6 +147,31 @@ class WecomBotWsClient:
        # `set_card_source` from the adapter after reading config.
        self.card_source: Optional[dict] = None

+    @staticmethod
+    def _cap_mapping(mapping: dict, max_entries: int) -> None:
+        while len(mapping) > max_entries:
+            mapping.pop(next(iter(mapping)), None)
+
+    def _prune_stream_state(self) -> None:
+        while len(self._stream_sessions) > _STREAM_CACHE_MAX:
+            msg_id = next(iter(self._stream_sessions))
+            self._stream_sessions.pop(msg_id, None)
+            self._stream_ids.pop(msg_id, None)
+            self._stream_last_content.pop(msg_id, None)
+            task_id = self._task_id_by_msg.pop(msg_id, None)
+            if task_id:
+                self._pending_forms_by_task.pop(task_id, None)
+
+    def _prune_pending_forms(self) -> None:
+        cutoff = time.monotonic() - _PENDING_FORM_TTL_SECONDS
+        for task_id, pending in tuple(self._pending_forms_by_task.items()):
+            if float(pending.get('created_at', 0.0)) <= cutoff:
+                self._drop_pending_form_task(task_id, pending)
+        while len(self._pending_forms_by_task) > _PENDING_FORM_MAX:
+            task_id = next(iter(self._pending_forms_by_task))
+            pending = self._pending_forms_by_task.get(task_id, {})
+            self._drop_pending_form_task(task_id, pending)
+
    # ── Public API ──────────────────────────────────────────────────

    async def connect(self):
@@ -173,17 +210,40 @@ class WecomBotWsClient:
    async def disconnect(self):
        """Gracefully disconnect from the WebSocket server."""
        self._running = False
+        heartbeat_tasks = []
        if self._heartbeat_task and not self._heartbeat_task.done():
            self._heartbeat_task.cancel()
-        for task in self._reply_workers.values():
+            heartbeat_tasks.append(self._heartbeat_task)
+        reply_workers = list(self._reply_workers.values())
+        for task in reply_workers:
            if not task.done():
                task.cancel()
+        callback_tasks = list(self._callback_tasks)
+        for task in callback_tasks:
+            if not task.done():
+                task.cancel()
+        shutdown_tasks = [*heartbeat_tasks, *reply_workers, *callback_tasks]
+        if shutdown_tasks:
+            await asyncio.gather(*shutdown_tasks, return_exceptions=True)
+        self._clear_pending_acks('Connection closed')
        if self._ws and not self._ws.closed:
            await self._ws.close()
        self._ws = None
        if self._session and not self._session.closed:
            await self._session.close()
        self._session = None
+        self._heartbeat_task = None
+        self._reply_queues.clear()
+        self._reply_workers.clear()
+        self._callback_tasks.clear()
+        self._stream_ids.clear()
+        self._stream_last_content.clear()
+        self._stream_sessions.clear()
+        self._feedback_sessions.clear()
+        self._msg_feedback_ids.clear()
+        self._pending_forms_by_task.clear()
+        self._task_id_by_msg.clear()
+        self._msg_id_map.clear()

    def on_message(self, msg_type: str) -> Callable:
        """Decorator to register a message handler.
@@ -366,8 +426,10 @@ class WecomBotWsClient:
            'chat_id': session_info.get('chat_id', ''),
            'stream_id': stream_id,
            'req_id': req_id,
+            'created_at': time.monotonic(),
        }
        self._task_id_by_msg[msg_id] = task_id
+        self._prune_pending_forms()

        card_payload = build_human_input_template_card_payload(
            form_data,
@@ -458,6 +520,8 @@ class WecomBotWsClient:
                next_content = previous_content
            else:
                next_content = previous_content + content if previous_content else content
+            if len(next_content) > _MAX_STREAM_CONTENT_CHARS:
+                next_content = next_content[-_MAX_STREAM_CONTENT_CHARS:]

            # Skip sending if content hasn't changed (e.g. during tool call argument streaming)
            if not is_final and next_content == previous_content:
@@ -485,6 +549,8 @@ class WecomBotWsClient:
                session_info = self._stream_sessions.get(msg_id)
                if session_info:
                    self._feedback_sessions[feedback_id] = session_info
+                    self._cap_mapping(self._feedback_sessions, _FEEDBACK_CACHE_MAX)
+                self._cap_mapping(self._msg_feedback_ids, _FEEDBACK_CACHE_MAX)

            # WeCom replaces the displayed stream content on each refresh, so
            # every frame must contain the complete snapshot, not only a delta.
@@ -516,7 +582,7 @@ class WecomBotWsClient:

        self._session = aiohttp.ClientSession()
        try:
-            self._ws = await self._session.ws_connect(self.ws_url)
+            self._ws = await self._session.ws_connect(self.ws_url, max_msg_size=1024 * 1024)
            self._missed_pong_count = 0
            self._reconnect_attempts = 0
            await self.logger.info('WebSocket connected, sending auth...')
@@ -539,6 +605,8 @@ class WecomBotWsClient:
            finally:
                if self._heartbeat_task and not self._heartbeat_task.done():
                    self._heartbeat_task.cancel()
+                    await asyncio.gather(self._heartbeat_task, return_exceptions=True)
+                self._heartbeat_task = None
                self._clear_pending_acks('Connection closed')
        finally:
            if self._ws and not self._ws.closed:
@@ -565,7 +633,7 @@ class WecomBotWsClient:
        try:
            msg = await asyncio.wait_for(self._ws.receive(), timeout=10.0)
            if msg.type in (aiohttp.WSMsgType.TEXT,):
-                frame = json.loads(msg.data)
+                frame = await asyncio.to_thread(json.loads, msg.data)
                req_id = frame.get('headers', {}).get('req_id', '')
                if req_id.startswith(CMD_SUBSCRIBE) and frame.get('errcode') == 0:
                    return True
@@ -614,7 +682,7 @@ class WecomBotWsClient:
                break
            if msg.type == aiohttp.WSMsgType.TEXT:
                try:
-                    frame = json.loads(msg.data)
+                    frame = await asyncio.to_thread(json.loads, msg.data)
                    await self._handle_frame(frame)
                except json.JSONDecodeError:
                    await self.logger.error(f'Failed to parse WebSocket message: {str(msg.data)[:200]}')
@@ -622,7 +690,7 @@ class WecomBotWsClient:
                    await self.logger.error(f'Error handling frame: {traceback.format_exc()}')
            elif msg.type == aiohttp.WSMsgType.BINARY:
                try:
-                    frame = json.loads(msg.data)
+                    frame = await asyncio.to_thread(json.loads, msg.data)
                    await self._handle_frame(frame)
                except Exception:
                    await self.logger.error(f'Error handling binary frame: {traceback.format_exc()}')
@@ -638,12 +706,14 @@ class WecomBotWsClient:

        # Message push
        if cmd == CMD_MSG_CALLBACK:
-            asyncio.create_task(self._handle_message_callback(frame))
+            if not self._start_callback_task(self._handle_message_callback(frame)):
+                await self.logger.warning('WeCom WebSocket callback capacity reached; dropping message')
            return

        # Event push
        if cmd == CMD_EVENT_CALLBACK:
-            asyncio.create_task(self._handle_event_callback(frame))
+            if not self._start_callback_task(self._handle_event_callback(frame)):
+                await self.logger.warning('WeCom WebSocket callback capacity reached; dropping event')
            return

        # No cmd → response/ACK frame, dispatch by req_id prefix
@@ -665,6 +735,27 @@ class WecomBotWsClient:
        # Unknown frame
        await self.logger.warning(f'Unknown frame: {_frame_snippet(frame)}')

+    def _start_callback_task(self, coro) -> bool:
+        """Schedule an inbound frame callback; return False if the concurrency cap is reached."""
+
+        for task in tuple(self._callback_tasks):
+            if task.done():
+                self._callback_tasks.discard(task)
+        if len(self._callback_tasks) >= _MAX_CALLBACK_TASKS:
+            # Close the unscheduled coroutine so Python does not warn that it was never awaited.
+            coro.close()
+            return False
+
+        task = asyncio.create_task(coro)
+        self._callback_tasks.add(task)
+
+        def done(done_task: asyncio.Task) -> None:
+            self._callback_tasks.discard(done_task)
+            if not done_task.cancelled():
+                # Retrieve any exception so asyncio does not log
+                # 'Task exception was never retrieved'.
+                done_task.exception()
+
+        task.add_done_callback(done)
+        return True
+
    async def _handle_message_callback(self, frame: dict):
        """Handle an incoming message callback frame."""
        try:
@@ -697,6 +788,7 @@ class WecomBotWsClient:
                    'chat_id': message_data.get('chatid', ''),
                    'chat_type': message_data.get('type', 'single'),
                }
+                self._prune_stream_state()
            message_data['stream_id'] = stream_id
            message_data['req_id'] = req_id

@@ -748,7 +840,7 @@ class WecomBotWsClient:
                )

                # Look up session by feedback_id
-                session_info = self._feedback_sessions.get(feedback_id)
+                session_info = self._feedback_sessions.pop(feedback_id, None)
                session = None
                if session_info:
                    session = StreamSession(
@@ -806,6 +898,10 @@ class WecomBotWsClient:
        if pending is None:
            await self.logger.warning(f'No pending_form found for task_id={task_id} (ws); card event ignored')
            return
+        if time.monotonic() - float(pending.get('created_at', 0.0)) > _PENDING_FORM_TTL_SECONDS:
+            self._drop_pending_form_task(task_id, pending)
+            await self.logger.warning(f'Pending form expired for task_id={task_id} (ws)')
+            return

        req_id_for_update = frame.get('headers', {}).get('req_id', '')
        form_data = pending.get('form_data', {}) or {}
@@ -868,6 +964,7 @@ class WecomBotWsClient:
                self._msg_id_map[message_id] += 1
                return
            self._msg_id_map[message_id] = 1
+            self._cap_mapping(self._msg_id_map, _DEDUP_CACHE_MAX)

            msg_type = event.type
            if msg_type in self._message_handlers:
@@ -899,40 +996,61 @@ class WecomBotWsClient:

        # Ensure serial delivery per req_id
        if req_id not in self._reply_queues:
-            self._reply_queues[req_id] = asyncio.Queue()
+            if len(self._reply_queues) >= _MAX_REPLY_WORKERS:
+                await self.logger.warning('WeCom WebSocket reply worker capacity reached; dropping reply')
+                return None
+            self._reply_queues[req_id] = asyncio.Queue(maxsize=_MAX_REPLY_QUEUE_SIZE)
            self._reply_workers[req_id] = asyncio.create_task(self._reply_queue_worker(req_id))

        future: asyncio.Future = asyncio.get_event_loop().create_future()
-        await self._reply_queues[req_id].put((frame, future))
+        try:
+            self._reply_queues[req_id].put_nowait((frame, future))
+        except asyncio.QueueFull:
+            await self.logger.warning(f'WeCom WebSocket reply queue full for req_id={req_id}; dropping reply')
+            return None
        return await future

    async def _reply_queue_worker(self, req_id: str):
        """Process reply queue items serially for a given req_id."""
        queue = self._reply_queues[req_id]
+        current_future: asyncio.Future | None = None
        try:
            while self._running:
                try:
-                    frame, future = await asyncio.wait_for(queue.get(), timeout=60.0)
+                    frame, current_future = await asyncio.wait_for(queue.get(), timeout=60.0)
                except asyncio.TimeoutError:
                    # Queue idle, clean up worker
                    break

                try:
                    ack = await self._send_and_wait_ack(frame)
-                    if not future.done():
-                        future.set_result(ack)
+                    if not current_future.done():
+                        current_future.set_result(ack)
                except Exception as e:
-                    if not future.done():
-                        future.set_exception(e)
+                    if not current_future.done():
+                        current_future.set_exception(e)
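+                # Reset only after the future is resolved. A `finally:` here would
+                # also run on CancelledError and clear current_future before the
+                # handler below could fail it, leaving its caller waiting forever.
+                current_future = None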
        except asyncio.CancelledError:
-            pass
+            if current_future is not None and not current_future.done():
+                current_future.set_exception(ConnectionError('Connection closed'))
        finally:
+            while True:
+                try:
+                    _, future = queue.get_nowait()
+                except asyncio.QueueEmpty:
+                    break
+                if not future.done():
+                    future.set_exception(ConnectionError('Reply worker stopped'))
            self._reply_queues.pop(req_id, None)
            self._reply_workers.pop(req_id, None)

    async def _send_and_wait_ack(self, frame: dict) -> Optional[dict]:
        """Send a frame and wait for the corresponding ACK."""
        req_id = frame['headers']['req_id']
+        if len(self._pending_acks) >= _MAX_PENDING_ACKS:
+            await self.logger.warning('WeCom WebSocket pending ACK capacity reached; dropping frame')
+            return None
        ack_future: asyncio.Future = asyncio.get_event_loop().create_future()
        self._pending_acks[req_id] = ack_future
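`_reply_queue_worker` serializes replies per `req_id` and must fail every future it owns when it stops, or callers awaiting them hang. The sketch below reproduces that contract for a single key using only standard-library asyncio. `SerialReplyWorker` and its `send` callback are hypothetical. One detail matters: the in-flight future is cleared only after it has been resolved, because a `finally:` around the send would also run on `CancelledError` and leave the cancellation handler nothing to fail. Unlike the hunk, the sketch re-raises `CancelledError`, which is the usual asyncio convention.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SerialReplyWorker:
    """Hypothetical single-key worker that sends queued frames one at a time."""

    def __init__(self, send: Callable[[str], Awaitable[str]], max_queue: int = 100) -> None:
        self._send = send
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
        self._task = asyncio.create_task(self._run())

    def submit(self, frame: str) -> Optional[asyncio.Future]:
        """Queue a frame; return None (back-pressure) when the queue is full."""
        future = asyncio.get_running_loop().create_future()
        try:
            self._queue.put_nowait((frame, future))
        except asyncio.QueueFull:
            return None
        return future

    async def _run(self) -> None:
        current: Optional[asyncio.Future] = None
        try:
            while True:
                frame, current = await self._queue.get()
                try:
                    result = await self._send(frame)
                    if not current.done():
                        current.set_result(result)
                except Exception as exc:
                    if not current.done():
                        current.set_exception(exc)
                # Cleared only after resolution, so the handler below still sees it on cancel.
                current = None
        except asyncio.CancelledError:
            if current is not None and not current.done():
                current.set_exception(ConnectionError('worker cancelled'))
            raise
        finally:
            # Fail everything still queued so no caller waits forever.
            while not self._queue.empty():
                _, pending = self._queue.get_nowait()
                if not pending.done():
                    pending.set_exception(ConnectionError('worker stopped'))

    async def close(self) -> None:
        self._task.cancel()
        await asyncio.gather(self._task, return_exceptions=True)


async def demo() -> list[str]:
    async def slow_send(frame: str) -> str:
        await asyncio.sleep(10)
        return f'ack:{frame}'

    worker = SerialReplyWorker(slow_send)
    first = worker.submit('a')
    second = worker.submit('b')
    await asyncio.sleep(0)  # let the worker start sending 'a'
    await worker.close()
    return [str(first.exception()), str(second.exception())]


if __name__ == '__main__':
    print(asyncio.run(demo()))  # ['worker cancelled', 'worker stopped']
```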

@@ -1,16 +1,82 @@
 from quart import request
 from .WXBizMsgCrypt3 import WXBizMsgCrypt
+import asyncio
 import base64
 import binascii
+import contextvars
+import functools
 import httpx
+import os
 import traceback
 from urllib.parse import quote
 from quart import Quart
 import xml.etree.ElementTree as ET
+from contextlib import asynccontextmanager
 from typing import Callable, Dict, Any
 from .wecomevent import WecomEvent
 import langbot_plugin.api.entities.builtin.platform.message as platform_message
 import aiofiles
+from langbot.pkg.utils import httpclient
+
+_MAX_MEDIA_BYTES = 10 * 1024 * 1024
+_MAX_CALLBACK_BODY_BYTES = 1024 * 1024
+_EXTENDED_HTTP_TIMEOUT_SECONDS = 120
+
+
+async def _read_httpx_media_limited(response: httpx.Response) -> bytes:
+    content_length = response.headers.get('Content-Length')
+    if content_length is not None:
+        try:
+            declared_length = int(content_length)
+        except ValueError:
+            # Malformed header: rely on the streaming check below.
+            declared_length = None
+        if declared_length is not None and declared_length > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom media exceeds the size limit')
+    content = bytearray()
+    async for chunk in response.aiter_bytes():
+        content.extend(chunk)
+        if len(content) > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom media exceeds the size limit')
+    return bytes(content)
+
+
+async def _read_local_media_limited(path: str) -> bytes:
+    if await asyncio.to_thread(os.path.getsize, path) > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom media exceeds the size limit')
+    async with aiofiles.open(path, 'rb') as file:
+        content = await file.read(_MAX_MEDIA_BYTES + 1)
+    if len(content) > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom media exceeds the size limit')
+    return content
+
+
+async def _decode_media_base64_limited(value: str) -> bytes:
+    max_encoded_chars = 4 * ((_MAX_MEDIA_BYTES + 2) // 3) + 4
+    if len(value) > max_encoded_chars:
+        raise ValueError('WeCom media exceeds the size limit')
+    content = await asyncio.to_thread(base64.b64decode, value)
+    if len(content) > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom media exceeds the size limit')
+    return content
+
+
+def _bounded_token_retry(method):
+    """Allow one token-refresh retry without unbounded async recursion."""
+
+    depth = contextvars.ContextVar(f'{method.__name__}_token_retry_depth', default=0)
+
+    @functools.wraps(method)
+    async def wrapped(*args, **kwargs):
+        current_depth = depth.get()
+        if current_depth >= 2:
+            raise RuntimeError(f'{method.__name__} exceeded the token refresh retry limit')
+        token = depth.set(current_depth + 1)
+        try:
+            return await method(*args, **kwargs)
+        finally:
+            depth.reset(token)
+
+    return wrapped


 class WecomClient:
@@ -36,6 +102,7 @@ class WecomClient:
        self.logger = logger
        self.unified_mode = unified_mode
        self.app = Quart(__name__)
+        self.app.config['MAX_CONTENT_LENGTH'] = _MAX_CALLBACK_BODY_BYTES

        # 只有在非统一模式下才注册独立路由
        if not self.unified_mode:
@@ -49,6 +116,29 @@ class WecomClient:
        self._message_handlers = {
            'example': [],
        }
+        self._http_clients: dict[bool, httpx.AsyncClient] = {}
+
+    @asynccontextmanager
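+    async def _http_client_context(self, *, unbounded_timeout: bool = False):
+        """Yield a shared, lazily created httpx client; close() releases it, not context exit.
+
+        Despite its name, unbounded_timeout=True selects the finite
+        _EXTENDED_HTTP_TIMEOUT_SECONDS timeout instead of httpx's default.
+        """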
+        client = self._http_clients.get(unbounded_timeout)
+        if client is None or client.is_closed:
+            response_hooks = httpclient.httpx_response_limit_hooks()
+            client = (
+                httpx.AsyncClient(
+                    timeout=_EXTENDED_HTTP_TIMEOUT_SECONDS,
+                    event_hooks=response_hooks,
+                )
+                if unbounded_timeout
+                else httpx.AsyncClient(event_hooks=response_hooks)
+            )
+            self._http_clients[unbounded_timeout] = client
+        yield client
+
+    async def close(self) -> None:
+        clients = list(self._http_clients.values())
+        self._http_clients.clear()
+        if clients:
+            await asyncio.gather(*(client.aclose() for client in clients), return_exceptions=True)

    # Access token operations
    async def check_access_token(self):
@@ -59,15 +149,16 @@ class WecomClient:

    async def get_access_token(self, secret):
        url = f'{self.base_url}/gettoken?corpid={self.corpid}&corpsecret={secret}'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.get(url)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if 'access_token' in data:
                return data['access_token']
            else:
-                await self.logger.error(f'Failed to get access token: {response.json()}')
+                await self.logger.error(f'Failed to get access token: {data}')
                raise Exception(f'Access token not obtained: {data}')

+    @_bounded_token_retry
    async def get_user_info(self, userid: str) -> dict:
        """
        Get user information by user ID using the application secret.
@@ -82,9 +173,9 @@ class WecomClient:
            self.access_token = await self.get_access_token(self.secret)

        url = self.base_url + '/user/get?access_token=' + self.access_token + '&userid=' + quote(userid)
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.get(url)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data.get('errcode') == 40014 or data.get('errcode') == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.get_user_info(userid)
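`_bounded_token_retry` limits the recursive retry in `get_user_info` and `send_image` with a `contextvars.ContextVar` rather than an instance counter. Each asyncio task runs in a copy of the context, so concurrent calls do not share a depth. The following sketch shows the same mechanism with a configurable depth. `bounded_retry` and `FakeClient` are hypothetical, and the simulated failure stands in for WeCom's 40014/42001 token-expiry codes. `max_depth=2` allows the original call plus one retry, matching `current_depth >= 2` in the hunk.

```python
import asyncio
import contextvars
import functools


def bounded_retry(max_depth: int = 2):
    """Hypothetical decorator: allow at most max_depth nested calls per call chain."""

    def decorator(method):
        # One ContextVar per decorated method; asyncio tasks each run in a
        # copy of the context, so concurrent calls do not share this depth.
        depth = contextvars.ContextVar(f'{method.__name__}_retry_depth', default=0)

        @functools.wraps(method)
        async def wrapped(*args, **kwargs):
            current = depth.get()
            if current >= max_depth:
                raise RuntimeError(f'{method.__name__} exceeded the retry limit')
            token = depth.set(current + 1)
            try:
                return await method(*args, **kwargs)
            finally:
                depth.reset(token)

        return wrapped

    return decorator


class FakeClient:
    """Simulates an API that reports an expired token `failures` times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    @bounded_retry()
    async def get_user_info(self, userid: str) -> dict:
        self.calls += 1
        if self.failures > 0:
            self.failures -= 1
            # Stand-in for "refresh the token, then retry" on errcode 40014/42001.
            return await self.get_user_info(userid)
        return {'userid': userid}


if __name__ == '__main__':
    print(asyncio.run(FakeClient(failures=1).get_user_info('u1')))  # {'userid': 'u1'}
    try:
        asyncio.run(FakeClient(failures=2).get_user_info('u2'))
    except RuntimeError as exc:
        print(exc)  # get_user_info exceeded the retry limit
```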
@@ -98,13 +189,13 @@ class WecomClient:
            self.access_token_for_contacts = await self.get_access_token(self.secret_for_contacts)

        url = self.base_url + '/user/list_id?access_token=' + self.access_token_for_contacts
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'cursor': '',
                'limit': 10000,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 0:
                dept_users = data['dept_user']
                userid = []
@@ -121,7 +212,7 @@ class WecomClient:
            url = self.base_url + '/message/send?access_token=' + self.access_token_for_contacts
            user_ids = await self.get_users()
            user_ids_string = '|'.join(user_ids)
-            async with httpx.AsyncClient() as client:
+            async with self._http_client_context() as client:
                params = {
                    'touser': user_ids_string,
                    'msgtype': 'text',
@@ -135,16 +226,17 @@ class WecomClient:
                    'duplicate_check_interval': 1800,
                }
                response = await client.post(url, json=params)
-                data = response.json()
+                data = await httpclient.parse_json_response(response)
                if data['errcode'] != 0:
                    raise Exception('Failed to send message: ' + str(data))

+    @_bounded_token_retry
    async def send_image(self, user_id: str, agent_id: int, media_id: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)

        url = self.base_url + '/message/send?access_token=' + self.access_token
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'touser': user_id,
                'msgtype': 'image',
@@ -158,7 +250,7 @@ class WecomClient:
                'duplicate_check_interval': 1800,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.send_image(user_id, agent_id, media_id)
@@ -166,11 +258,12 @@ class WecomClient:
                await self.logger.error(f'发送图片失败:{data}')
                raise Exception('Failed to send image: ' + str(data))

+    @_bounded_token_retry
    async def send_voice(self, user_id: str, agent_id: int, media_id: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)
        url = self.base_url + '/message/send?access_token=' + self.access_token
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'touser': user_id,
                'msgtype': 'voice',
@@ -184,7 +277,7 @@ class WecomClient:
                'duplicate_check_interval': 1800,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.send_voice(user_id, agent_id, media_id)
@@ -192,11 +285,12 @@ class WecomClient:
                await self.logger.error(f'发送语音失败:{data}')
                raise Exception('Failed to send voice: ' + str(data))

+    @_bounded_token_retry
    async def send_file(self, user_id: str, agent_id: int, media_id: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)
        url = self.base_url + '/message/send?access_token=' + self.access_token
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'touser': user_id,
                'msgtype': 'file',
@@ -210,7 +304,7 @@ class WecomClient:
                'duplicate_check_interval': 1800,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.send_file(user_id, agent_id, media_id)
@@ -218,12 +312,13 @@ class WecomClient:
                await self.logger.error(f'发送文件失败:{data}')
                raise Exception('Failed to send file: ' + str(data))

+    @_bounded_token_retry
    async def send_private_msg(self, user_id: str, agent_id: int, content: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)

        url = self.base_url + '/message/send?access_token=' + self.access_token
-        async with httpx.AsyncClient(timeout=None) as client:
+        async with self._http_client_context(unbounded_timeout=True) as client:
            params = {
                'touser': user_id,
                'msgtype': 'text',
@@ -237,7 +332,7 @@ class WecomClient:
                'duplicate_check_interval': 1800,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.send_private_msg(user_id, agent_id, content)
@@ -283,7 +378,15 @@ class WecomClient:

            elif req.method == 'POST':
                encrypt_msg = await req.data
-                ret, xml_msg = wxcpt.DecryptMsg(encrypt_msg, msg_signature, timestamp, nonce)
+                if len(encrypt_msg) > _MAX_CALLBACK_BODY_BYTES:
+                    raise ValueError('WeCom callback body exceeds the size limit')
+                ret, xml_msg = await asyncio.to_thread(
+                    wxcpt.DecryptMsg,
+                    encrypt_msg,
+                    msg_signature,
+                    timestamp,
+                    nonce,
+                )
                if ret != 0:
                    await self.logger.error('消息解密失败')
                    raise Exception(f'消息解密失败，错误码: {ret}')
@@ -332,7 +435,7 @@ class WecomClient:
        """
        Parse the XML message returned by WeChat and convert it into a dict.
        """
-        root = ET.fromstring(xml_msg)
+        root = await asyncio.to_thread(ET.fromstring, xml_msg)
        message_data = {
            'ToUserName': root.find('ToUserName').text,
            'FromUserName': root.find('FromUserName').text,
@@ -366,6 +469,7 @@ class WecomClient:
                return ext
        return 'jpg'  # Default to jpg

+    @_bounded_token_retry
    async def upload_image_to_work(self, image: platform_message.Image):
        """
        Obtain the media_id
@@ -379,9 +483,8 @@ class WecomClient:

        # Read the file's binary data
        if image.path:
-            async with aiofiles.open(image.path, 'rb') as f:
-                file_bytes = await f.read()
-                file_name = image.path.split('/')[-1]
+            file_bytes = await _read_local_media_limited(image.path)
+            file_name = image.path.split('/')[-1]
        elif image.url:
            file_bytes = await self.download_media_to_bytes(image.url)
            file_name = image.url.split('/')[-1]
@@ -392,7 +495,7 @@ class WecomClient:
                    base64_data = base64_data.split(',', 1)[1]
                padding = 4 - (len(base64_data) % 4) if len(base64_data) % 4 else 0
                padded_base64 = base64_data + '=' * padding
-                file_bytes = base64.b64decode(padded_base64)
+                file_bytes = await _decode_media_base64_limited(padded_base64)
            except binascii.Error as e:
                raise ValueError(f'Invalid base64 string: {str(e)}')
        else:
@@ -400,6 +503,8 @@ class WecomClient:
            raise ValueError('image对象出错')

        # Build the multipart/form-data file payload
+        if len(file_bytes) > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom media exceeds the size limit')
        boundary = '-------------------------acebdf13572468'
        headers = {'Content-Type': f'multipart/form-data; boundary={boundary}'}
        body = (
@@ -413,9 +518,9 @@ class WecomClient:
        )

        # Upload the file
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, content=body)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                media_id = await self.upload_image_to_work(image)
@@ -426,6 +531,7 @@ class WecomClient:
            media_id = data.get('media_id')
            return media_id

+    @_bounded_token_retry
    async def upload_voice_to_work(self, voice: platform_message.Voice):
        """
        Upload a voice file to WeCom
@@ -437,9 +543,8 @@ class WecomClient:
        file_name = 'voice.mp3'

        if voice.path:
-            async with aiofiles.open(voice.path, 'rb') as f:
-                file_bytes = await f.read()
-                file_name = voice.path.split('/')[-1]
+            file_bytes = await _read_local_media_limited(voice.path)
+            file_name = voice.path.split('/')[-1]
        elif voice.url:
            file_bytes = await self.download_media_to_bytes(voice.url)
            file_name = voice.url.split('/')[-1]
@@ -450,13 +555,15 @@ class WecomClient:
                    base64_data = base64_data.split(',', 1)[1]
                padding = 4 - (len(base64_data) % 4) if len(base64_data) % 4 else 0
                padded_base64 = base64_data + '=' * padding
-                file_bytes = base64.b64decode(padded_base64)
+                file_bytes = await _decode_media_base64_limited(padded_base64)
            except binascii.Error as e:
                raise ValueError(f'Invalid base64 string: {str(e)}')
        else:
            await self.logger.error('Voice对象出错')
            raise ValueError('voice对象出错')

+        if len(file_bytes) > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom media exceeds the size limit')
        boundary = '-------------------------acebdf13572468'
        headers = {'Content-Type': f'multipart/form-data; boundary={boundary}'}
        body = (
@@ -470,9 +577,9 @@ class WecomClient:
        )

        # print(body)
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, content=body)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                media_id = await self.upload_voice_to_work(voice)
@@ -482,6 +589,7 @@ class WecomClient:
            media_id = data.get('media_id')
            return media_id

+    @_bounded_token_retry
    async def upload_file_to_work(self, file: platform_message.File):
        """
        Upload a file to WeCom
@@ -492,9 +600,8 @@ class WecomClient:
        file_bytes = None
        file_name = 'file.txt'
        if file.path:
-            async with aiofiles.open(file.path, 'rb') as f:
-                file_bytes = await f.read()
-                file_name = file.path.split('/')[-1]
+            file_bytes = await _read_local_media_limited(file.path)
+            file_name = file.path.split('/')[-1]
        elif file.url:
            file_bytes = await self.download_media_to_bytes(file.url)
            file_name = file.url.split('/')[-1]
@@ -505,12 +612,14 @@ class WecomClient:
                    base64_data = base64_data.split(',', 1)[1]
                padding = 4 - (len(base64_data) % 4) if len(base64_data) % 4 else 0
                padded_base64 = base64_data + '=' * padding
-                file_bytes = base64.b64decode(padded_base64)
+                file_bytes = await _decode_media_base64_limited(padded_base64)
            except binascii.Error as e:
                raise ValueError(f'Invalid base64 string: {str(e)}')
        else:
            await self.logger.error('File对象出错')
            raise ValueError('file对象出错')
+        if len(file_bytes) > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom media exceeds the size limit')
        boundary = '-------------------------acebdf13572468'
        headers = {'Content-Type': f'multipart/form-data; boundary={boundary}'}
        body = (
@@ -522,9 +631,9 @@ class WecomClient:
            + file_bytes
            + f'\r\n--{boundary}--\r\n'.encode('utf-8')
        )
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, content=body)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                media_id = await self.upload_file_to_work(file)
@@ -535,10 +644,10 @@ class WecomClient:
            return media_id

    async def download_media_to_bytes(self, url: str) -> bytes:
-        async with httpx.AsyncClient() as client:
-            response = await client.get(url)
-            response.raise_for_status()
-            return response.content
+        async with self._http_client_context() as client:
+            async with client.stream('GET', url) as response:
+                response.raise_for_status()
+                return await _read_httpx_media_limited(response)

    # Obtain the media_id
    async def get_media_id(self, media: platform_message.Image | platform_message.Voice | platform_message.File):
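The `@_bounded_token_retry` decorator added to these `WecomClient` methods lets each method refresh an expired token (errcode 40014 or 42001) and retry once, instead of recursing without bound. A minimal standalone sketch of that ContextVar-based technique follows; `bounded_token_retry` mirrors the helper shown in the customer-service client, while `FlakyClient` is a hypothetical stand-in for an API that keeps reporting an expired token:

```python
import asyncio
import contextvars
import functools


def bounded_token_retry(method):
    """Allow the first call plus one nested retry; fail on a third level."""
    # One ContextVar per decorated method, so unrelated methods keep separate budgets.
    depth = contextvars.ContextVar(f'{method.__name__}_depth', default=0)

    @functools.wraps(method)
    async def wrapped(*args, **kwargs):
        current = depth.get()
        if current >= 2:
            raise RuntimeError(f'{method.__name__} exceeded the token refresh retry limit')
        token = depth.set(current + 1)
        try:
            return await method(*args, **kwargs)
        finally:
            # Restore the previous depth so the next top-level call starts at 0.
            depth.reset(token)

    return wrapped


class FlakyClient:
    """Hypothetical client whose API reports an expired token `failures` times."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    @bounded_token_retry
    async def send(self, text: str) -> str:
        self.calls += 1
        if self.failures > 0:
            self.failures -= 1
            # Simulated errcode 42001: "refresh" the token and recurse, as the diff does.
            return await self.send(text)
        return f'sent:{text}'
```

Because each asyncio task runs in its own copy of the context, concurrent sends from different tasks get independent retry budgets.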
@@ -1,8 +1,13 @@
 from quart import request
 from ..wecom_api.WXBizMsgCrypt3 import WXBizMsgCrypt
+import asyncio
 import base64
 import binascii
+import contextvars
+import functools
 import httpx
+import json
+import os
 import traceback
 from quart import Quart
 import xml.etree.ElementTree as ET
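`_decode_media_base64_limited` in this file refuses oversized base64 text before decoding it. The bound follows from base64 emitting 4 characters per started 3-byte group: for the 10 MiB `_MAX_MEDIA_BYTES` cap that is 13,981,016 characters, and the helper adds 4 characters of slack for caller-appended padding (13,981,020). A small synchronous sketch of the same check, with `max_encoded_chars` and `decode_limited` as illustrative names:

```python
import base64

MAX_MEDIA_BYTES = 10 * 1024 * 1024  # mirrors _MAX_MEDIA_BYTES


def max_encoded_chars(max_bytes: int) -> int:
    """Longest base64 text accepted for a payload capped at max_bytes."""
    # 4 output characters per started 3-byte group, plus 4 characters of slack.
    return 4 * ((max_bytes + 2) // 3) + 4


def decode_limited(value: str, max_bytes: int = MAX_MEDIA_BYTES) -> bytes:
    # Cheap pre-check on the text length, so oversized input is never decoded.
    if len(value) > max_encoded_chars(max_bytes):
        raise ValueError('media exceeds the size limit')
    content = base64.b64decode(value)
    # The slack can let a few extra bytes through, so check the decoded size too.
    if len(content) > max_bytes:
        raise ValueError('media exceeds the size limit')
    return content
```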
@@ -11,9 +16,72 @@ from .wecomcsevent import WecomCSEvent
 import langbot_plugin.api.entities.builtin.platform.message as platform_message
 import aiofiles
 import time
+from contextlib import asynccontextmanager
+from langbot.pkg.utils import httpclient
+
+_MAX_MEDIA_BYTES = 10 * 1024 * 1024
+_MAX_CALLBACK_BODY_BYTES = 1024 * 1024
+
+
+async def _read_httpx_media_limited(response: httpx.Response) -> bytes:
+    content_length = response.headers.get('Content-Length')
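+    # A malformed Content-Length is ignored; the streamed byte count below still applies.
+    try:
+        declared_length = int(content_length) if content_length is not None else None
+    except ValueError:
+        declared_length = None
+    if declared_length is not None and declared_length > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom customer-service media exceeds the size limit')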
+    content = bytearray()
+    async for chunk in response.aiter_bytes():
+        content.extend(chunk)
+        if len(content) > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom customer-service media exceeds the size limit')
+    return bytes(content)
+
+
+async def _read_local_media_limited(path: str) -> bytes:
+    if await asyncio.to_thread(os.path.getsize, path) > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom customer-service media exceeds the size limit')
+    async with aiofiles.open(path, 'rb') as file:
+        content = await file.read(_MAX_MEDIA_BYTES + 1)
+    if len(content) > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom customer-service media exceeds the size limit')
+    return content
+
+
+async def _decode_media_base64_limited(value: str) -> bytes:
+    max_encoded_chars = 4 * ((_MAX_MEDIA_BYTES + 2) // 3) + 4
+    if len(value) > max_encoded_chars:
+        raise ValueError('WeCom customer-service media exceeds the size limit')
+    content = await asyncio.to_thread(base64.b64decode, value)
+    if len(content) > _MAX_MEDIA_BYTES:
+        raise ValueError('WeCom customer-service media exceeds the size limit')
+    return content
+
+
+def _bounded_token_retry(method):
+    """Allow one token-refresh retry without unbounded async recursion."""
+
+    depth = contextvars.ContextVar(f'{method.__name__}_token_retry_depth', default=0)
+
+    @functools.wraps(method)
+    async def wrapped(*args, **kwargs):
+        current_depth = depth.get()
+        if current_depth >= 2:
+            raise RuntimeError(f'{method.__name__} exceeded the token refresh retry limit')
+        token = depth.set(current_depth + 1)
+        try:
+            return await method(*args, **kwargs)
+        finally:
+            depth.reset(token)
+
+    return wrapped


 class WecomCSClient:
+    _CUSTOMER_CACHE_MAX = 4096
+
    def __init__(
        self,
        corpid: str,
@@ -34,10 +102,12 @@ class WecomCSClient:
        self.logger = logger
        self.unified_mode = unified_mode
        self.app = Quart(__name__)
+        self.app.config['MAX_CONTENT_LENGTH'] = _MAX_CALLBACK_BODY_BYTES

        # Customer info cache: {external_userid: (info_dict, timestamp)}
        self._customer_cache: dict[str, tuple[dict, float]] = {}
        self._cache_ttl = 60  # Cache TTL in seconds (1 minute)
+        self._customer_cache_cleanup_at = 0.0

        # Register standalone routes only when not in unified mode
        if not self.unified_mode:
@@ -48,29 +118,40 @@ class WecomCSClient:
        self._message_handlers = {
            'example': [],
        }
+        self._http_client: httpx.AsyncClient | None = None

+    @asynccontextmanager
+    async def _http_client_context(self):
+        if self._http_client is None or self._http_client.is_closed:
+            self._http_client = httpx.AsyncClient(event_hooks=httpclient.httpx_response_limit_hooks())
+        yield self._http_client
+
+    async def close(self) -> None:
+        if self._http_client is not None:
+            await self._http_client.aclose()
+            self._http_client = None
+
+    @_bounded_token_retry
    async def get_pic_url(self, media_id: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)

        url = f'{self.base_url}/media/get?access_token={self.access_token}&media_id={media_id}'

-        async with httpx.AsyncClient() as client:
-            response = await client.get(url)
-            if response.headers.get('Content-Type', '').startswith('application/json'):
-                data = response.json()
-                if data.get('errcode') in [40014, 42001]:
-                    self.access_token = await self.get_access_token(self.secret)
-                    return await self.get_pic_url(media_id)
-                else:
+        async with self._http_client_context() as client:
+            async with client.stream('GET', url) as response:
+                image_bytes = await _read_httpx_media_limited(response)
+                content_type = response.headers.get('Content-Type', '')
+                if content_type.startswith('application/json'):
+                    data = json.loads(image_bytes)
+                    if data.get('errcode') in [40014, 42001]:
+                        self.access_token = await self.get_access_token(self.secret)
+                        return await self.get_pic_url(media_id)
                    raise Exception('Failed to get image: ' + str(data))

-            # Otherwise the body is an image; convert it to base64
-            image_bytes = response.content
-            content_type = response.headers.get('Content-Type', '')
-            base64_str = base64.b64encode(image_bytes).decode('utf-8')
-            base64_str = f'data:{content_type};base64,{base64_str}'
-            return base64_str
+                # Otherwise the body is an image; convert it to base64
+                base64_str = (await asyncio.to_thread(base64.b64encode, image_bytes)).decode('utf-8')
+                return f'data:{content_type};base64,{base64_str}'

    # access_token operations
    async def check_access_token(self):
@@ -81,19 +162,20 @@ class WecomCSClient:

    async def get_access_token(self, secret):
        url = f'{self.base_url}/gettoken?corpid={self.corpid}&corpsecret={secret}'
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.get(url)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if 'access_token' in data:
                return data['access_token']
            else:
                raise Exception(f'未获取access token: {data}')

+    @_bounded_token_retry
    async def get_detailed_message_list(self, xml_msg: str):
        # Parse the message in this method and fetch its full content
        if isinstance(xml_msg, bytes):
            xml_msg = xml_msg.decode('utf-8')
-        root = ET.fromstring(xml_msg)
+        root = await asyncio.to_thread(ET.fromstring, xml_msg)
        token = root.find('Token').text
        open_kfid = root.find('OpenKfId').text

@@ -106,14 +188,14 @@ class WecomCSClient:
            self.access_token = await self.get_access_token(self.secret)

        url = self.base_url + '/kf/sync_msg?access_token=' + self.access_token
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'token': token,
                'voice_format': 0,
                'open_kfid': open_kfid,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.get_detailed_message_list(xml_msg)
@@ -130,11 +212,12 @@ class WecomCSClient:
            # await self.change_service_status(userid=external_userid,openkfid=open_kfid,servicer=servicer)
            return last_msg_data

+    @_bounded_token_retry
    async def change_service_status(self, userid: str, openkfid: str, servicer: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)
        url = self.base_url + '/kf/service_state/get?access_token=' + self.access_token
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'open_kfid': openkfid,
                'external_userid': userid,
@@ -142,18 +225,19 @@ class WecomCSClient:
                'servicer_userid': servicer,
            }
            response = await client.post(url, json=params)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
-                return await self.change_service_status(userid, openkfid)
+                return await self.change_service_status(userid, openkfid, servicer)
            if data['errcode'] != 0:
                raise Exception('Failed to change service status: ' + str(data))

+    @_bounded_token_retry
    async def send_image(self, user_id: str, agent_id: int, media_id: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)
        url = self.base_url + '/media/upload?access_token=' + self.access_token
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            params = {
                'touser': user_id,
                'toparty': '',
@@ -170,7 +254,7 @@ class WecomCSClient:
            }
            try:
                response = await client.post(url, json=params)
-                data = response.json()
+                data = await httpclient.parse_json_response(response)
            except Exception as e:
                raise Exception('Failed to send image: ' + str(e))

@@ -182,6 +266,7 @@ class WecomCSClient:
            if data['errcode'] != 0:
                raise Exception('Failed to send image: ' + str(data))

+    @_bounded_token_retry
    async def send_text_msg(self, open_kfid: str, external_userid: str, msgid: str, content: str):
        if not await self.check_access_token():
            self.access_token = await self.get_access_token(self.secret)
@@ -198,10 +283,10 @@ class WecomCSClient:
            },
        }

-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, json=payload)

-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                return await self.send_text_msg(open_kfid, external_userid, msgid, content)
@@ -250,7 +335,15 @@ class WecomCSClient:

            elif req.method == 'POST':
                encrypt_msg = await req.data
-                ret, xml_msg = wxcpt.DecryptMsg(encrypt_msg, msg_signature, timestamp, nonce)
+                if len(encrypt_msg) > _MAX_CALLBACK_BODY_BYTES:
+                    raise ValueError('WeCom customer-service callback body exceeds the size limit')
+                ret, xml_msg = await asyncio.to_thread(
+                    wxcpt.DecryptMsg,
+                    encrypt_msg,
+                    msg_signature,
+                    timestamp,
+                    nonce,
+                )
                if ret != 0:
                    raise Exception(f'消息解密失败，错误码: {ret}')

@@ -315,6 +408,7 @@ class WecomCSClient:
                return ext
        return 'jpg'  # Default to jpg

+    @_bounded_token_retry
    async def upload_to_work(self, image: platform_message.Image):
        """
        Obtain the media_id
@@ -328,9 +422,8 @@ class WecomCSClient:

        # Read the file's binary data
        if image.path:
-            async with aiofiles.open(image.path, 'rb') as f:
-                file_bytes = await f.read()
-                file_name = image.path.split('/')[-1]
+            file_bytes = await _read_local_media_limited(image.path)
+            file_name = image.path.split('/')[-1]
        elif image.url:
            file_bytes = await self.download_image_to_bytes(image.url)
            file_name = image.url.split('/')[-1]
@@ -341,13 +434,15 @@ class WecomCSClient:
                    base64_data = base64_data.split(',', 1)[1]
                padding = 4 - (len(base64_data) % 4) if len(base64_data) % 4 else 0
                padded_base64 = base64_data + '=' * padding
-                file_bytes = base64.b64decode(padded_base64)
+                file_bytes = await _decode_media_base64_limited(padded_base64)
            except binascii.Error as e:
                raise ValueError(f'Invalid base64 string: {str(e)}')
        else:
            raise ValueError('image对象出错')

        # Build the multipart/form-data file payload
+        if len(file_bytes) > _MAX_MEDIA_BYTES:
+            raise ValueError('WeCom customer-service media exceeds the size limit')
        boundary = '-------------------------acebdf13572468'
        headers = {'Content-Type': f'multipart/form-data; boundary={boundary}'}
        body = (
@@ -361,9 +456,9 @@ class WecomCSClient:
        )

        # Upload the file
-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, headers=headers, content=body)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)
            if data['errcode'] == 40014 or data['errcode'] == 42001:
                self.access_token = await self.get_access_token(self.secret)
                media_id = await self.upload_to_work(image)
@@ -374,16 +469,17 @@ class WecomCSClient:
            return media_id

    async def download_image_to_bytes(self, url: str) -> bytes:
-        async with httpx.AsyncClient() as client:
-            response = await client.get(url)
-            response.raise_for_status()
-            return response.content
+        async with self._http_client_context() as client:
+            async with client.stream('GET', url) as response:
+                response.raise_for_status()
+                return await _read_httpx_media_limited(response)

    # Obtain the media_id
    async def get_media_id(self, image: platform_message.Image):
        media_id = await self.upload_to_work(image=image)
        return media_id

+    @_bounded_token_retry
    async def get_customer_info(self, external_userid: str) -> dict | None:
        """
        Get customer information by external_userid with caching.
@@ -398,6 +494,11 @@ class WecomCSClient:
        """
        # Check cache first
        current_time = time.time()
+        if current_time - self._customer_cache_cleanup_at >= 30:
+            self._customer_cache_cleanup_at = current_time
+            for user_id, (_, cached_time) in tuple(self._customer_cache.items()):
+                if current_time - cached_time >= self._cache_ttl:
+                    self._customer_cache.pop(user_id, None)
        if external_userid in self._customer_cache:
            cached_info, cached_time = self._customer_cache[external_userid]
            if current_time - cached_time < self._cache_ttl:
@@ -413,9 +514,9 @@ class WecomCSClient:
            'external_userid_list': [external_userid],
        }

-        async with httpx.AsyncClient() as client:
+        async with self._http_client_context() as client:
            response = await client.post(url, json=payload)
-            data = response.json()
+            data = await httpclient.parse_json_response(response)

            if data.get('errcode') in [40014, 42001]:
                self.access_token = await self.get_access_token(self.secret)
@@ -431,5 +532,10 @@ class WecomCSClient:
                customer_info = customer_list[0]
                # Store in cache
                self._customer_cache[external_userid] = (customer_info, current_time)
+                while len(self._customer_cache) > self._CUSTOMER_CACHE_MAX:
+                    self._customer_cache.pop(next(iter(self._customer_cache)), None)
                return customer_info
            return None
+
+    def clear(self) -> None:
+        self._customer_cache.clear()
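`download_image_to_bytes` and `get_pic_url` now stream responses through `_read_httpx_media_limited`, which checks the declared `Content-Length` and then the bytes actually received. A dependency-free sketch of that pattern, assuming an async iterable of byte chunks in place of `httpx.Response.aiter_bytes()` (`read_limited` and `fake_stream` are hypothetical names):

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, Iterable, Optional


async def read_limited(
    chunks: AsyncIterable[bytes],
    content_length: Optional[str],
    max_bytes: int,
) -> bytes:
    """Collect a streamed body, failing as soon as it exceeds max_bytes."""
    # Fast path: reject a response whose declared size is already too large.
    # An unparsable header is ignored rather than trusted.
    try:
        declared = int(content_length) if content_length is not None else None
    except ValueError:
        declared = None
    if declared is not None and declared > max_bytes:
        raise ValueError('media exceeds the size limit')

    body = bytearray()
    async for chunk in chunks:
        body.extend(chunk)
        # The header may be absent or wrong, so the real byte count is authoritative.
        if len(body) > max_bytes:
            raise ValueError('media exceeds the size limit')
    return bytes(body)


async def fake_stream(parts: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for httpx.Response.aiter_bytes()."""
    for part in parts:
        yield part
```

Counting the streamed bytes is what actually enforces the cap; the header check only rejects honestly labelled oversized responses before any body is read.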
@@ -6,6 +6,56 @@ import json

 from .errors import WeKnoraAPIError

+_MAX_WEKNORA_RESPONSE_BYTES = 1024 * 1024
+_MAX_WEKNORA_STREAM_BYTES = 16 * 1024 * 1024
+_MAX_WEKNORA_SSE_LINE_BYTES = 1024 * 1024
+
+
+async def _read_limited_response(response: httpx.Response) -> bytes:
+    body = bytearray()
+    async for chunk in response.aiter_bytes(chunk_size=8192):
+        body.extend(chunk)
+        if len(body) > _MAX_WEKNORA_RESPONSE_BYTES:
+            raise WeKnoraAPIError('WeKnora response exceeds the runtime limit')
+    return bytes(body)
+
+
+async def _iter_sse_json(
+    response: httpx.Response,
+) -> typing.AsyncGenerator[dict[str, typing.Any], None]:
+    buffer = bytearray()
+    total = 0
+    async for chunk in response.aiter_bytes(chunk_size=8192):
+        total += len(chunk)
+        if total > _MAX_WEKNORA_STREAM_BYTES:
+            raise WeKnoraAPIError('WeKnora stream exceeds the runtime limit')
+        buffer.extend(chunk)
+        while b'\n' in buffer:
+            raw_line, _, remainder = buffer.partition(b'\n')
+            buffer = bytearray(remainder)
+            if len(raw_line) > _MAX_WEKNORA_SSE_LINE_BYTES:
+                raise WeKnoraAPIError('WeKnora SSE event exceeds the runtime limit')
+            line = raw_line.rstrip(b'\r').strip()
+            if not line.startswith(b'data:'):
+                continue
+            try:
+                data = json.loads(line[5:].strip())
+            except json.JSONDecodeError:
+                continue
+            if isinstance(data, dict):
+                yield data
+        if len(buffer) > _MAX_WEKNORA_SSE_LINE_BYTES:
+            raise WeKnoraAPIError('WeKnora SSE event exceeds the runtime limit')
+
+    line = bytes(buffer).rstrip(b'\r').strip()
+    if line.startswith(b'data:'):
+        try:
+            data = json.loads(line[5:].strip())
+        except json.JSONDecodeError:
+            return
+        if isinstance(data, dict):
+            yield data
+

 class AsyncWeKnoraClient:
    """WeKnora API client"""
@@ -39,19 +89,19 @@ class AsyncWeKnoraClient:
            if description:
                payload['description'] = description

-            response = await client.post(
+            async with client.stream(
+                'POST',
                '/sessions',
                headers={
                    'X-API-Key': self.api_key,
                    'Content-Type': 'application/json',
                },
                json=payload,
-            )
-
-            if response.status_code not in (200, 201):
-                raise WeKnoraAPIError(f'{response.status_code} {response.text}')
-
-            data = response.json()
+            ) as response:
+                body = await _read_limited_response(response)
+                if response.status_code not in (200, 201):
+                    raise WeKnoraAPIError(f'{response.status_code} {body.decode("utf-8", errors="replace")}')
+            data = json.loads(body)
            return data['data']['id']

    async def agent_chat(
@@ -107,20 +157,13 @@ class AsyncWeKnoraClient:
                },
                json=payload,
            ) as r:
-                async for chunk in r.aiter_lines():
-                    if r.status_code != 200:
-                        raise WeKnoraAPIError(f'{r.status_code} {chunk}')
-                    if chunk.strip() == '':
-                        continue
-                    if chunk.startswith('data:'):
-                        try:
-                            data = json.loads(chunk[5:].strip())
-                        except json.JSONDecodeError:
-                            continue
-                        yield data
-                        # Stop the stream on an error event so callers that do not raise are not left waiting
-                        if data.get('response_type') == 'error':
-                            return
+                if r.status_code != 200:
+                    body = await _read_limited_response(r)
+                    raise WeKnoraAPIError(f'{r.status_code} {body.decode("utf-8", errors="replace")}')
+                async for data in _iter_sse_json(r):
+                    yield data
+                    if data.get('response_type') == 'error':
+                        return

    async def knowledge_chat(
        self,
@@ -164,17 +207,10 @@ class AsyncWeKnoraClient:
                },
                json=payload,
            ) as r:
-                async for chunk in r.aiter_lines():
-                    if r.status_code != 200:
-                        raise WeKnoraAPIError(f'{r.status_code} {chunk}')
-                    if chunk.strip() == '':
-                        continue
-                    if chunk.startswith('data:'):
-                        try:
-                            data = json.loads(chunk[5:].strip())
-                        except json.JSONDecodeError:
-                            continue
-                        yield data
-                        # Stop the stream on an error event so callers that do not raise are not left waiting
-                        if data.get('response_type') == 'error':
-                            return
+                if r.status_code != 200:
+                    body = await _read_limited_response(r)
+                    raise WeKnoraAPIError(f'{r.status_code} {body.decode("utf-8", errors="replace")}')
+                async for data in _iter_sse_json(r):
+                    yield data
+                    if data.get('response_type') == 'error':
+                        return
@@ -0,0 +1,116 @@
+from __future__ import annotations
+
+import enum
+import types
+import typing
+
+from .context import RequestContext
+
+
+class WorkspaceRole(enum.StrEnum):
+    OWNER = 'owner'
+    ADMIN = 'admin'
+    DEVELOPER = 'developer'
+    OPERATOR = 'operator'
+    VIEWER = 'viewer'
+
+
+class Permission(enum.StrEnum):
+    WORKSPACE_VIEW = 'workspace.view'
+    WORKSPACE_UPDATE = 'workspace.update'
+    WORKSPACE_DELETE = 'workspace.delete'
+    OWNER_TRANSFER = 'owner.transfer'
+    MEMBER_VIEW = 'member.view'
+    MEMBER_INVITE = 'member.invite'
+    MEMBER_UPDATE_ROLE = 'member.update_role'
+    MEMBER_REMOVE = 'member.remove'
+    RESOURCE_VIEW = 'resource.view'
+    RESOURCE_MANAGE = 'resource.manage'
+    RUNTIME_OPERATE = 'runtime.operate'
+    PROVIDER_SECRET_MANAGE = 'provider_secret.manage'
+    API_KEY_MANAGE = 'api_key.manage'
+    AUDIT_VIEW = 'audit.view'
+    DATA_EXPORT = 'data.export'
+    BILLING_LINK_MANAGE = 'billing_link.manage'
+
+
+_VIEW_PERMISSIONS = {
+    Permission.WORKSPACE_VIEW,
+    Permission.MEMBER_VIEW,
+    Permission.RESOURCE_VIEW,
+}
+
+_ROLE_PERMISSIONS: typing.Final = types.MappingProxyType(
+    {
+        WorkspaceRole.OWNER: frozenset(Permission),
+        WorkspaceRole.ADMIN: frozenset(
+            permission
+            for permission in Permission
+            if permission
+            not in {
+                Permission.WORKSPACE_DELETE,
+                Permission.OWNER_TRANSFER,
+                Permission.BILLING_LINK_MANAGE,
+            }
+        ),
+        WorkspaceRole.DEVELOPER: frozenset(
+            _VIEW_PERMISSIONS
+            | {
+                Permission.RESOURCE_MANAGE,
+                Permission.RUNTIME_OPERATE,
+                Permission.PROVIDER_SECRET_MANAGE,
+            }
+        ),
+        WorkspaceRole.OPERATOR: frozenset(_VIEW_PERMISSIONS | {Permission.RUNTIME_OPERATE}),
+        WorkspaceRole.VIEWER: frozenset(_VIEW_PERMISSIONS),
+    }
+)
+
+
+class AuthorizationError(Exception):
+    """Base class for errors that map to an HTTP authorization response."""
+
+    status_code = 403
+    error_code = 'forbidden'
+
+
+class WorkspaceRequiredError(AuthorizationError):
+    status_code = 400
+    error_code = 'workspace_required'
+
+
+class PermissionDeniedError(AuthorizationError):
+    error_code = 'permission_denied'
+
+    def __init__(self, permission: str) -> None:
+        super().__init__(f'Missing Workspace permission: {permission}')
+        self.permission = permission
+
+
+class EditionLimitError(AuthorizationError):
+    error_code = 'edition_limit'
+
+
+def permissions_for_role(role: str | WorkspaceRole) -> frozenset[str]:
+    """Return the canonical fixed permissions for a Workspace role."""
+
+    try:
+        parsed_role = WorkspaceRole(role)
+    except ValueError:
+        return frozenset()
+    return frozenset(permission.value for permission in _ROLE_PERMISSIONS[parsed_role])
+
+
+def has_permission(ctx: RequestContext, permission: str | Permission) -> bool:
+    """Return whether the context contains one effective permission."""
+
+    permission_value = permission.value if isinstance(permission, Permission) else permission
+    return permission_value in ctx.workspace.permissions
+
+
+def require_permission(ctx: RequestContext, permission: str | Permission) -> None:
+    """Raise a stable authorization error when a permission is missing."""
+
+    permission_value = permission.value if isinstance(permission, Permission) else permission
+    if not has_permission(ctx, permission_value):
+        raise PermissionDeniedError(permission_value)
@@ -0,0 +1,94 @@
+from __future__ import annotations
+
+import dataclasses
+import enum
+
+
+class PrincipalType(enum.StrEnum):
+    """Kinds of authenticated principals accepted by LangBot."""
+
+    ACCOUNT = 'account'
+    API_KEY = 'api_key'
+    SYSTEM = 'system'
+    PUBLIC_BOT = 'public_bot'
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class PrincipalContext:
+    """Authenticated identity before Workspace authorization is applied."""
+
+    principal_type: PrincipalType
+    account_uuid: str | None = None
+    api_key_uuid: str | None = None
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class WorkspaceContext:
+    """Workspace membership and effective permissions for one request."""
+
+    workspace_uuid: str
+    membership_uuid: str | None
+    role: str | None
+    permissions: frozenset[str]
+    membership_revision: int = 0
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class RequestContext:
+    """Trusted authorization context passed to HTTP services."""
+
+    instance_uuid: str
+    placement_generation: int
+    request_id: str
+    auth_type: str
+    principal: PrincipalContext
+    workspace: WorkspaceContext
+    entitlement_revision: int = 0
+
+    @property
+    def workspace_uuid(self) -> str:
+        """Return the selected Workspace UUID."""
+
+        return self.workspace.workspace_uuid
+
+    @property
+    def account_uuid(self) -> str | None:
+        """Return the Account UUID when the principal is an Account."""
+
+        return self.principal.account_uuid
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class ExecutionContext:
+    """Workspace context propagated to asynchronous and runtime work."""
+
+    instance_uuid: str
+    workspace_uuid: str
+    placement_generation: int
+    bot_uuid: str | None = None
+    pipeline_uuid: str | None = None
+    query_uuid: str | None = None
+    trigger_principal: PrincipalContext | None = None
+    entitlement_revision: int = 0
+
+    @classmethod
+    def from_request(
+        cls,
+        ctx: RequestContext,
+        *,
+        bot_uuid: str | None = None,
+        pipeline_uuid: str | None = None,
+        query_uuid: str | None = None,
+    ) -> ExecutionContext:
+        """Create a runtime context without losing the tenant generation."""
+
+        return cls(
+            instance_uuid=ctx.instance_uuid,
+            workspace_uuid=ctx.workspace_uuid,
+            placement_generation=ctx.placement_generation,
+            bot_uuid=bot_uuid,
+            pipeline_uuid=pipeline_uuid,
+            query_uuid=query_uuid,
+            trigger_principal=ctx.principal,
+            entitlement_revision=ctx.entitlement_revision,
+        )
@@ -5,9 +5,21 @@ import typing
 import enum
 import quart
 import traceback
+import inspect
+import uuid
 from quart.typing import RouteCallable

-from ....core import app
+from ....utils import constants
+from ....utils import bounded_executor
+from ....workspace.collaboration import MembershipPermissionError, WorkspaceCollaborationError
+from ....workspace.errors import WorkspaceNotFoundError
+from ....cloud.entitlements import EntitlementUnavailableError
+from ....core.errors import TaskCapacityError
+from ..authz import AuthorizationError, Permission, permissions_for_role, require_permission
+from ..context import PrincipalContext, PrincipalType, RequestContext, WorkspaceContext
+
+if typing.TYPE_CHECKING:
+    from ....core.app import Application

 # Maximum file upload size limit (10MB)
 MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
@@ -33,6 +45,7 @@ class AuthType(enum.Enum):
    """Authentication type"""

    NONE = 'none'
+    ACCOUNT_TOKEN = 'account-token'
    USER_TOKEN = 'user-token'
    API_KEY = 'api-key'
    USER_TOKEN_OR_API_KEY = 'user-token-or-api-key'
@@ -43,11 +56,11 @@ class RouterGroup(abc.ABC):

    path: str

-    ap: app.Application
+    ap: Application

    quart_app: quart.Quart

-    def __init__(self, ap: app.Application, quart_app: quart.Quart) -> None:
+    def __init__(self, ap: Application, quart_app: quart.Quart) -> None:
        self.ap = ap
        self.quart_app = quart_app

@@ -59,16 +72,38 @@ class RouterGroup(abc.ABC):
        self,
        rule: str,
        auth_type: AuthType = AuthType.USER_TOKEN,
+        permission: Permission | str | None = None,
        **options: typing.Any,
    ) -> typing.Callable[[RouteCallable], RouteCallable]:  # decorator
        """Register a route"""

+        if auth_type == AuthType.ACCOUNT_TOKEN and permission is not None:
+            raise ValueError('Account-token routes cannot declare Workspace permissions')
+
        def decorator(f: RouteCallable) -> RouteCallable:
            nonlocal rule
            rule = self.path + rule

            async def handler_error(*args, **kwargs):
-                if auth_type == AuthType.USER_TOKEN:
+                request_context: RequestContext | None = None
+                if auth_type == AuthType.ACCOUNT_TOKEN:
+                    authorization = quart.request.headers.get('Authorization', '')
+                    if not authorization.startswith('Bearer '):
+                        return self.http_status(401, -1, 'No valid user token provided')
+                    token = authorization.removeprefix('Bearer ')
+                    if not token:
+                        return self.http_status(401, -1, 'No valid user token provided')
+
+                    try:
+                        account, user_email = await self._authenticate_account(token)
+                        # Account-token routes deliberately stop before Workspace
+                        # selection. They may bootstrap a selector, but cannot
+                        # receive RequestContext or enforce Workspace permissions.
+                        self._inject_handler_context(f, kwargs, user_email, None, account)
+                    except Exception as e:
+                        return self._auth_error_response(e)
+
+                elif auth_type == AuthType.USER_TOKEN:
                    # get token from Authorization header
                    token = quart.request.headers.get('Authorization', '').replace('Bearer ', '')

@@ -76,18 +111,15 @@ class RouterGroup(abc.ABC):
                        return self.http_status(401, -1, 'No valid user token provided')

                    try:
-                        user_email = await self.ap.user_service.verify_jwt_token(token)
-
-                        # check if this account exists
-                        user = await self.ap.user_service.get_user_by_email(user_email)
-                        if not user:
-                            return self.http_status(401, -1, 'User not found')
-
-                        # check if f accepts user_email parameter
-                        if 'user_email' in f.__code__.co_varnames:
-                            kwargs['user_email'] = user_email
+                        account, user_email = await self._authenticate_account(token)
+                        request_context = await self._resolve_account_context(account, auth_type)
+                        if permission is not None:
+                            if request_context is None:
+                                raise AuthorizationError('Workspace authorization is unavailable')
+                            require_permission(request_context, permission)
+                        self._inject_handler_context(f, kwargs, user_email, request_context)
                    except Exception as e:
-                        return self.http_status(401, -1, str(e))
+                        return self._auth_error_response(e)

                elif auth_type == AuthType.API_KEY:
                    # get API key from Authorization header or X-API-Key header
@@ -101,11 +133,12 @@ class RouterGroup(abc.ABC):
                        return self.http_status(401, -1, 'No valid API key provided')

                    try:
-                        is_valid = await self.ap.apikey_service.verify_api_key(api_key)
-                        if not is_valid:
-                            return self.http_status(401, -1, 'Invalid API key')
+                        request_context = await self._authenticate_api_key(api_key, auth_type)
+                        if permission is not None:
+                            require_permission(request_context, permission)
+                        self._inject_handler_context(f, kwargs, None, request_context)
                    except Exception as e:
-                        return self.http_status(401, -1, str(e))
+                        return self._auth_error_response(e)

                elif auth_type == AuthType.USER_TOKEN_OR_API_KEY:
                    # Try API key first (check X-API-Key header)
@@ -114,11 +147,12 @@ class RouterGroup(abc.ABC):
                    if api_key:
                        # API key authentication
                        try:
-                            is_valid = await self.ap.apikey_service.verify_api_key(api_key)
-                            if not is_valid:
-                                return self.http_status(401, -1, 'Invalid API key')
+                            request_context = await self._authenticate_api_key(api_key, auth_type)
+                            if permission is not None:
+                                require_permission(request_context, permission)
+                            self._inject_handler_context(f, kwargs, None, request_context)
                        except Exception as e:
-                            return self.http_status(401, -1, str(e))
+                            return self._auth_error_response(e)
                    else:
                        # Try user token authentication (Authorization header)
                        token = quart.request.headers.get('Authorization', '').replace('Bearer ', '')
@@ -129,35 +163,89 @@ class RouterGroup(abc.ABC):
                            )

                        try:
-                            user_email = await self.ap.user_service.verify_jwt_token(token)
-
-                            # check if this account exists
-                            user = await self.ap.user_service.get_user_by_email(user_email)
-                            if not user:
-                                return self.http_status(401, -1, 'User not found')
-
-                            # check if f accepts user_email parameter
-                            if 'user_email' in f.__code__.co_varnames:
-                                kwargs['user_email'] = user_email
+                            account, user_email = await self._authenticate_account(token)
+                            request_context = await self._resolve_account_context(account, auth_type)
+                            if permission is not None:
+                                if request_context is None:
+                                    raise AuthorizationError('Workspace authorization is unavailable')
+                                require_permission(request_context, permission)
+                            self._inject_handler_context(f, kwargs, user_email, request_context)
+                        except (AuthorizationError, WorkspaceNotFoundError, MembershipPermissionError, EntitlementUnavailableError) as e:
+                            # Authentication succeeded and authorization was
+                            # evaluated. Do not reinterpret a denied user token
+                            # as an API key, which would mask the stable 403/404.
+                            return self._auth_error_response(e)
                        except Exception:
                            # If user token fails, maybe it's an API key in Authorization header
                            try:
-                                is_valid = await self.ap.apikey_service.verify_api_key(token)
-                                if not is_valid:
-                                    return self.http_status(401, -1, 'Invalid authentication credentials')
+                                request_context = await self._authenticate_api_key(token, auth_type)
+                                if permission is not None:
+                                    require_permission(request_context, permission)
+                                self._inject_handler_context(f, kwargs, None, request_context)
                            except Exception as e:
-                                return self.http_status(401, -1, str(e))
+                                return self._auth_error_response(e)
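For `USER_TOKEN_OR_API_KEY` routes, the order of the `except` clauses prevents a valid but denied account token from being retried as an API key. The minimal sketch below shows that control flow and assumes plain synchronous verifiers. `authenticate_bearer`, `TERMINAL_ERRORS`, and the stub exceptions are hypothetical.

```python
import typing


class AuthorizationError(Exception):
    """Hypothetical stand-in: authenticated but not allowed (maps to 403)."""


class WorkspaceNotFoundError(Exception):
    """Hypothetical stand-in: authenticated but the Workspace is hidden (maps to 404)."""


# Errors raised *after* the token proved valid; falling back would mask them.
TERMINAL_ERRORS: tuple[type[Exception], ...] = (AuthorizationError, WorkspaceNotFoundError)


def authenticate_bearer(
    token: str,
    verify_account: typing.Callable[[str], str],
    verify_api_key: typing.Callable[[str], str],
) -> tuple[str, str]:
    """Return (principal_type, identity), trying the account path before the API-key path."""
    try:
        return 'account', verify_account(token)
    except TERMINAL_ERRORS:
        raise
    except Exception:
        # Not a usable account token; it may be an API key sent as a Bearer token.
        return 'api_key', verify_api_key(token)


def verify_account(token: str) -> str:
    if token == 'jwt-ok':
        return 'alice@example.com'
    if token == 'jwt-denied':
        raise AuthorizationError('permission_denied')
    raise ValueError('not a JWT')


def verify_api_key(token: str) -> str:
    if token == 'key-123':
        return 'api-key-1'
    raise ValueError('Invalid API key')
```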

                try:
+                    if request_context is not None:
+                        with bounded_executor.blocking_work_scope(request_context.workspace_uuid):
+                            persistence_mgr = getattr(
+                                self.ap,
+                                'persistence_mgr',
+                                None,
+                            )
+                            tenant_scope_descriptor = getattr(
+                                type(persistence_mgr),
+                                'tenant_scope',
+                                None,
+                            )
+                            if callable(tenant_scope_descriptor):
+                                # Authorization discovery is complete. Carry
+                                # the trusted Workspace identity across the
+                                # handler, but do not reserve a database
+                                # connection while it waits on providers,
+                                # runtimes, uploads, or streamed clients.
+                                # Services that need atomic writes open a UoW.
+                                async with persistence_mgr.tenant_scope(request_context.workspace_uuid):
+                                    return await f(*args, **kwargs)
+                            return await f(*args, **kwargs)
                    return await f(*args, **kwargs)

                except Exception as e:  # anything not mapped below becomes a 500
-                    traceback.print_exc()
-                    # return self.http_status(500, -2, str(e))
-                    return self.http_status(500, -2, str(e))
+                    if isinstance(e, AuthorizationError):
+                        return self.http_status(e.status_code, e.error_code, str(e))
+                    if isinstance(e, WorkspaceNotFoundError):
+                        return self.http_status(404, 'resource_not_found', 'Resource not found')
+                    if isinstance(e, MembershipPermissionError):
+                        return self.http_status(403, e.code, str(e))
+                    if isinstance(e, WorkspaceCollaborationError):
+                        return self.http_status(400, e.code, str(e))
+                    if isinstance(e, TaskCapacityError):
+                        return self.http_status(429, 'task_capacity_exceeded', str(e))
+                    if isinstance(
+                        e,
+                        bounded_executor.BlockingWorkCapacityError,
+                    ):
+                        return self.http_status(
+                            429,
+                            'blocking_work_capacity_exceeded',
+                            str(e),
+                        )
+                    request_id = self.request_id()
+                    logger = getattr(self.ap, 'logger', self.quart_app.logger)
+                    logger.error(
+                        f'Unhandled HTTP error request_id={request_id} '
+                        f'method={quart.request.method} path={quart.request.path}\n{traceback.format_exc()}'
+                    )
+                    return self.internal_error_response(request_id)

            new_f = handler_error
-            new_f.__name__ = (self.name + rule).replace('/', '__')
+            # Quart/Flask requires a unique endpoint name even when the same URL
+            # intentionally has separate handlers for different HTTP methods.
+            # Include the method set so CRUD routes can declare distinct
+            # permissions without colliding during application startup.
+            methods = options.get('methods') or ['GET']
+            method_suffix = '__'.join(sorted(str(method).upper() for method in methods))
+            new_f.__name__ = (self.name + rule + '__' + method_suffix).replace('/', '__')
            new_f.__doc__ = f.__doc__

            self.quart_app.route(rule, **options)(new_f)
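The endpoint name now appends the sorted HTTP methods, so two handlers on the same URL no longer collide when Quart registers them. The sketch below writes the same string transformation as a pure function. `endpoint_name` is a hypothetical helper, and `rule` is assumed to already include the group path, as it does after `rule = self.path + rule`.

```python
import typing


def endpoint_name(
    group_name: str,
    rule: str,
    methods: typing.Optional[typing.Iterable[str]] = None,
) -> str:
    """Build a unique endpoint name from the router group, full rule, and method set."""
    # Mirrors `options.get('methods') or ['GET']` in the diff.
    method_list = list(methods or ['GET'])
    method_suffix = '__'.join(sorted(str(method).upper() for method in method_list))
    return (group_name + rule + '__' + method_suffix).replace('/', '__')


get_name = endpoint_name('bots', '/api/v1/bots/<bot_uuid>', ['GET'])
delete_name = endpoint_name('bots', '/api/v1/bots/<bot_uuid>', ['delete'])
```

Sorting the methods makes the name independent of declaration order, so `['POST', 'GET']` and `['get', 'post']` produce the same endpoint.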
@@ -165,6 +253,192 @@ class RouterGroup(abc.ABC):

        return decorator

+    async def _authenticate_account(self, token: str) -> tuple[typing.Any, str]:
+        account: typing.Any = None
+        resolver = getattr(self.ap.user_service, 'get_authenticated_account', None)
+        if callable(resolver):
+            resolved = resolver(token)
+            if inspect.isawaitable(resolved):
+                account = await resolved
+
+        if isinstance(account, str) or account is None:
+            user_email = account or await self.ap.user_service.verify_jwt_token(token)
+            account = await self.ap.user_service.get_user_by_email(user_email)
+        if account is None:
+            raise ValueError('User not found')
+        return account, account.user
+
+    async def _resolve_account_context(
+        self,
+        account: typing.Any,
+        auth_type: AuthType,
+    ) -> RequestContext | None:
+        collaboration_service = getattr(self.ap, 'workspace_collaboration_service', None)
+        account_uuid = getattr(account, 'uuid', None)
+        # Compatibility for isolated controller tests that do not wire the tenancy kernel.
+        if collaboration_service is None or not isinstance(account_uuid, str):
+            return None
+
+        requested_workspace_uuid = quart.request.headers.get('X-Workspace-Id')
+        access = await collaboration_service.resolve_account_workspace(account_uuid, requested_workspace_uuid)
+        entitlement_revision = await self._resolve_entitlement_revision(
+            access.execution.instance_uuid,
+            access.workspace.uuid,
+        )
+        request_context = RequestContext(
+            instance_uuid=access.execution.instance_uuid,
+            placement_generation=access.execution.placement_generation,
+            request_id=self.request_id(),
+            auth_type=auth_type.value,
+            principal=PrincipalContext(
+                principal_type=PrincipalType.ACCOUNT,
+                account_uuid=account_uuid,
+            ),
+            workspace=WorkspaceContext(
+                workspace_uuid=access.workspace.uuid,
+                membership_uuid=access.membership.uuid,
+                role=access.membership.role,
+                permissions=permissions_for_role(access.membership.role),
+                membership_revision=access.membership.projection_revision,
+            ),
+            entitlement_revision=entitlement_revision,
+        )
+        quart.g.request_context = request_context
+        quart.g.workspace_membership = access.membership
+        return request_context
+
+    async def _authenticate_api_key(self, api_key: str, auth_type: AuthType) -> RequestContext:
+        authenticator = getattr(self.ap.apikey_service, 'authenticate_api_key', None)
+        if callable(authenticator):
+            authenticated = authenticator(api_key)
+            if inspect.isawaitable(authenticated):
+                identity = await authenticated
+                if identity is not None:
+                    entitlement_revision = await self._resolve_entitlement_revision(
+                        identity.instance_uuid,
+                        identity.workspace_uuid,
+                    )
+                    request_context = RequestContext(
+                        instance_uuid=identity.instance_uuid,
+                        placement_generation=identity.placement_generation,
+                        request_id=self.request_id(),
+                        auth_type=auth_type.value,
+                        principal=PrincipalContext(
+                            principal_type=PrincipalType.API_KEY,
+                            api_key_uuid=identity.api_key_uuid,
+                        ),
+                        workspace=WorkspaceContext(
+                            workspace_uuid=identity.workspace_uuid,
+                            membership_uuid=None,
+                            role=None,
+                            permissions=identity.permissions,
+                        ),
+                        entitlement_revision=entitlement_revision,
+                    )
+                    quart.g.request_context = request_context
+                    return request_context
+
+        if not await self.ap.apikey_service.verify_api_key(api_key):
+            raise ValueError('Invalid API key')
+        workspace_service = getattr(self.ap, 'workspace_service', None)
+        if workspace_service is None:
+            raise ValueError('API key Workspace binding is unavailable')
+        binding = await workspace_service.get_local_execution_binding()
+        request_context = RequestContext(
+            instance_uuid=binding.instance_uuid or constants.instance_id,
+            placement_generation=binding.placement_generation,
+            request_id=self.request_id(),
+            auth_type=auth_type.value,
+            principal=PrincipalContext(
+                principal_type=PrincipalType.API_KEY,
+                api_key_uuid='legacy-oss-api-key',
+            ),
+            workspace=WorkspaceContext(
+                workspace_uuid=binding.workspace_uuid,
+                membership_uuid=None,
+                role=None,
+                permissions=frozenset(item.value for item in Permission),
+            ),
+        )
+        quart.g.request_context = request_context
+        return request_context
+
+    async def _resolve_entitlement_revision(self, instance_uuid: str, workspace_uuid: str) -> int:
+        deployment = getattr(self.ap, 'deployment', None)
+        if deployment is None or not getattr(deployment, 'multi_workspace_enabled', False):
+            return 0
+        resolver = getattr(self.ap, 'entitlement_resolver', None)
+        if resolver is None:
+            raise EntitlementUnavailableError('Workspace entitlement resolver is unavailable')
+        if instance_uuid != resolver.instance_uuid:
+            raise EntitlementUnavailableError('Workspace entitlement targets another LangBot instance')
+        snapshot = await resolver.resolve(workspace_uuid)
+        return snapshot.entitlement_revision
+
+    @staticmethod
+    def _inject_handler_context(
+        handler: RouteCallable,
+        kwargs: dict[str, typing.Any],
+        user_email: str | None,
+        request_context: RequestContext | None,
+        account: typing.Any = None,
+    ) -> None:
+        parameters = inspect.signature(handler).parameters
+        if user_email is not None and 'user_email' in parameters:
+            kwargs['user_email'] = user_email
+        if account is not None and 'account' in parameters:
+            kwargs['account'] = account
+        if request_context is not None:
+            if 'request_context' in parameters:
+                kwargs['request_context'] = request_context
+            elif 'ctx' in parameters:
+                kwargs['ctx'] = request_context
+
+    def _auth_error_response(self, error: Exception) -> typing.Any:
+        if isinstance(error, AuthorizationError):
+            return self.http_status(error.status_code, error.error_code, str(error))
+        if isinstance(error, WorkspaceNotFoundError):
+            return self.http_status(404, 'resource_not_found', 'Resource not found')
+        if isinstance(error, MembershipPermissionError):
+            return self.http_status(403, error.code, str(error))
+        if isinstance(error, EntitlementUnavailableError):
+            return self.http_status(403, 'entitlement_unavailable', str(error))
+        request_id = self.request_id()
+        logger = getattr(self.ap, 'logger', self.quart_app.logger)
+        logger.warning(f'Authentication failed request_id={request_id} error_type={type(error).__name__}: {error}')
+        return self.http_status(
+            401,
+            'invalid_authentication',
+            'Invalid authentication credentials',
+        )
+
+    def request_id(self) -> str:
+        """Return one stable request ID for authentication, logs, and errors."""
+
+        request_context = getattr(quart.g, 'request_context', None)
+        request_id = getattr(request_context, 'request_id', None) or getattr(quart.g, 'request_id', None)
+        if not request_id:
+            candidate = str(quart.request.headers.get('X-Request-Id') or '').strip()
+            if not candidate or len(candidate) > 128 or any(ord(char) < 32 for char in candidate):
+                candidate = str(uuid.uuid4())
+            request_id = candidate
+            quart.g.request_id = request_id
+        return str(request_id)
+
+    def internal_error_response(self, request_id: str | None = None) -> typing.Tuple[quart.Response, int]:
+        """Return a stable 500 response without exposing the underlying exception."""
+
+        resolved_request_id = request_id or self.request_id()
+        response = quart.jsonify(
+            {
+                'code': 'internal_error',
+                'msg': 'Internal server error',
+                'request_id': resolved_request_id,
+            }
+        )
+        response.headers['X-Request-Id'] = resolved_request_id
+        return response, 500
+
    def success(self, data: typing.Any = None) -> quart.Response:
        """Return a 200 response"""
        return quart.jsonify(
@@ -175,7 +449,7 @@ class RouterGroup(abc.ABC):
            }
        )

-    def fail(self, code: int, msg: str) -> quart.Response:
+    def fail(self, code: int | str, msg: str) -> quart.Response:
        """Return an error response"""

        return quart.jsonify(
@@ -185,6 +459,6 @@ class RouterGroup(abc.ABC):
            }
        )

-    def http_status(self, status: int, code: int, msg: str) -> typing.Tuple[quart.Response, int]:
+    def http_status(self, status: int, code: int | str, msg: str) -> typing.Tuple[quart.Response, int]:
        """Return a response with the specified HTTP status code"""
        return (self.fail(code, msg), status)
@@ -1,43 +1,66 @@
+from __future__ import annotations
+
+import datetime
+
 import quart

+from ...authz import Permission
+from ...context import RequestContext
 from .. import group


@group.group_class('apikeys', '/api/v1/apikeys')
 class ApiKeysRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'])
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                keys = await self.ap.apikey_service.get_api_keys()
-                return self.success(data={'keys': keys})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                name = json_data.get('name', '')
-                description = json_data.get('description', '')
+        @self.route('', methods=['GET'], permission=Permission.API_KEY_MANAGE)
+        async def _(request_context: RequestContext) -> str:
+            keys = await self.ap.apikey_service.get_api_keys(request_context)
+            return self.success(data={'keys': keys})

-                if not name:
-                    return self.http_status(400, -1, 'Name is required')
+        @self.route('', methods=['POST'], permission=Permission.API_KEY_MANAGE)
+        async def _(request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            expires_at = json_data.get('expires_at')
+            parsed_expiry = None
+            if expires_at:
+                try:
+                    parsed_expiry = datetime.datetime.fromisoformat(str(expires_at).replace('Z', '+00:00'))
+                except ValueError:
+                    return self.http_status(400, 'invalid_expiry', 'Invalid API key expiry')
+            try:
+                key = await self.ap.apikey_service.create_api_key(
+                    request_context,
+                    json_data.get('name', ''),
+                    json_data.get('description', ''),
+                    scopes=json_data.get('scopes'),
+                    expires_at=parsed_expiry,
+                )
+            except ValueError as error:
+                return self.http_status(400, 'invalid_api_key', str(error))
+            return self.success(data={'key': key})

-                key = await self.ap.apikey_service.create_api_key(name, description)
-                return self.success(data={'key': key})
+        @self.route('/<int:key_id>', methods=['GET'], permission=Permission.API_KEY_MANAGE)
+        async def _(key_id: int, request_context: RequestContext) -> str:
+            key = await self.ap.apikey_service.get_api_key(request_context, key_id)
+            if key is None:
+                return self.http_status(404, 'resource_not_found', 'API key not found')
+            return self.success(data={'key': key})

-        @self.route('/<int:key_id>', methods=['GET', 'PUT', 'DELETE'])
-        async def _(key_id: int) -> str:
-            if quart.request.method == 'GET':
-                key = await self.ap.apikey_service.get_api_key(key_id)
-                if key is None:
-                    return self.http_status(404, -1, 'API key not found')
-                return self.success(data={'key': key})
+        @self.route('/<int:key_id>', methods=['PUT'], permission=Permission.API_KEY_MANAGE)
+        async def _(key_id: int, request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            try:
+                await self.ap.apikey_service.update_api_key(
+                    request_context,
+                    key_id,
+                    json_data.get('name'),
+                    json_data.get('description'),
+                )
+            except ValueError as error:
+                return self.http_status(400, 'invalid_api_key', str(error))
+            return self.success()

-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
-                name = json_data.get('name')
-                description = json_data.get('description')
-
-                await self.ap.apikey_service.update_api_key(key_id, name, description)
-                return self.success()
-
-            elif quart.request.method == 'DELETE':
-                await self.ap.apikey_service.delete_api_key(key_id)
-                return self.success()
+        @self.route('/<int:key_id>', methods=['DELETE'], permission=Permission.API_KEY_MANAGE)
+        async def _(key_id: int, request_context: RequestContext) -> str:
+            await self.ap.apikey_service.delete_api_key(request_context, key_id)
+            return self.success()
@@ -1,7 +1,11 @@
 from __future__ import annotations

 from langbot.pkg.utils import constants
+from langbot_plugin.box.errors import BoxAdmissionError

+from langbot.pkg.cloud.entitlements import EntitlementUnavailableError
+from ...authz import Permission
+from ...context import RequestContext
 from .. import group
 from .box_visibility import should_hide_box_runtime_status

@@ -9,18 +13,56 @@ from .box_visibility import should_hide_box_runtime_status
@group.group_class('box', '/api/v1/box')
 class BoxRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('/status', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            status = await self.ap.box_service.get_status()
+        @self.route(
+            '/status',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            try:
+                status = await self.ap.box_service.get_status(request_context)
+            except (BoxAdmissionError, EntitlementUnavailableError) as exc:
+                return self.http_status(403, 'managed_sandbox_unavailable', str(exc))
            status['hidden'] = should_hide_box_runtime_status(constants.edition, status.get('enabled'))
            return self.success(data=status)

-        @self.route('/sessions', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            sessions = await self.ap.box_service.get_sessions()
+        @self.route(
+            '/runtime-status',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            del request_context
+            status = await self.ap.box_service.get_backend_status()
+            status['hidden'] = should_hide_box_runtime_status(constants.edition, status.get('enabled'))
+            return self.success(data=status)
+
+        @self.route(
+            '/sessions',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.AUDIT_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            try:
+                sessions = await self.ap.box_service.get_sessions(request_context)
+            except (BoxAdmissionError, EntitlementUnavailableError) as exc:
+                return self.http_status(403, 'managed_sandbox_unavailable', str(exc))
            return self.success(data=sessions)

-        @self.route('/errors', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            errors = self.ap.box_service.get_recent_errors()
+        @self.route(
+            '/errors',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.AUDIT_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            try:
+                if getattr(self.ap.box_service, 'managed_admission_required', False):
+                    await self.ap.box_service.require_workspace_sandbox(request_context)
+            except (BoxAdmissionError, EntitlementUnavailableError) as exc:
+                return self.http_status(403, 'managed_sandbox_unavailable', str(exc))
+            errors = self.ap.box_service.get_recent_errors(request_context)
            return self.success(data=errors)
@@ -3,6 +3,9 @@ from __future__ import annotations
 import asyncio
 import quart

+from ...authz import Permission
+from ...context import RequestContext
+from ...service.secrets import redact_secrets
 from .. import group


@@ -11,12 +14,29 @@ class ExtensionsRouterGroup(group.RouterGroup):
    """Unified API for installed extensions (plugins, MCP servers, skills)."""

    async def initialize(self) -> None:
-        @self.route('', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> quart.Response:
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> quart.Response:
+            if self.ap.plugin_connector.is_enable_plugin:
+                await self.ap.plugin_connector.require_workspace_context(request_context)
+
+            async def read_in_task_scope(operation):
+                tenant_scope = getattr(getattr(self.ap, 'persistence_mgr', None), 'tenant_scope', None)
+                if callable(tenant_scope):
+                    async with tenant_scope(request_context.workspace_uuid):
+                        return await operation()
+                return await operation()
+
            plugins, mcp_servers, skills = await asyncio.gather(
-                self.ap.plugin_connector.list_plugins(),
-                self.ap.mcp_service.get_mcp_servers(contain_runtime_info=True),
-                self.ap.skill_service.list_skills(),
+                read_in_task_scope(self.ap.plugin_connector.list_plugins),
+                read_in_task_scope(
+                    lambda: self.ap.mcp_service.get_mcp_servers(request_context, contain_runtime_info=True)
+                ),
+                read_in_task_scope(lambda: self.ap.skill_service.list_skills(request_context)),
                return_exceptions=True,
            )

@@ -39,7 +59,7 @@ class ExtensionsRouterGroup(group.RouterGroup):
            extensions: list[dict] = []
            if isinstance(plugins, list):
                for plugin in plugins:
-                    extensions.append({'type': 'plugin', 'plugin': plugin})
+                    extensions.append({'type': 'plugin', 'plugin': redact_secrets(plugin)})
            if isinstance(mcp_servers, list):
                for server in mcp_servers:
                    extensions.append({'type': 'mcp', 'server': server})
@@ -7,29 +7,53 @@ import asyncio

 import quart.datastructures

+from ...authz import Permission
+from ...context import RequestContext
 from .. import group


+def _storage_owner(context: RequestContext) -> str:
+    if context.principal.account_uuid:
+        return f'account:{context.principal.account_uuid}'
+    if context.principal.api_key_uuid:
+        return f'api-key:{context.principal.api_key_uuid}'
+    return f'principal:{context.principal.principal_type.value}'
+
+
@group.group_class('files', '/api/v1/files')
 class FilesRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('/image/<path:image_key>', methods=['GET'], auth_type=group.AuthType.NONE)
-        async def _(image_key: str) -> quart.Response:
-            if '..' in image_key or '\\' in image_key:
+        @self.route(
+            '/image/<path:image_key>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(image_key: str, request_context: RequestContext) -> quart.Response:
+            image_bytes = await self.ap.storage_mgr.resolve_public_object(
+                image_key,
+                expected_owner_type='upload_image',
+            )
+            if image_bytes is None:
+                image_bytes = await self.ap.storage_mgr.resolve_public_object(
+                    image_key,
+                    expected_owner_type='bot_log',
+                )
+            if image_bytes is None:
                return quart.Response(status=404)
-
-            if not await self.ap.storage_mgr.storage_provider.exists(image_key):
-                return quart.Response(status=404)
-
-            image_bytes = await self.ap.storage_mgr.storage_provider.load(image_key)
            mime_type = mimetypes.guess_type(image_key)[0]
            if mime_type is None:
                mime_type = 'image/jpeg'

            return quart.Response(image_bytes, mimetype=mime_type)

-        @self.route('/images', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def upload_image() -> quart.Response:
+        @self.route(
+            '/images',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def upload_image(request_context: RequestContext) -> quart.Response:
            request = quart.request

            # Check file size limit before reading the file
@@ -66,18 +90,29 @@ class FilesRouterGroup(group.RouterGroup):
            if '/' in file_name or '\\' in file_name:
                return self.fail(400, 'File name contains invalid characters')

-            file_key = file_name + '_' + str(uuid.uuid4())[:8] + '.' + extension
+            logical_key = f'{uuid.uuid4()}.{extension}'

            # save file to storage
-            await self.ap.storage_mgr.storage_provider.save(file_key, file_bytes)
+            file_key = await self.ap.storage_mgr.save_scoped(
+                request_context,
+                owner_type='upload_image',
+                owner=_storage_owner(request_context),
+                key=logical_key,
+                value=file_bytes,
+            )
            return self.success(
                data={
                    'file_key': file_key,
                }
            )

-        @self.route('/documents', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def upload_document() -> quart.Response:
+        @self.route(
+            '/documents',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def upload_document(request_context: RequestContext) -> quart.Response:
            request = quart.request

            # Check file size limit before reading the file
@@ -110,12 +145,18 @@ class FilesRouterGroup(group.RouterGroup):
            if '/' in file_name or '\\' in file_name:
                return self.fail(400, 'File name contains invalid characters')

-            file_key = file_name + '_' + str(uuid.uuid4())[:8]
+            logical_key = str(uuid.uuid4())
            if extension:
-                file_key += '.' + extension
+                logical_key += '.' + extension

            # save file to storage
-            await self.ap.storage_mgr.storage_provider.save(file_key, file_bytes)
+            file_key = await self.ap.storage_mgr.save_scoped(
+                request_context,
+                owner_type='upload_document',
+                owner=_storage_owner(request_context),
+                key=logical_key,
+                value=file_bytes,
+            )
            return self.success(
                data={
                    'file_id': file_key,
@@ -1,100 +1,146 @@
 import quart
+
+from ....authz import Permission, has_permission
+from ....context import RequestContext
 from ... import group


@group.group_class('knowledge_base', '/api/v1/knowledge/bases')
 class KnowledgeBaseRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['POST', 'GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def handle_knowledge_bases() -> quart.Response:
-            if quart.request.method == 'GET':
-                knowledge_bases = await self.ap.knowledge_service.get_knowledge_bases()
-                return self.success(data={'bases': knowledge_bases})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def handle_knowledge_bases(request_context: RequestContext) -> quart.Response:
+            knowledge_bases = await self.ap.knowledge_service.get_knowledge_bases(
+                request_context,
+                include_secret=has_permission(request_context, Permission.RESOURCE_MANAGE),
+            )
+            return self.success(data={'bases': knowledge_bases})

-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                try:
-                    knowledge_base_uuid = await self.ap.knowledge_service.create_knowledge_base(json_data)
-                except ValueError as e:
-                    return self.http_status(400, -1, str(e))
-                return self.success(data={'uuid': knowledge_base_uuid})
-
-            return self.http_status(405, -1, 'Method not allowed')
+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def create_knowledge_base(request_context: RequestContext) -> quart.Response:
+            json_data = await quart.request.json
+            try:
+                knowledge_base_uuid = await self.ap.knowledge_service.create_knowledge_base(
+                    request_context,
+                    json_data,
+                )
+            except ValueError as e:
+                return self.http_status(400, -1, str(e))
+            return self.success(data={'uuid': knowledge_base_uuid})

        @self.route(
            '/<knowledge_base_uuid>',
-            methods=['GET', 'DELETE', 'PUT'],
+            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def handle_specific_knowledge_base(knowledge_base_uuid: str) -> quart.Response:
-            if quart.request.method == 'GET':
-                knowledge_base = await self.ap.knowledge_service.get_knowledge_base(knowledge_base_uuid)
+        async def get_specific_knowledge_base(
+            knowledge_base_uuid: str,
+            request_context: RequestContext,
+        ) -> quart.Response:
+            knowledge_base = await self.ap.knowledge_service.get_knowledge_base(
+                request_context,
+                knowledge_base_uuid,
+                include_secret=has_permission(request_context, Permission.RESOURCE_MANAGE),
+            )
+            if knowledge_base is None:
+                return self.http_status(404, 'resource_not_found', 'knowledge base not found')
+            return self.success(data={'base': knowledge_base})

-                if knowledge_base is None:
-                    return self.http_status(404, -1, 'knowledge base not found')
-
-                return self.success(
-                    data={
-                        'base': knowledge_base,
-                    }
-                )
-
-            elif quart.request.method == 'PUT':
+        @self.route(
+            '/<knowledge_base_uuid>',
+            methods=['DELETE', 'PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def mutate_specific_knowledge_base(
+            knowledge_base_uuid: str,
+            request_context: RequestContext,
+        ) -> quart.Response:
+            if quart.request.method == 'PUT':
                json_data = await quart.request.json
-                await self.ap.knowledge_service.update_knowledge_base(knowledge_base_uuid, json_data)
+                await self.ap.knowledge_service.update_knowledge_base(
+                    request_context,
+                    knowledge_base_uuid,
+                    json_data,
+                )
                return self.success(data={'uuid': knowledge_base_uuid})
-
-            elif quart.request.method == 'DELETE':
-                await self.ap.knowledge_service.delete_knowledge_base(knowledge_base_uuid)
-                return self.success({})
+            await self.ap.knowledge_service.delete_knowledge_base(request_context, knowledge_base_uuid)
+            return self.success({})

        @self.route(
            '/<knowledge_base_uuid>/files',
-            methods=['GET', 'POST'],
+            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def get_knowledge_base_files(knowledge_base_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                files = await self.ap.knowledge_service.get_files_by_knowledge_base(knowledge_base_uuid)
-                return self.success(
-                    data={
-                        'files': files,
-                    }
-                )
+        async def get_knowledge_base_files(
+            knowledge_base_uuid: str,
+            request_context: RequestContext,
+        ) -> str:
+            files = await self.ap.knowledge_service.get_files_by_knowledge_base(
+                request_context,
+                knowledge_base_uuid,
+            )
+            return self.success(data={'files': files})

-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                file_id = json_data.get('file_id')
-                if not file_id:
-                    return self.http_status(400, -1, 'File ID is required')
-
-                parser_plugin_id = json_data.get('parser_plugin_id')
-
-                # Call the service-layer method to associate the file with the knowledge base
-                task_id = await self.ap.knowledge_service.store_file(
-                    knowledge_base_uuid, file_id, parser_plugin_id=parser_plugin_id
-                )
-                return self.success(
-                    {
-                        'task_id': task_id,
-                    }
-                )
+        @self.route(
+            '/<knowledge_base_uuid>/files',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def add_knowledge_base_file(
+            knowledge_base_uuid: str,
+            request_context: RequestContext,
+        ) -> str:
+            json_data = await quart.request.json
+            file_id = json_data.get('file_id')
+            if not file_id:
+                return self.http_status(400, -1, 'File ID is required')
+            parser_plugin_id = json_data.get('parser_plugin_id')
+            task_id = await self.ap.knowledge_service.store_file(
+                request_context,
+                knowledge_base_uuid,
+                file_id,
+                parser_plugin_id=parser_plugin_id,
+            )
+            return self.success({'task_id': task_id})

        @self.route(
            '/<knowledge_base_uuid>/files/<file_id>',
            methods=['DELETE'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
        )
-        async def delete_specific_file_in_kb(file_id: str, knowledge_base_uuid: str) -> str:
-            await self.ap.knowledge_service.delete_file(knowledge_base_uuid, file_id)
+        async def delete_specific_file_in_kb(
+            file_id: str,
+            knowledge_base_uuid: str,
+            request_context: RequestContext,
+        ) -> str:
+            await self.ap.knowledge_service.delete_file(request_context, knowledge_base_uuid, file_id)
            return self.success({})

        @self.route(
            '/<knowledge_base_uuid>/retrieve',
            methods=['POST'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def retrieve_knowledge_base(knowledge_base_uuid: str) -> str:
+        async def retrieve_knowledge_base(
+            knowledge_base_uuid: str,
+            request_context: RequestContext,
+        ) -> str:
            json_data = await quart.request.json
            query = json_data.get('query')

@@ -104,6 +150,9 @@ class KnowledgeBaseRouterGroup(group.RouterGroup):
            # Extract retrieval_settings to allow dynamic control over Knowledge Engine behavior (e.g. top_k, filters)
            retrieval_settings = json_data.get('retrieval_settings', {})
            results = await self.ap.knowledge_service.retrieve_knowledge_base(
-                knowledge_base_uuid, query, retrieval_settings
+                request_context,
+                knowledge_base_uuid,
+                query,
+                retrieval_settings,
            )
            return self.success(data={'results': results})
@@ -1,25 +1,39 @@
 import quart
 from urllib.parse import unquote
+
+from ....authz import Permission
+from ....context import RequestContext
 from ... import group


@group.group_class('knowledge_engines', '/api/v1/knowledge/engines')
 class KnowledgeEnginesRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def list_knowledge_engines() -> quart.Response:
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def list_knowledge_engines(request_context: RequestContext) -> quart.Response:
            """List all available Knowledge Engines from plugins.

            Returns a list of Knowledge Engines with their capabilities and configuration schemas.
            This is used by the frontend to render the knowledge base creation wizard.
            """
-            engines = await self.ap.knowledge_service.list_knowledge_engines()
+            engines = await self.ap.knowledge_service.list_knowledge_engines(request_context)
            return self.success(data={'engines': engines})

        @self.route(
-            '/<path:plugin_id>/creation-schema', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '/<path:plugin_id>/creation-schema',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def get_engine_creation_schema(plugin_id: str) -> quart.Response:
+        async def get_engine_creation_schema(
+            plugin_id: str,
+            request_context: RequestContext,
+        ) -> quart.Response:
            """Get creation settings schema for a specific Knowledge Engine.

            plugin_id is in 'author/name' format, captured via <path:> converter.
@@ -27,13 +41,19 @@ class KnowledgeEnginesRouterGroup(group.RouterGroup):
            plugin_id = unquote(plugin_id)
            if '/' not in plugin_id:
                return self.http_status(400, -1, 'Invalid plugin_id format. Expected author/name.')
-            schema = await self.ap.knowledge_service.get_engine_creation_schema(plugin_id)
+            schema = await self.ap.knowledge_service.get_engine_creation_schema(request_context, plugin_id)
            return self.success(data={'schema': schema})

        @self.route(
-            '/<path:plugin_id>/retrieval-schema', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '/<path:plugin_id>/retrieval-schema',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def get_engine_retrieval_schema(plugin_id: str) -> quart.Response:
+        async def get_engine_retrieval_schema(
+            plugin_id: str,
+            request_context: RequestContext,
+        ) -> quart.Response:
            """Get retrieval settings schema for a specific Knowledge Engine.

            plugin_id is in 'author/name' format, captured via <path:> converter.
@@ -41,5 +61,5 @@ class KnowledgeEnginesRouterGroup(group.RouterGroup):
            plugin_id = unquote(plugin_id)
            if '/' not in plugin_id:
                return self.http_status(400, -1, 'Invalid plugin_id format. Expected author/name.')
-            schema = await self.ap.knowledge_service.get_engine_retrieval_schema(plugin_id)
+            schema = await self.ap.knowledge_service.get_engine_retrieval_schema(request_context, plugin_id)
            return self.success(data={'schema': schema})
@@ -6,8 +6,12 @@ import quart
 import sqlalchemy

 from ... import group
+from ....authz import Permission
+from ....context import ExecutionContext, RequestContext
 from ......core import taskmgr
 from ......entity.persistence import metadata as persistence_metadata
+from ......workspace.errors import WorkspaceError, WorkspaceNotFoundError
+from ......utils import httpclient
 from langbot_plugin.runtime.plugin.mgr import PluginInstallSource

 LANGRAG_PLUGIN_AUTHOR = 'langbot-team'
@@ -31,24 +35,100 @@ EXTERNAL_PLUGIN_CREATION_FIELDS: dict[str, set[str] | None] = {
    'langbot-team/FastGPTConnector': None,  # all fields -> creation_settings
 }

+_INFORMATION_SCHEMA_TABLES = sqlalchemy.table(
+    'tables',
+    sqlalchemy.column('table_schema'),
+    sqlalchemy.column('table_name'),
+    schema='information_schema',
+)
+_SQLITE_MASTER = sqlalchemy.table(
+    'sqlite_master',
+    sqlalchemy.column('type'),
+    sqlalchemy.column('name'),
+)
+_LEGACY_KNOWLEDGE_BASE_BACKUP = sqlalchemy.table(
+    'knowledge_bases_backup',
+    sqlalchemy.column('uuid'),
+    sqlalchemy.column('name'),
+    sqlalchemy.column('description'),
+    sqlalchemy.column('emoji'),
+    sqlalchemy.column('embedding_model_uuid'),
+    sqlalchemy.column('top_k'),
+    sqlalchemy.column('created_at'),
+    sqlalchemy.column('updated_at'),
+)
+_LEGACY_EXTERNAL_KNOWLEDGE_BASE = sqlalchemy.table(
+    'external_knowledge_bases',
+    sqlalchemy.column('uuid'),
+    sqlalchemy.column('name'),
+    sqlalchemy.column('description'),
+    sqlalchemy.column('emoji'),
+    sqlalchemy.column('plugin_author'),
+    sqlalchemy.column('plugin_name'),
+    sqlalchemy.column('retriever_config'),
+    sqlalchemy.column('created_at'),
+)
+_CURRENT_KNOWLEDGE_BASE = sqlalchemy.table(
+    'knowledge_bases',
+    sqlalchemy.column('uuid'),
+    sqlalchemy.column('workspace_uuid'),
+    sqlalchemy.column('name'),
+    sqlalchemy.column('description'),
+    sqlalchemy.column('emoji'),
+    sqlalchemy.column('created_at'),
+    sqlalchemy.column('updated_at'),
+    sqlalchemy.column('knowledge_engine_plugin_id'),
+    sqlalchemy.column('collection_id'),
+    sqlalchemy.column('creation_settings'),
+    sqlalchemy.column('retrieval_settings'),
+)
+

@group.group_class('knowledge/migration', '/api/v1/knowledge/migration')
 class KnowledgeMigrationRouterGroup(group.RouterGroup):
-    async def _get_migration_flag(self) -> bool:
+    async def _require_local_migration_context(
+        self,
+        execution_context: ExecutionContext,
+    ) -> ExecutionContext:
+        """Fence legacy-table migration to the OSS singleton Workspace.
+
+        The backup tables predate Workspace scoping and are deliberately
+        instance-global.  A cloud projection must therefore never be allowed
+        to inspect or restore them, even when it has a valid execution lease.
+        """
+        try:
+            binding = await self.ap.workspace_service.get_local_execution_binding(
+                execution_context.workspace_uuid,
+                expected_generation=execution_context.placement_generation,
+            )
+        except WorkspaceNotFoundError:
+            raise
+        except WorkspaceError as exc:
+            raise WorkspaceNotFoundError('RAG migration is unavailable') from exc
+
+        if binding.instance_uuid != execution_context.instance_uuid:
+            raise WorkspaceNotFoundError('RAG migration is unavailable')
+        return ExecutionContext(
+            instance_uuid=binding.instance_uuid,
+            workspace_uuid=binding.workspace_uuid,
+            placement_generation=binding.placement_generation,
+        )
+
+    async def _get_migration_flag(self, execution_context: ExecutionContext) -> bool:
        """Check if rag_plugin_migration_needed flag is set."""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_metadata.Metadata).where(
-                persistence_metadata.Metadata.key == 'rag_plugin_migration_needed'
-            )
+            sqlalchemy.select(persistence_metadata.WorkspaceMetadata.value)
+            .where(persistence_metadata.WorkspaceMetadata.workspace_uuid == execution_context.workspace_uuid)
+            .where(persistence_metadata.WorkspaceMetadata.key == 'rag_plugin_migration_needed')
        )
-        row = result.first()
-        return row is not None and row.value == 'true'
+        return result.scalar_one_or_none() == 'true'

-    async def _set_migration_flag(self, value: str):
+    async def _set_migration_flag(self, execution_context: ExecutionContext, value: str):
        """Set rag_plugin_migration_needed flag."""
        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_metadata.Metadata)
-            .where(persistence_metadata.Metadata.key == 'rag_plugin_migration_needed')
+            sqlalchemy.update(persistence_metadata.WorkspaceMetadata)
+            .where(persistence_metadata.WorkspaceMetadata.workspace_uuid == execution_context.workspace_uuid)
+            .where(persistence_metadata.WorkspaceMetadata.key == 'rag_plugin_migration_needed')
            .values(value=value)
        )

@@ -56,35 +136,47 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
        """Check if a table exists."""
        if self.ap.persistence_mgr.db.name == 'postgresql':
            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text(
-                    'SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_name = :table_name);'
-                ).bindparams(table_name=table_name)
+                sqlalchemy.select(_INFORMATION_SCHEMA_TABLES.c.table_name)
+                .where(_INFORMATION_SCHEMA_TABLES.c.table_schema == 'public')
+                .where(_INFORMATION_SCHEMA_TABLES.c.table_name == table_name)
+                .limit(1)
            )
-            return result.scalar()
+            return result.first() is not None
        else:
            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text("SELECT name FROM sqlite_master WHERE type='table' AND name=:table_name;").bindparams(
-                    table_name=table_name
-                )
+                sqlalchemy.select(_SQLITE_MASTER.c.name)
+                .where(_SQLITE_MASTER.c.type == 'table')
+                .where(_SQLITE_MASTER.c.name == table_name)
+                .limit(1)
            )
            return result.first() is not None

    async def _install_plugin_from_marketplace(
-        self, plugin_id: str, task_context: taskmgr.TaskContext, space_url: str
+        self,
+        execution_context: ExecutionContext,
+        plugin_id: str,
+        task_context: taskmgr.TaskContext,
+        space_url: str,
    ) -> None:
        """Install a single plugin from the marketplace."""
        p_author, p_name = plugin_id.split('/', 1)
        self.ap.logger.info(f'RAG migration: installing plugin {plugin_id} from marketplace...')
        task_context.trace(f'Installing plugin {plugin_id} from marketplace...')

-        async with httpx.AsyncClient(trust_env=True, timeout=15) as client:
+        async with httpx.AsyncClient(
+            trust_env=True,
+            timeout=15,
+            event_hooks=httpclient.httpx_response_limit_hooks(),
+        ) as client:
            resp = await client.get(f'{space_url}/api/v1/marketplace/plugins/{p_author}/{p_name}')
            resp.raise_for_status()
-            p_data = resp.json().get('data', {}).get('plugin', {})
+            response_data = await httpclient.parse_json_response(resp)
+            p_data = response_data.get('data', {}).get('plugin', {})
            p_version = p_data.get('latest_version')
            if not p_version:
                raise Exception(f'Could not determine latest version for {plugin_id}')

+        await self.ap.plugin_connector.require_workspace_context(execution_context)
        await self.ap.plugin_connector.install_plugin(
            PluginInstallSource.MARKETPLACE,
            {
@@ -96,8 +188,15 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
        )
        self.ap.logger.info(f'RAG migration: plugin {plugin_id} install request sent.')

-    async def _execute_rag_migration(self, task_context: taskmgr.TaskContext, install_plugin: bool = True):
+    async def _execute_rag_migration(
+        self,
+        execution_context: ExecutionContext,
+        task_context: taskmgr.TaskContext,
+        install_plugin: bool = True,
+    ):
        """Execute RAG migration: install required plugins and restore backup data."""
+        execution_context = await self._require_local_migration_context(execution_context)
+        execution_context = await self.ap.plugin_connector.require_workspace_context(execution_context)
        warnings = []

        # Collect all plugins we need: LangRAG (always) + connector plugins (from external KBs)
@@ -108,7 +207,10 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
        has_external = await self._table_exists('external_knowledge_bases')
        if has_external:
            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text('SELECT DISTINCT plugin_author, plugin_name FROM external_knowledge_bases;')
+                sqlalchemy.select(
+                    _LEGACY_EXTERNAL_KNOWLEDGE_BASE.c.plugin_author,
+                    _LEGACY_EXTERNAL_KNOWLEDGE_BASE.c.plugin_name,
+                ).distinct()
            )
            for row in result.fetchall():
                plugin_author = row[0] or ''
@@ -127,7 +229,14 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):

            for plugin_id in needed_plugins:
                try:
-                    await self._install_plugin_from_marketplace(plugin_id, task_context, space_url)
+                    await self._install_plugin_from_marketplace(
+                        execution_context,
+                        plugin_id,
+                        task_context,
+                        space_url,
+                    )
+                except WorkspaceNotFoundError:
+                    raise
                except Exception as e:
                    self.ap.logger.warning(f'RAG migration: plugin {plugin_id} install returned: {e}')
                    task_context.trace(f'Plugin install note ({plugin_id}): {e}')
@@ -141,8 +250,11 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
            engine_id_set: set[str] = set()
            for i in range(max_retries):
                try:
+                    await self.ap.plugin_connector.require_workspace_context(execution_context)
                    engines = await self.ap.plugin_connector.list_knowledge_engines()
                    engine_id_set = {e.get('plugin_id') for e in engines}
+                except WorkspaceNotFoundError:
+                    raise
                except Exception:
                    pass
                if all(pid in engine_id_set for pid in needed_plugins):
@@ -158,17 +270,18 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
                await asyncio.sleep(2)
        else:
            try:
+                await self.ap.plugin_connector.require_workspace_context(execution_context)
                engines = await self.ap.plugin_connector.list_knowledge_engines()
                engine_id_set = {e.get('plugin_id') for e in engines}
+            except WorkspaceNotFoundError:
+                raise
            except Exception:
                engine_id_set = set()

        # Step 3: Restore internal knowledge bases from backup
        task_context.trace('Restoring internal knowledge bases...', action='restore-internal')
        if await self._table_exists('knowledge_bases_backup'):
-            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text('SELECT * FROM knowledge_bases_backup;')
-            )
+            result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(_LEGACY_KNOWLEDGE_BASE_BACKUP))
            rows = result.fetchall()
            columns = result.keys()

@@ -183,30 +296,30 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
                created_at = row_dict.get('created_at')
                updated_at = row_dict.get('updated_at')

+                # DB migration 20 created these columns as TEXT, while a fresh
+                # schema uses SQLAlchemy JSON.  Use a structured INSERT, but bind
+                # pre-serialized, untyped values so both physical schemas (and
+                # SQLite's string-valued legacy DATETIME rows) remain valid.
                creation_settings = json.dumps({'embedding_model_uuid': embedding_model_uuid})
                retrieval_settings = json.dumps({'top_k': top_k})

                await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.text(
-                        'INSERT INTO knowledge_bases '
-                        '(uuid, name, description, emoji, created_at, updated_at, '
-                        'knowledge_engine_plugin_id, collection_id, creation_settings, retrieval_settings) '
-                        'VALUES (:uuid, :name, :description, :emoji, :created_at, :updated_at, '
-                        ':plugin_id, :collection_id, :creation_settings, :retrieval_settings);'
-                    ).bindparams(
+                    sqlalchemy.insert(_CURRENT_KNOWLEDGE_BASE).values(
                        uuid=kb_uuid,
+                        workspace_uuid=execution_context.workspace_uuid,
                        name=name,
                        description=description,
                        emoji=emoji,
                        created_at=created_at,
                        updated_at=updated_at,
-                        plugin_id=LANGRAG_PLUGIN_ID,
+                        knowledge_engine_plugin_id=LANGRAG_PLUGIN_ID,
                        collection_id=kb_uuid,
                        creation_settings=creation_settings,
                        retrieval_settings=retrieval_settings,
                    )
                )

+                await self.ap.plugin_connector.require_workspace_context(execution_context)
                try:
                    config = {'embedding_model_uuid': embedding_model_uuid}
                    await self.ap.plugin_connector.rag_on_kb_create(LANGRAG_PLUGIN_ID, kb_uuid, config)
@@ -221,9 +334,7 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
        # Step 4: Restore external knowledge bases
        task_context.trace('Restoring external knowledge bases...', action='restore-external')
        if has_external:
-            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text('SELECT * FROM external_knowledge_bases;')
-            )
+            result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(_LEGACY_EXTERNAL_KNOWLEDGE_BASE))
            rows = result.fetchall()
            columns = result.keys()

@@ -266,20 +377,15 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
                    retrieval_settings_dict = {k: v for k, v in retriever_config.items() if k not in creation_fields}

                await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.text(
-                        'INSERT INTO knowledge_bases '
-                        '(uuid, name, description, emoji, created_at, updated_at, '
-                        'knowledge_engine_plugin_id, collection_id, creation_settings, retrieval_settings) '
-                        'VALUES (:uuid, :name, :description, :emoji, :created_at, :updated_at, '
-                        ':plugin_id, :collection_id, :creation_settings, :retrieval_settings);'
-                    ).bindparams(
+                    sqlalchemy.insert(_CURRENT_KNOWLEDGE_BASE).values(
                        uuid=kb_uuid,
+                        workspace_uuid=execution_context.workspace_uuid,
                        name=name,
                        description=description,
                        emoji=emoji,
                        created_at=created_at,
                        updated_at=created_at,
-                        plugin_id=external_plugin_id,
+                        knowledge_engine_plugin_id=external_plugin_id,
                        collection_id=kb_uuid,
                        creation_settings=json.dumps(creation_settings_dict),
                        retrieval_settings=json.dumps(retrieval_settings_dict),
@@ -294,6 +400,7 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
                    warnings.append(warning)
                    task_context.trace(warning)
                else:
+                    await self.ap.plugin_connector.require_workspace_context(execution_context)
                    try:
                        await self.ap.plugin_connector.rag_on_kb_create(
                            external_plugin_id, kb_uuid, creation_settings_dict
@@ -307,16 +414,23 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
            await self.ap.rag_mgr.load_knowledge_bases_from_db()

        # Step 5: Clear migration flag
-        await self._set_migration_flag('false')
+        await self._set_migration_flag(execution_context, 'false')
        task_context.trace('RAG migration completed.', action='done')

        if warnings:
            task_context.trace(f'Completed with {len(warnings)} warning(s).')

    async def initialize(self) -> None:
-        @self.route('/status', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            needed = await self._get_migration_flag()
+        @self.route(
+            '/status',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            execution_context = ExecutionContext.from_request(request_context)
+            execution_context = await self._require_local_migration_context(execution_context)
+            needed = await self._get_migration_flag(execution_context)

            internal_kb_count = 0
            external_kb_count = 0
@@ -324,13 +438,13 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
            if needed:
                if await self._table_exists('knowledge_bases_backup'):
                    result = await self.ap.persistence_mgr.execute_async(
-                        sqlalchemy.text('SELECT COUNT(*) FROM knowledge_bases_backup;')
+                        sqlalchemy.select(sqlalchemy.func.count()).select_from(_LEGACY_KNOWLEDGE_BASE_BACKUP)
                    )
                    internal_kb_count = result.scalar() or 0

                if await self._table_exists('external_knowledge_bases'):
                    result = await self.ap.persistence_mgr.execute_async(
-                        sqlalchemy.text('SELECT COUNT(*) FROM external_knowledge_bases;')
+                        sqlalchemy.select(sqlalchemy.func.count()).select_from(_LEGACY_EXTERNAL_KNOWLEDGE_BASE)
                    )
                    external_kb_count = result.scalar() or 0

@@ -342,9 +456,16 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/execute', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            needed = await self._get_migration_flag()
+        @self.route(
+            '/execute',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            execution_context = ExecutionContext.from_request(request_context)
+            execution_context = await self._require_local_migration_context(execution_context)
+            needed = await self._get_migration_flag(execution_context)
            if not needed:
                return self.http_status(400, -1, 'RAG migration is not needed')

@@ -353,20 +474,34 @@ class KnowledgeMigrationRouterGroup(group.RouterGroup):

            ctx = taskmgr.TaskContext.new()
            wrapper = self.ap.task_mgr.create_user_task(
-                self._execute_rag_migration(task_context=ctx, install_plugin=install_plugin),
+                self._execute_rag_migration(
+                    execution_context,
+                    task_context=ctx,
+                    install_plugin=install_plugin,
+                ),
                kind='rag-migration',
                name='rag-migration-execute',
                label='Migrating knowledge bases to plugin architecture',
                context=ctx,
+                instance_uuid=execution_context.instance_uuid,
+                workspace_uuid=execution_context.workspace_uuid,
+                placement_generation=execution_context.placement_generation,
            )

            return self.success(data={'task_id': wrapper.id})

-        @self.route('/dismiss', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            needed = await self._get_migration_flag()
+        @self.route(
+            '/dismiss',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            execution_context = ExecutionContext.from_request(request_context)
+            execution_context = await self._require_local_migration_context(execution_context)
+            needed = await self._get_migration_flag(execution_context)
            if not needed:
                return self.http_status(400, -1, 'RAG migration is not needed')

-            await self._set_migration_flag('false')
+            await self._set_migration_flag(execution_context, 'false')
            return self.success()
@@ -1,16 +1,24 @@
 import quart
+
+from ....authz import Permission
+from ....context import RequestContext
 from ... import group


@group.group_class('parsers', '/api/v1/knowledge/parsers')
 class ParsersRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def list_parsers() -> quart.Response:
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def list_parsers(request_context: RequestContext) -> quart.Response:
            """List all available parsers from plugins.

            Optional query parameter `mime_type` to filter parsers by supported MIME type.
            """
            mime_type = quart.request.args.get('mime_type')
-            parsers = await self.ap.knowledge_service.list_parsers(mime_type)
+            parsers = await self.ap.knowledge_service.list_parsers(request_context, mime_type)
            return self.success(data={'parsers': parsers})
@@ -3,14 +3,23 @@ from __future__ import annotations

 import quart

+from ...authz import Permission
+from ...context import RequestContext
 from .. import group


@group.group_class('logs', '/api/v1/logs')
 class LogsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
+        @self.route('', methods=['GET'], permission=Permission.AUDIT_VIEW)
+        async def _(request_context: RequestContext) -> str:
+            # The process log is instance-global.  It is safe to expose only in
+            # the OSS singleton Workspace; SaaS must use Workspace-scoped
+            # observability records instead of leaking another tenant's lines.
+            await self.ap.workspace_service.get_local_execution_binding(
+                request_context.workspace_uuid,
+                expected_generation=request_context.placement_generation,
+            )
            start_page_number = int(quart.request.args.get('start_page_number', 0))
            start_offset = int(quart.request.args.get('start_offset', 0))

@@ -3,6 +3,8 @@ from __future__ import annotations
 import datetime
 import quart

+from ...authz import Permission
+from ...context import RequestContext
 from .. import group


@@ -24,8 +26,8 @@ def parse_iso_datetime(datetime_str: str | None) -> datetime.datetime | None:
@group.group_class('monitoring', '/api/v1/monitoring')
 class MonitoringRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('/overview', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_overview() -> str:
+        @self.route('/overview', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_overview(request_context: RequestContext) -> str:
            """Get overview metrics"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -38,6 +40,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            metrics = await self.ap.monitoring_service.get_overview_metrics(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -46,8 +49,8 @@ class MonitoringRouterGroup(group.RouterGroup):

            return self.success(data=metrics)

-        @self.route('/token-statistics', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_token_statistics() -> str:
+        @self.route('/token-statistics', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_token_statistics(request_context: RequestContext) -> str:
            """Get detailed token usage statistics (summary, per-model, timeseries)."""
            bot_ids = quart.request.args.getlist('botId')
            pipeline_ids = quart.request.args.getlist('pipelineId')
@@ -61,6 +64,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            stats = await self.ap.monitoring_service.get_token_statistics(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -70,8 +74,8 @@ class MonitoringRouterGroup(group.RouterGroup):

            return self.success(data=stats)

-        @self.route('/messages', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_messages() -> str:
+        @self.route('/messages', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_messages(request_context: RequestContext) -> str:
            """Get message logs"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -87,6 +91,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            messages, total = await self.ap.monitoring_service.get_messages(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                session_ids=session_ids if session_ids else None,
@@ -105,8 +110,8 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/llm-calls', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_llm_calls() -> str:
+        @self.route('/llm-calls', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_llm_calls(request_context: RequestContext) -> str:
            """Get LLM call records"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -121,6 +126,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            llm_calls, total = await self.ap.monitoring_service.get_llm_calls(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -138,8 +144,8 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/tool-calls', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_tool_calls() -> str:
+        @self.route('/tool-calls', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_tool_calls(request_context: RequestContext) -> str:
            """Get tool call records"""
            bot_ids = quart.request.args.getlist('botId')
            pipeline_ids = quart.request.args.getlist('pipelineId')
@@ -153,6 +159,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            tool_calls, total = await self.ap.monitoring_service.get_tool_calls(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                session_ids=session_ids if session_ids else None,
@@ -171,8 +178,8 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/embedding-calls', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_embedding_calls() -> str:
+        @self.route('/embedding-calls', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_embedding_calls(request_context: RequestContext) -> str:
            """Get embedding call records"""
            # Parse query parameters
            start_time_str = quart.request.args.get('startTime')
@@ -186,6 +193,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            embedding_calls, total = await self.ap.monitoring_service.get_embedding_calls(
+                request_context,
                start_time=start_time,
                end_time=end_time,
                knowledge_base_id=knowledge_base_id if knowledge_base_id else None,
@@ -202,8 +210,8 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/sessions', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_sessions() -> str:
+        @self.route('/sessions', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_sessions(request_context: RequestContext) -> str:
            """Get session information"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -224,6 +232,7 @@ class MonitoringRouterGroup(group.RouterGroup):
                is_active = is_active_str.lower() == 'true'

            sessions, total = await self.ap.monitoring_service.get_sessions(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -242,8 +251,8 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/errors', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_errors() -> str:
+        @self.route('/errors', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_errors(request_context: RequestContext) -> str:
            """Get error logs"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -258,6 +267,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            errors, total = await self.ap.monitoring_service.get_errors(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -275,8 +285,8 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/data', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_all_data() -> str:
+        @self.route('/data', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_all_data(request_context: RequestContext) -> str:
            """Get all monitoring data in a single request"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -291,6 +301,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get overview metrics
            overview = await self.ap.monitoring_service.get_overview_metrics(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -299,6 +310,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get messages
            messages, messages_total = await self.ap.monitoring_service.get_messages(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -309,6 +321,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get LLM calls
            llm_calls, llm_calls_total = await self.ap.monitoring_service.get_llm_calls(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -319,6 +332,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get tool calls
            tool_calls, tool_calls_total = await self.ap.monitoring_service.get_tool_calls(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -329,6 +343,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get sessions
            sessions, sessions_total = await self.ap.monitoring_service.get_sessions(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -340,6 +355,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get errors
            errors, errors_total = await self.ap.monitoring_service.get_errors(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -350,6 +366,7 @@ class MonitoringRouterGroup(group.RouterGroup):

            # Get embedding calls
            embedding_calls, embedding_calls_total = await self.ap.monitoring_service.get_embedding_calls(
+                request_context,
                start_time=start_time,
                end_time=end_time,
                limit=limit,
@@ -376,27 +393,27 @@ class MonitoringRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/sessions/<session_id>/analysis', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_session_analysis(session_id: str) -> str:
+        @self.route('/sessions/<session_id>/analysis', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_session_analysis(session_id: str, request_context: RequestContext) -> str:
            """Get detailed analysis for a specific session"""
-            analysis = await self.ap.monitoring_service.get_session_analysis(session_id)
+            analysis = await self.ap.monitoring_service.get_session_analysis(request_context, session_id)

            # Always return success with the analysis data
            # The frontend will handle the 'found: false' case
            return self.success(data=analysis)

-        @self.route('/messages/<message_id>/details', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_message_details(message_id: str) -> str:
+        @self.route('/messages/<message_id>/details', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_message_details(message_id: str, request_context: RequestContext) -> str:
            """Get detailed information for a specific message"""
-            details = await self.ap.monitoring_service.get_message_details(message_id)
+            details = await self.ap.monitoring_service.get_message_details(request_context, message_id)

            if not details.get('found'):
                return self.error(message=f'Message {message_id} not found', code=404)

            return self.success(data=details)
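The monitoring routes drop `auth_type=group.AuthType.USER_TOKEN` in favor of a `permission=` requirement. Each handler then receives a `RequestContext` and passes it as the first argument to its `monitoring_service` call. Below is a minimal, framework-free sketch of that pattern. `Permission`, `RequestContext`, `route`, `dispatch` and `Forbidden` are hypothetical stand-ins, not the project's `group.RouterGroup.route` or its authz module.

```python
import asyncio
import enum
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Tuple


class Permission(enum.Enum):
    # Hypothetical stand-in for the project's authz.Permission enum.
    RESOURCE_VIEW = 'resource:view'
    RESOURCE_MANAGE = 'resource:manage'
    DATA_EXPORT = 'data:export'


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical: the workspace the caller acts in and the permissions it holds.
    workspace_uuid: str
    permissions: frozenset = field(default_factory=frozenset)


class Forbidden(Exception):
    pass


Handler = Callable[..., Awaitable[dict]]
ROUTES: Dict[str, Tuple[Permission, Handler]] = {}


def route(path: str, permission: Permission):
    """Register a handler under *path* that is reachable only with *permission*."""

    def decorator(handler: Handler) -> Handler:
        ROUTES[path] = (permission, handler)
        return handler

    return decorator


async def dispatch(path: str, ctx: RequestContext, **path_args) -> dict:
    permission, handler = ROUTES[path]
    if permission not in ctx.permissions:
        raise Forbidden(f'{permission.value} required')
    # Inject the context so the handler can scope every service call by it.
    return await handler(request_context=ctx, **path_args)


@route('/messages/<message_id>/details', Permission.RESOURCE_VIEW)
async def get_message_details(message_id: str, request_context: RequestContext) -> dict:
    return {'workspace': request_context.workspace_uuid, 'message_id': message_id}
```

Because the permission is declared per route, `/export` can require `DATA_EXPORT` while the read routes require `RESOURCE_VIEW`. A viewer role can then be denied bulk CSV export without losing dashboard access.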

-        @self.route('/export', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def export_data() -> tuple[str, int]:
+        @self.route('/export', methods=['GET'], permission=Permission.DATA_EXPORT)
+        async def export_data(request_context: RequestContext) -> tuple[str, int]:
            """Export monitoring data as CSV"""
            # Parse query parameters
            export_type = quart.request.args.get('type', 'messages')
@@ -413,6 +430,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            # Get data based on export type
            if export_type == 'messages':
                data = await self.ap.monitoring_service.export_messages(
+                    request_context,
                    bot_ids=bot_ids if bot_ids else None,
                    pipeline_ids=pipeline_ids if pipeline_ids else None,
                    start_time=start_time,
@@ -437,6 +455,7 @@ class MonitoringRouterGroup(group.RouterGroup):
                ]
            elif export_type == 'llm-calls':
                data = await self.ap.monitoring_service.export_llm_calls(
+                    request_context,
                    bot_ids=bot_ids if bot_ids else None,
                    pipeline_ids=pipeline_ids if pipeline_ids else None,
                    start_time=start_time,
@@ -463,6 +482,7 @@ class MonitoringRouterGroup(group.RouterGroup):
                ]
            elif export_type == 'embedding-calls':
                data = await self.ap.monitoring_service.export_embedding_calls(
+                    request_context,
                    start_time=start_time,
                    end_time=end_time,
                    limit=limit,
@@ -485,6 +505,7 @@ class MonitoringRouterGroup(group.RouterGroup):
                ]
            elif export_type == 'errors':
                data = await self.ap.monitoring_service.export_errors(
+                    request_context,
                    bot_ids=bot_ids if bot_ids else None,
                    pipeline_ids=pipeline_ids if pipeline_ids else None,
                    start_time=start_time,
@@ -506,6 +527,7 @@ class MonitoringRouterGroup(group.RouterGroup):
                ]
            elif export_type == 'sessions':
                data = await self.ap.monitoring_service.export_sessions(
+                    request_context,
                    bot_ids=bot_ids if bot_ids else None,
                    pipeline_ids=pipeline_ids if pipeline_ids else None,
                    start_time=start_time,
@@ -527,6 +549,7 @@ class MonitoringRouterGroup(group.RouterGroup):
                ]
            elif export_type == 'feedback':
                data = await self.ap.monitoring_service.export_feedback(
+                    request_context,
                    bot_ids=bot_ids if bot_ids else None,
                    pipeline_ids=pipeline_ids if pipeline_ids else None,
                    start_time=start_time,
@@ -581,8 +604,8 @@ class MonitoringRouterGroup(group.RouterGroup):

            return response, 200

-        @self.route('/feedback/stats', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_feedback_stats() -> str:
+        @self.route('/feedback/stats', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_feedback_stats(request_context: RequestContext) -> str:
            """Get feedback statistics"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -595,6 +618,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            end_time = parse_iso_datetime(end_time_str)

            stats = await self.ap.monitoring_service.get_feedback_stats(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                start_time=start_time,
@@ -603,8 +627,8 @@ class MonitoringRouterGroup(group.RouterGroup):

            return self.success(data=stats)

-        @self.route('/feedback', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_feedback() -> str:
+        @self.route('/feedback', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def get_feedback(request_context: RequestContext) -> str:
            """Get feedback list"""
            # Parse query parameters
            bot_ids = quart.request.args.getlist('botId')
@@ -623,6 +647,7 @@ class MonitoringRouterGroup(group.RouterGroup):
            feedback_type = int(feedback_type_str) if feedback_type_str else None

            feedback_list, total = await self.ap.monitoring_service.get_feedback_list(
+                request_context,
                bot_ids=bot_ids if bot_ids else None,
                pipeline_ids=pipeline_ids if pipeline_ids else None,
                feedback_type=feedback_type,
@@ -20,10 +20,12 @@ import httpx
 import quart

 from ... import group
-from ......utils import paths
-from ......platform.sources.websocket_manager import ws_connection_manager
+from ......utils import httpclient, paths
+from ......platform.sources.websocket_manager import WebSocketScope, is_valid_session_id, ws_connection_manager
+from .websocket_chat import create_scoped_duplex_tasks, wait_for_duplex_tasks

 logger = logging.getLogger(__name__)
+_AUTH_TIMEOUT_SECONDS = 10.0

 # Cache the widget template content
 _widget_template_cache: str | None = None
@@ -58,37 +60,31 @@ def _get_logo_bytes() -> bytes:
 class EmbedRouterGroup(group.RouterGroup):
    # -- helpers -------------------------------------------------------------

-    def _resolve_bot(self, bot_uuid: str):
+    async def _resolve_bot(self, bot_uuid: str):
        """Resolve *bot_uuid* to ``(runtime_bot, pipeline_uuid)``.

        Returns ``(None, None)`` when the bot does not exist, is not a
        ``web_page_bot``, is disabled, or has no pipeline bound.
        """
-        for bot in self.ap.platform_mgr.bots:
-            if (
-                bot.bot_entity.uuid == bot_uuid
-                and bot.bot_entity.adapter == 'web_page_bot'
-                and bot.bot_entity.enable
-                and bot.bot_entity.use_pipeline_uuid
-            ):
-                return bot, bot.bot_entity.use_pipeline_uuid
+        bot = await self.ap.platform_mgr.resolve_public_bot(bot_uuid)
+        if (
+            bot is not None
+            and bot.bot_entity.adapter == 'web_page_bot'
+            and bot.bot_entity.enable
+            and bot.bot_entity.use_pipeline_uuid
+        ):
+            return bot, bot.bot_entity.use_pipeline_uuid
        return None, None

-    def _get_bot_config(self, bot_uuid: str) -> dict:
-        for bot in self.ap.platform_mgr.bots:
-            if bot.bot_entity.uuid == bot_uuid and bot.bot_entity.adapter == 'web_page_bot':
-                return bot.bot_entity.adapter_config
-        return {}
+    @staticmethod
+    def _get_bot_config(runtime_bot) -> dict:
+        return runtime_bot.bot_entity.adapter_config

-    async def _verify_session_token(self, request, bot_uuid: str) -> bool:
-        config = self._get_bot_config(bot_uuid)
+    def _verify_session_token_value(self, token: str, runtime_bot) -> bool:
+        config = self._get_bot_config(runtime_bot)
        secret = config.get('turnstile_secret_key', '')
        if not secret:
            return True
-        auth_header = request.headers.get('Authorization', '')
-        if not auth_header.startswith('Bearer '):
-            return False
-        token = auth_header[7:]
        try:
            ts_str, mac = token.split('.', 1)
            ts = float(ts_str)
@@ -99,6 +95,50 @@ class EmbedRouterGroup(group.RouterGroup):
        except Exception:
            return False
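`_verify_session_token_value` parses the token as `<timestamp>.<mac>` using `token.split('.', 1)`. The remainder of the check is elided in this hunk. The sketch below shows one plausible scheme under stated assumptions: an HMAC-SHA256 over the timestamp, keyed by the Turnstile secret, plus a TTL. `issue_session_token`, `verify_session_token` and the one-hour TTL are hypothetical. Since the parser splits on the *first* `.`, the sketch issues whole-second timestamps so that the timestamp never contains a dot.

```python
import hashlib
import hmac


def _sign(secret: str, ts_str: str) -> str:
    # Hypothetical MAC: HMAC-SHA256 of the timestamp string, hex-encoded.
    return hmac.new(secret.encode(), ts_str.encode(), hashlib.sha256).hexdigest()


def issue_session_token(secret: str, now: float) -> str:
    # Whole seconds keep split('.', 1) unambiguous.
    ts_str = str(int(now))
    return f'{ts_str}.{_sign(secret, ts_str)}'


def verify_session_token(token: str, secret: str, now: float, ttl_seconds: float = 3600.0) -> bool:
    if not secret:
        # Mirrors the route: without a Turnstile secret, verification is disabled.
        return True
    try:
        ts_str, mac = token.split('.', 1)
        ts = float(ts_str)
    except ValueError:
        return False
    # Constant-time comparison; bytes avoid TypeError on non-ASCII client input.
    if not hmac.compare_digest(mac.encode(), _sign(secret, ts_str).encode()):
        return False
    # Reject tokens stamped in the future or older than the TTL.
    return 0 <= now - ts <= ttl_seconds
```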

+    async def _verify_session_token(self, request, runtime_bot) -> bool:
+        auth_header = request.headers.get('Authorization', '')
+        token = auth_header[7:] if auth_header.startswith('Bearer ') else ''
+        return self._verify_session_token_value(token, runtime_bot)
+
+    async def _authenticate_websocket(self, runtime_bot) -> None:
+        """Require the embed session token as the first WebSocket frame."""
+
+        raw_message = await asyncio.wait_for(quart.websocket.receive(), timeout=_AUTH_TIMEOUT_SECONDS)
+        payload = await asyncio.to_thread(json.loads, raw_message)
+        if not isinstance(payload, dict) or payload.get('type') != 'authenticate':
+            raise ValueError('Authentication is required')
+        token = str(payload.get('token') or '')
+        if not self._verify_session_token_value(token, runtime_bot):
+            raise ValueError('Authentication is required')
+
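`_authenticate_websocket` moves the embed token out of the URL and into the first WebSocket frame. That frame has a deadline of `_AUTH_TIMEOUT_SECONDS`. A standalone sketch of the same handshake follows. `authenticate_first_frame` and `AuthenticationRequired` are hypothetical names, and `receive` stands in for `quart.websocket.receive`. The project's version parses JSON through `asyncio.to_thread`; this sketch parses inline for brevity and folds timeouts and malformed JSON into the same error.

```python
import asyncio
import json
from typing import Awaitable, Callable


class AuthenticationRequired(Exception):
    pass


async def authenticate_first_frame(
    receive: Callable[[], Awaitable[str]],
    verify_token: Callable[[str], bool],
    timeout: float = 10.0,
) -> None:
    """Accept only if the first frame is {'type': 'authenticate', 'token': ...} and arrives in time."""
    try:
        raw = await asyncio.wait_for(receive(), timeout=timeout)
        payload = json.loads(raw)
    except (asyncio.TimeoutError, ValueError) as exc:
        # Silence past the deadline and non-JSON frames are both treated as unauthenticated.
        raise AuthenticationRequired('Authentication is required') from exc
    if not isinstance(payload, dict) or payload.get('type') != 'authenticate':
        raise AuthenticationRequired('Authentication is required')
    if not verify_token(str(payload.get('token') or '')):
        raise AuthenticationRequired('Authentication is required')
```

The deadline matters because the socket has already been accepted at this point. Without it, an unauthenticated client could hold a connection open indefinitely.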
+    async def _assert_execution_active(self, runtime_bot) -> None:
+        context = runtime_bot.execution_context
+        await self.ap.workspace_service.get_execution_binding(
+            context.workspace_uuid,
+            expected_generation=context.placement_generation,
+        )
+
+    async def _resolve_connected_bot(self, owner_bot, pipeline_uuid: str):
+        """Re-resolve mutable bot state before every public message."""
+        current_bot, current_pipeline_uuid = await self._resolve_bot(owner_bot.bot_entity.uuid)
+        if current_bot is None or current_pipeline_uuid != pipeline_uuid:
+            raise RuntimeError('Bot is unavailable')
+
+        owner_context = owner_bot.execution_context
+        current_context = current_bot.execution_context
+        if (
+            current_context.instance_uuid,
+            current_context.workspace_uuid,
+            current_context.placement_generation,
+        ) != (
+            owner_context.instance_uuid,
+            owner_context.workspace_uuid,
+            owner_context.placement_generation,
+        ):
+            raise RuntimeError('Bot is unavailable')
+        await self._assert_execution_active(current_bot)
+        return current_bot
+
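`_resolve_connected_bot` uses the `(instance_uuid, workspace_uuid, placement_generation)` triple captured at connect time as a fence. If a later lookup returns no bot or a different triple, the socket belongs to a stale placement and must stop routing messages. The pure-data part of that check is sketched below. `ExecutionContext`, `BotUnavailable` and `check_same_placement` are hypothetical stand-ins, and the real method additionally confirms the binding through `workspace_service.get_execution_binding`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutionContext:
    # Field names mirror those compared in _resolve_connected_bot; the class is a stand-in.
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int


class BotUnavailable(RuntimeError):
    pass


def _placement_key(ctx: ExecutionContext) -> tuple:
    return (ctx.instance_uuid, ctx.workspace_uuid, ctx.placement_generation)


def check_same_placement(owner: ExecutionContext, current: Optional[ExecutionContext]) -> ExecutionContext:
    """Fail closed if the bot vanished or was re-placed since the socket connected."""
    if current is None or _placement_key(current) != _placement_key(owner):
        raise BotUnavailable('Bot is unavailable')
    return current
```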
    # -- routes --------------------------------------------------------------

    async def initialize(self) -> None:
@@ -106,7 +146,7 @@ class EmbedRouterGroup(group.RouterGroup):
        async def verify_turnstile(bot_uuid: str) -> str:
            if not _is_valid_uuid(bot_uuid):
                return self.http_status(400, -1, 'Invalid bot_uuid format')
-            runtime_bot, pipeline_uuid = self._resolve_bot(bot_uuid)
+            runtime_bot, pipeline_uuid = await self._resolve_bot(bot_uuid)
            if runtime_bot is None:
                return self.http_status(404, -1, 'Bot not found or not available')
            try:
@@ -115,18 +155,18 @@ class EmbedRouterGroup(group.RouterGroup):
                if not token:
                    return self.http_status(400, -1, 'Token is required')

-                config = self._get_bot_config(bot_uuid)
+                config = self._get_bot_config(runtime_bot)
                secret = config.get('turnstile_secret_key', '')
                if not secret:
                    ts = time.time()
                    return self.success(data={'token': f'{ts}.dummy'})

-                async with httpx.AsyncClient() as client:
+                async with httpx.AsyncClient(event_hooks=httpclient.httpx_response_limit_hooks()) as client:
                    resp = await client.post(
                        'https://challenges.cloudflare.com/turnstile/v0/siteverify',
                        data={'secret': secret, 'response': token},
                    )
-                    result = resp.json()
+                    result = await httpclient.parse_json_response(resp)

                if not result.get('success'):
                    return self.http_status(403, -1, 'Turnstile verification failed')
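The Turnstile call now goes through `httpclient.httpx_response_limit_hooks()` and `httpclient.parse_json_response(resp)`. Neither helper is shown in this diff. The following hypothetical sketch, `read_json_capped`, shows the general idea of capping a response body while it streams instead of trusting `Content-Length`, which may be absent or wrong. With httpx, the chunks could come from `response.aiter_bytes()` inside `async with client.stream(...)`.

```python
import asyncio
import json
from typing import AsyncIterator


class ResponseTooLarge(Exception):
    pass


async def read_json_capped(chunks: AsyncIterator[bytes], max_bytes: int = 1 << 20):
    """Accumulate body chunks, abort once *max_bytes* is exceeded, then decode JSON."""
    buf = bytearray()
    async for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ResponseTooLarge(f'response body exceeds {max_bytes} bytes')
    return json.loads(bytes(buf))
```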
@@ -146,7 +186,7 @@ class EmbedRouterGroup(group.RouterGroup):
            """Serve the embed widget JavaScript with injected configuration."""
            if not _is_valid_uuid(bot_uuid):
                return self.http_status(400, -1, 'Invalid bot_uuid format')
-            runtime_bot, pipeline_uuid = self._resolve_bot(bot_uuid)
+            runtime_bot, pipeline_uuid = await self._resolve_bot(bot_uuid)
            if runtime_bot is None:
                return quart.Response(
                    '// Bot not found or not available', status=404, content_type='application/javascript'
@@ -164,7 +204,7 @@ class EmbedRouterGroup(group.RouterGroup):
            if not re.match(r'^https?://[a-zA-Z0-9._:/-]+$', base_url):
                base_url = quart.request.host_url.rstrip('/')

-            config = self._get_bot_config(bot_uuid)
+            config = self._get_bot_config(runtime_bot)
            site_key = config.get('turnstile_site_key', '')
            locale = config.get('language', 'en_US') or 'en_US'
            bubble_icon = config.get('bubble_icon', 'logo') or 'logo'
@@ -194,20 +234,25 @@ class EmbedRouterGroup(group.RouterGroup):
        async def get_embed_messages(bot_uuid: str, session_type: str) -> str:
            if not _is_valid_uuid(bot_uuid):
                return self.http_status(400, -1, 'Invalid bot_uuid format')
-            runtime_bot, pipeline_uuid = self._resolve_bot(bot_uuid)
+            runtime_bot, pipeline_uuid = await self._resolve_bot(bot_uuid)
            if runtime_bot is None:
                return self.http_status(404, -1, 'Bot not found or not available')
-            if not await self._verify_session_token(quart.request, bot_uuid):
+            if not await self._verify_session_token(quart.request, runtime_bot):
                return self.http_status(403, -1, 'Unauthorized or session expired')
            try:
                if session_type not in ['person', 'group']:
                    return self.http_status(400, -1, 'session_type must be person or group')

-                websocket_adapter = self.ap.platform_mgr.websocket_proxy_bot.adapter
+                session_id = quart.request.args.get('session_id', '')
+                if not is_valid_session_id(session_id):
+                    return self.http_status(400, -1, 'Valid session_id is required')
+
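History and reset calls are now keyed by a client-supplied `session_id` and rejected unless `is_valid_session_id` accepts it. The validator's rule is not part of this hunk. The sketch below assumes a hypothetical rule of 16 to 128 URL-safe characters, and `new_session_id` shows one way a widget could mint a conforming value.

```python
import re
import secrets

# Hypothetical rule: 16-128 URL-safe characters. The real is_valid_session_id may differ.
_SESSION_ID_RE = re.compile(r'[A-Za-z0-9_-]{16,128}')


def is_valid_session_id(value: object) -> bool:
    return isinstance(value, str) and _SESSION_ID_RE.fullmatch(value) is not None


def new_session_id() -> str:
    # token_urlsafe(24) encodes 24 random bytes as 32 characters from [A-Za-z0-9_-].
    return secrets.token_urlsafe(24)
```

`re.fullmatch` is used instead of `^...$` because `$` also matches just before a trailing newline, so a pattern anchored with `$` would accept `'abc...\n'`.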
+                proxy_bot = await self.ap.platform_mgr.get_websocket_proxy_bot(runtime_bot.execution_context)
+                websocket_adapter = proxy_bot.adapter
                if not websocket_adapter:
                    return self.http_status(404, -1, 'WebSocket adapter not found')

-                messages = websocket_adapter.get_websocket_messages(pipeline_uuid, session_type)
+                messages = websocket_adapter.get_websocket_messages(pipeline_uuid, session_type, session_id)
                return self.success(data={'messages': messages})

            except Exception as e:
@@ -218,20 +263,25 @@ class EmbedRouterGroup(group.RouterGroup):
        async def reset_embed_session(bot_uuid: str, session_type: str) -> str:
            if not _is_valid_uuid(bot_uuid):
                return self.http_status(400, -1, 'Invalid bot_uuid format')
-            runtime_bot, pipeline_uuid = self._resolve_bot(bot_uuid)
+            runtime_bot, pipeline_uuid = await self._resolve_bot(bot_uuid)
            if runtime_bot is None:
                return self.http_status(404, -1, 'Bot not found or not available')
-            if not await self._verify_session_token(quart.request, bot_uuid):
+            if not await self._verify_session_token(quart.request, runtime_bot):
                return self.http_status(403, -1, 'Unauthorized or session expired')
            try:
                if session_type not in ['person', 'group']:
                    return self.http_status(400, -1, 'session_type must be person or group')

-                websocket_adapter = self.ap.platform_mgr.websocket_proxy_bot.adapter
+                session_id = quart.request.args.get('session_id', '')
+                if not is_valid_session_id(session_id):
+                    return self.http_status(400, -1, 'Valid session_id is required')
+
+                proxy_bot = await self.ap.platform_mgr.get_websocket_proxy_bot(runtime_bot.execution_context)
+                websocket_adapter = proxy_bot.adapter
                if not websocket_adapter:
                    return self.http_status(404, -1, 'WebSocket adapter not found')

-                websocket_adapter.reset_session(pipeline_uuid, session_type)
+                websocket_adapter.reset_session(pipeline_uuid, session_type, session_id)
                return self.success(data={'message': 'Session reset successfully'})

            except Exception as e:
@@ -242,10 +292,10 @@ class EmbedRouterGroup(group.RouterGroup):
        async def submit_feedback(bot_uuid: str) -> str:
            if not _is_valid_uuid(bot_uuid):
                return self.http_status(400, -1, 'Invalid bot_uuid format')
-            runtime_bot, pipeline_uuid = self._resolve_bot(bot_uuid)
+            runtime_bot, pipeline_uuid = await self._resolve_bot(bot_uuid)
            if runtime_bot is None:
                return self.http_status(404, -1, 'Bot not found or not available')
-            if not await self._verify_session_token(quart.request, bot_uuid):
+            if not await self._verify_session_token(quart.request, runtime_bot):
                return self.http_status(403, -1, 'Unauthorized or session expired')
            try:
                data = await quart.request.get_json()
@@ -258,6 +308,7 @@ class EmbedRouterGroup(group.RouterGroup):
                feedback_id = f'embed_{uuid.uuid4().hex[:12]}'

                await self.ap.monitoring_service.record_feedback(
+                    runtime_bot.execution_context,
                    feedback_id=feedback_id,
                    feedback_type=feedback_type,
                    bot_id=runtime_bot.bot_entity.uuid,
@@ -278,11 +329,12 @@ class EmbedRouterGroup(group.RouterGroup):
        @self.quart_app.websocket(self.path + '/<bot_uuid>/ws/connect')
        async def embed_websocket_connect(bot_uuid: str):
            """WebSocket connection for embed widget, keyed by bot_uuid."""
+            await quart.websocket.accept()
            if not _is_valid_uuid(bot_uuid):
                await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Invalid bot_uuid format'}))
                return

-            runtime_bot, pipeline_uuid = self._resolve_bot(bot_uuid)
+            runtime_bot, pipeline_uuid = await self._resolve_bot(bot_uuid)
            if runtime_bot is None:
                await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Bot not found or not available'}))
                return
@@ -294,17 +346,47 @@ class EmbedRouterGroup(group.RouterGroup):
                )
                return

-            websocket_adapter = self.ap.platform_mgr.websocket_proxy_bot.adapter
-            if not websocket_adapter:
-                await quart.websocket.send(json.dumps({'type': 'error', 'message': 'WebSocket adapter not found'}))
+            session_id = quart.websocket.args.get('session_id', '')
+            if not is_valid_session_id(session_id):
+                await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Valid session_id is required'}))
                return

            try:
+                await self._authenticate_websocket(runtime_bot)
+                await self._assert_execution_active(runtime_bot)
+            except Exception:
+                await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Unauthorized'}))
+                return
+
+            try:
+                proxy_bot = await self.ap.platform_mgr.get_websocket_proxy_bot(runtime_bot.execution_context)
+                websocket_adapter = proxy_bot.adapter
+                if not websocket_adapter:
+                    await quart.websocket.send(json.dumps({'type': 'error', 'message': 'WebSocket adapter not found'}))
+                    return
+
                connection = await ws_connection_manager.add_connection(
                    websocket=quart.websocket._get_current_object(),
+                    scope=WebSocketScope.from_context(runtime_bot.execution_context),
                    pipeline_uuid=pipeline_uuid,
                    session_type=session_type,
+                    session_id=session_id,
                    metadata={'user_agent': quart.websocket.headers.get('User-Agent', '')},
+                    send_queue_size=(
+                        self.ap.instance_config.data.get('system', {})
+                        .get('websocket_retention', {})
+                        .get('send_queue_size', 100)
+                    ),
+                    max_connections=(
+                        self.ap.instance_config.data.get('system', {})
+                        .get('websocket_retention', {})
+                        .get('max_connections', 1024)
+                    ),
+                    max_connections_per_workspace=(
+                        self.ap.instance_config.data.get('system', {})
+                        .get('websocket_retention', {})
+                        .get('max_connections_per_workspace', 32)
+                    ),
                )
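`add_connection` now receives three `system.websocket_retention` limits: `send_queue_size`, `max_connections` and `max_connections_per_workspace`, which default to 100, 1024 and 32. `ws_connection_manager`'s internals are not in this hunk. The hypothetical `ConnectionRegistry` below sketches how such limits are commonly enforced. It applies a global cap, a per-workspace cap so that one tenant cannot exhaust the instance, and a bounded per-connection send queue.

```python
import asyncio
import itertools
from collections import Counter
from dataclasses import dataclass, field


class TooManyConnections(Exception):
    pass


@dataclass
class Connection:
    connection_id: int
    workspace_uuid: str
    send_queue: asyncio.Queue


@dataclass
class ConnectionRegistry:
    # Defaults mirror the websocket_retention fallbacks read in the route; the class is hypothetical.
    send_queue_size: int = 100
    max_connections: int = 1024
    max_connections_per_workspace: int = 32
    _connections: dict = field(default_factory=dict)
    _per_workspace: Counter = field(default_factory=Counter)
    _ids: itertools.count = field(default_factory=itertools.count)

    def add(self, workspace_uuid: str) -> Connection:
        if len(self._connections) >= self.max_connections:
            raise TooManyConnections('instance connection limit reached')
        if self._per_workspace[workspace_uuid] >= self.max_connections_per_workspace:
            raise TooManyConnections('workspace connection limit reached')
        # A bounded queue turns a slow client into backpressure instead of unbounded memory.
        conn = Connection(next(self._ids), workspace_uuid, asyncio.Queue(maxsize=self.send_queue_size))
        self._connections[conn.connection_id] = conn
        self._per_workspace[workspace_uuid] += 1
        return conn

    def remove(self, conn: Connection) -> None:
        if self._connections.pop(conn.connection_id, None) is not None:
            self._per_workspace[conn.workspace_uuid] -= 1
```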

                await quart.websocket.send(
@@ -324,11 +406,19 @@ class EmbedRouterGroup(group.RouterGroup):
                    f'(bot={bot_uuid}, pipeline={pipeline_uuid}, session_type={session_type})'
                )

-                receive_task = asyncio.create_task(self._handle_receive(connection, websocket_adapter, runtime_bot))
-                send_task = asyncio.create_task(self._handle_send(connection))
+                receive_task, send_task = create_scoped_duplex_tasks(
+                    self._handle_receive(
+                        connection,
+                        websocket_adapter,
+                        runtime_bot,
+                        pipeline_uuid,
+                    ),
+                    self._handle_send(connection),
+                    runtime_bot.execution_context.workspace_uuid,
+                )
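This hunk replaces `asyncio.gather(receive_task, send_task)`, which waits for both halves and, if one raises, propagates the error without cancelling the other. In its place come `create_scoped_duplex_tasks` and `wait_for_duplex_tasks` from `websocket_chat`, whose bodies are not shown here. The sketch below is a hypothetical version of one common shape. It tags both tasks with the workspace through a `contextvars.ContextVar`, returns as soon as either half finishes, cancels the other half, and re-raises a real failure. `current_workspace`, `create_scoped_tasks` and `wait_first_then_cancel` are illustrative names.

```python
import asyncio
import contextvars

# Hypothetical scope marker; the project's helper may scope tasks differently.
current_workspace = contextvars.ContextVar('current_workspace')


def create_scoped_tasks(receive_coro, send_coro, workspace_uuid: str):
    token = current_workspace.set(workspace_uuid)
    try:
        # create_task snapshots the current context, so both tasks see workspace_uuid.
        return asyncio.create_task(receive_coro), asyncio.create_task(send_coro)
    finally:
        # Undo the set so the caller's own context is left unchanged.
        current_workspace.reset(token)


async def wait_first_then_cancel(*tasks: asyncio.Task) -> None:
    """Return once any task finishes; cancel the rest and re-raise the first failure."""
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let cancelled tasks run their finally blocks before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        if not task.cancelled() and task.exception() is not None:
            raise task.exception()
```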

                try:
-                    await asyncio.gather(receive_task, send_task)
+                    await wait_for_duplex_tasks(receive_task, send_task)
                except Exception as e:
                    logger.error(f'Embed WebSocket task error: {e}')
                finally:
@@ -343,14 +433,14 @@ class EmbedRouterGroup(group.RouterGroup):

    # -- WebSocket receive/send helpers --------------------------------------

-    async def _handle_receive(self, connection, websocket_adapter, owner_bot):
+    async def _handle_receive(self, connection, websocket_adapter, owner_bot, pipeline_uuid: str):
        try:
            while connection.is_active:
                message = await quart.websocket.receive()
                await ws_connection_manager.update_activity(connection.connection_id)

                try:
-                    data = json.loads(message)
+                    data = await asyncio.to_thread(json.loads, message)
                    message_type = data.get('type', 'message')

                    if message_type == 'ping':
@@ -358,7 +448,12 @@ class EmbedRouterGroup(group.RouterGroup):
                            {'type': 'pong', 'timestamp': datetime.datetime.now().isoformat()}
                        )
                    elif message_type == 'message':
-                        await websocket_adapter.handle_websocket_message(connection, data, owner_bot=owner_bot)
+                        try:
+                            current_bot = await self._resolve_connected_bot(owner_bot, pipeline_uuid)
+                        except Exception:
+                            await connection.send_queue.put({'type': 'error', 'message': 'Bot is unavailable'})
+                            break
+                        await websocket_adapter.handle_websocket_message(connection, data, owner_bot=current_bot)
                    elif message_type == 'disconnect':
                        break

@@ -369,13 +464,20 @@ class EmbedRouterGroup(group.RouterGroup):
            logger.error(f'Embed receive error: {e}', exc_info=True)
        finally:
            connection.is_active = False
+            try:
+                connection.send_queue.put_nowait(None)
+            except asyncio.QueueFull:
+                pass
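Together with the `_handle_send` change that follows, this `finally` block closes a shutdown gap. Previously the sender looped only `while connection.is_active` and could stop with messages still queued. Now the receiver enqueues a `None` sentinel, and the sender keeps draining until the connection is inactive *and* the queue is empty. A standalone sketch of that handshake follows. `Conn`, `close_outbox` and `drain_outbox` are hypothetical names.

```python
import asyncio


class Conn:
    # Hypothetical stand-in for the connection: an activity flag plus a bounded outbox.
    def __init__(self, maxsize: int = 100) -> None:
        self.is_active = True
        self.send_queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)


def close_outbox(conn: Conn) -> None:
    """Receive side: mark inactive and wake the sender with a None sentinel."""
    conn.is_active = False
    try:
        conn.send_queue.put_nowait(None)
    except asyncio.QueueFull:
        # A full queue is non-empty, so the sender's drain condition keeps it running anyway.
        pass


async def drain_outbox(conn: Conn, send) -> None:
    """Send side: deliver queued messages until the sentinel, or until inactive and empty."""
    while conn.is_active or not conn.send_queue.empty():
        try:
            message = await asyncio.wait_for(conn.send_queue.get(), timeout=1.0)
        except asyncio.TimeoutError:
            continue
        if message is None:
            break
        await send(message)
```

Because the queue is FIFO, anything enqueued before the sentinel is delivered first. This includes the final `'Bot is unavailable'` error frame.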

    async def _handle_send(self, connection):
        try:
-            while connection.is_active:
+            while connection.is_active or not connection.send_queue.empty():
                try:
                    message = await asyncio.wait_for(connection.send_queue.get(), timeout=1.0)
-                    await quart.websocket.send(json.dumps(message))
+                    if message is None:
+                        break
+                    encoded = await asyncio.to_thread(json.dumps, message)
+                    await quart.websocket.send(encoded)
                except asyncio.TimeoutError:
                    continue
        except Exception as e:
@@ -2,120 +2,156 @@ from __future__ import annotations

 import quart

+from ....authz import Permission, has_permission
+from ....context import RequestContext
+from ....service.secrets import redact_secrets
 from ... import group


@group.group_class('pipelines', '/api/v1/pipelines')
 class PipelinesRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                sort_by = quart.request.args.get('sort_by', 'created_at')
-                sort_order = quart.request.args.get('sort_order', 'DESC')
-                return self.success(
-                    data={'pipelines': await self.ap.pipeline_service.get_pipelines(sort_by, sort_order)}
-                )
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-
-                pipeline_uuid = await self.ap.pipeline_service.create_pipeline(json_data)
-
-                return self.success(data={'uuid': pipeline_uuid})
-
-        @self.route('/_/metadata', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            return self.success(data={'configs': await self.ap.pipeline_service.get_pipeline_metadata()})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            sort_by = quart.request.args.get('sort_by', 'created_at')
+            sort_order = quart.request.args.get('sort_order', 'DESC')
+            include_secret = has_permission(request_context, Permission.RESOURCE_MANAGE)
+            return self.success(
+                data={
+                    'pipelines': await self.ap.pipeline_service.get_pipelines(
+                        request_context,
+                        sort_by,
+                        sort_order,
+                        include_secret=include_secret,
+                    )
+                }
+            )

        @self.route(
-            '/<pipeline_uuid>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
        )
-        async def _(pipeline_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                pipeline = await self.ap.pipeline_service.get_pipeline(pipeline_uuid)
+        async def _(request_context: RequestContext) -> str:
+            pipeline_uuid = await self.ap.pipeline_service.create_pipeline(request_context, await quart.request.json)
+            return self.success(data={'uuid': pipeline_uuid})

-                if pipeline is None:
-                    return self.http_status(404, -1, 'pipeline not found')
+        @self.route(
+            '/_/metadata',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            return self.success(data={'configs': await self.ap.pipeline_service.get_pipeline_metadata(request_context)})

-                return self.success(data={'pipeline': pipeline})
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
+        @self.route(
+            '/<pipeline_uuid>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(pipeline_uuid: str, request_context: RequestContext) -> str:
+            pipeline = await self.ap.pipeline_service.get_pipeline(
+                request_context,
+                pipeline_uuid,
+                include_secret=has_permission(request_context, Permission.RESOURCE_MANAGE),
+            )
+            if pipeline is None:
+                return self.http_status(404, -1, 'pipeline not found')
+            return self.success(data={'pipeline': pipeline})

-                await self.ap.pipeline_service.update_pipeline(pipeline_uuid, json_data)
+        @self.route(
+            '/<pipeline_uuid>',
+            methods=['PUT', 'DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(pipeline_uuid: str, request_context: RequestContext) -> str:
+            if quart.request.method == 'PUT':
+                try:
+                    await self.ap.pipeline_service.update_pipeline(
+                        request_context,
+                        pipeline_uuid,
+                        await quart.request.json,
+                    )
+                except ValueError as exc:
+                    return self.http_status(400, -1, str(exc))
+            else:
+                await self.ap.pipeline_service.delete_pipeline(request_context, pipeline_uuid)
+            return self.success()

-                return self.success()
-            elif quart.request.method == 'DELETE':
-                await self.ap.pipeline_service.delete_pipeline(pipeline_uuid)
-
-                return self.success()
-
-        @self.route('/<pipeline_uuid>/copy', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(pipeline_uuid: str) -> str:
+        @self.route(
+            '/<pipeline_uuid>/copy',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(pipeline_uuid: str, request_context: RequestContext) -> str:
            try:
-                new_uuid = await self.ap.pipeline_service.copy_pipeline(pipeline_uuid)
+                new_uuid = await self.ap.pipeline_service.copy_pipeline(request_context, pipeline_uuid)
                return self.success(data={'uuid': new_uuid})
            except ValueError as e:
-                return self.http_status(404, -1, str(e))
+                return self.http_status(400, -1, str(e))

        @self.route(
-            '/<pipeline_uuid>/extensions', methods=['GET', 'PUT'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '/<pipeline_uuid>/extensions',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(pipeline_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                # Get current extensions and available plugins
-                pipeline = await self.ap.pipeline_service.get_pipeline(pipeline_uuid)
-                if pipeline is None:
-                    return self.http_status(404, -1, 'pipeline not found')
+        async def _(pipeline_uuid: str, request_context: RequestContext) -> str:
+            pipeline = await self.ap.pipeline_service.get_pipeline(request_context, pipeline_uuid)
+            if pipeline is None:
+                return self.http_status(404, -1, 'pipeline not found')

-                # Only include plugins with pipeline-related components (Command, EventListener, Tool)
-                # Plugins that only have KnowledgeEngine components are not suitable for pipeline extensions
-                pipeline_component_kinds = ['Command', 'EventListener', 'Tool']
-                plugins = await self.ap.plugin_connector.list_plugins(component_kinds=pipeline_component_kinds)
-                mcp_servers = await self.ap.mcp_service.get_mcp_servers(contain_runtime_info=True)
+            pipeline_component_kinds = ['Command', 'EventListener', 'Tool']
+            if self.ap.plugin_connector.is_enable_plugin:
+                await self.ap.plugin_connector.require_workspace_context(request_context)
+            plugins = await self.ap.plugin_connector.list_plugins(component_kinds=pipeline_component_kinds)
+            mcp_servers = await self.ap.mcp_service.get_mcp_servers(request_context, contain_runtime_info=True)
+            available_skills = await self.ap.skill_service.list_skills(request_context)
+            extensions_prefs = pipeline.get('extensions_preferences', {})
+            return self.success(
+                data={
+                    'enable_all_plugins': extensions_prefs.get('enable_all_plugins', True),
+                    'enable_all_mcp_servers': extensions_prefs.get('enable_all_mcp_servers', True),
+                    'enable_all_skills': extensions_prefs.get('enable_all_skills', True),
+                    'bound_plugins': extensions_prefs.get('plugins', []),
+                    'available_plugins': redact_secrets(plugins),
+                    'bound_mcp_servers': extensions_prefs.get('mcp_servers', []),
+                    'available_mcp_servers': mcp_servers,
+                    'bound_mcp_resources': extensions_prefs.get('mcp_resources', []),
+                    'mcp_resource_agent_read_enabled': extensions_prefs.get('mcp_resource_agent_read_enabled', True),
+                    'bound_skills': extensions_prefs.get('skills', []),
+                    'available_skills': available_skills,
+                }
+            )

-                # Get available skills
-                available_skills = await self.ap.skill_service.list_skills()
-
-                extensions_prefs = pipeline.get('extensions_preferences', {})
-                return self.success(
-                    data={
-                        'enable_all_plugins': extensions_prefs.get('enable_all_plugins', True),
-                        'enable_all_mcp_servers': extensions_prefs.get('enable_all_mcp_servers', True),
-                        'enable_all_skills': extensions_prefs.get('enable_all_skills', True),
-                        'bound_plugins': extensions_prefs.get('plugins', []),
-                        'available_plugins': plugins,
-                        'bound_mcp_servers': extensions_prefs.get('mcp_servers', []),
-                        'available_mcp_servers': mcp_servers,
-                        'bound_mcp_resources': extensions_prefs.get('mcp_resources', []),
-                        'mcp_resource_agent_read_enabled': extensions_prefs.get(
-                            'mcp_resource_agent_read_enabled', True
-                        ),
-                        'bound_skills': extensions_prefs.get('skills', []),
-                        'available_skills': available_skills,
-                    }
-                )
-            elif quart.request.method == 'PUT':
-                # Update bound plugins and MCP servers for this pipeline
-                json_data = await quart.request.json
-                enable_all_plugins = json_data.get('enable_all_plugins', True)
-                enable_all_mcp_servers = json_data.get('enable_all_mcp_servers', True)
-                enable_all_skills = json_data.get('enable_all_skills', True)
-                bound_plugins = json_data.get('bound_plugins', [])
-                bound_mcp_servers = json_data.get('bound_mcp_servers', [])
-                bound_skills = json_data.get('bound_skills', [])
-                bound_mcp_resources = json_data.get('bound_mcp_resources')
-                mcp_resource_agent_read_enabled = json_data.get('mcp_resource_agent_read_enabled')
-
-                await self.ap.pipeline_service.update_pipeline_extensions(
-                    pipeline_uuid,
-                    bound_plugins,
-                    bound_mcp_servers,
-                    enable_all_plugins,
-                    enable_all_mcp_servers,
-                    bound_skills=bound_skills,
-                    enable_all_skills=enable_all_skills,
-                    bound_mcp_resources=bound_mcp_resources,
-                    mcp_resource_agent_read_enabled=mcp_resource_agent_read_enabled,
-                )
-
-                return self.success()
+        @self.route(
+            '/<pipeline_uuid>/extensions',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(pipeline_uuid: str, request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            await self.ap.pipeline_service.update_pipeline_extensions(
+                request_context,
+                pipeline_uuid,
+                json_data.get('bound_plugins', []),
+                json_data.get('bound_mcp_servers', []),
+                json_data.get('enable_all_plugins', True),
+                json_data.get('enable_all_mcp_servers', True),
+                bound_skills=json_data.get('bound_skills', []),
+                enable_all_skills=json_data.get('enable_all_skills', True),
+                bound_mcp_resources=json_data.get('bound_mcp_resources'),
+                mcp_resource_agent_read_enabled=json_data.get('mcp_resource_agent_read_enabled'),
+            )
+            return self.success()
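The pipeline routes above now split each endpoint by permission. Reads require `Permission.RESOURCE_VIEW`, and mutations (create, update, delete, copy, extension binding) require `Permission.RESOURCE_MANAGE`. Read handlers call `has_permission(..., Permission.RESOURCE_MANAGE)` to decide whether secret fields are returned. Below is a minimal, self-contained sketch of that gating pattern. It is not LangBot's `authz` module: the enum string values, the `ROLE_PERMISSIONS` table, `SECRET_FIELDS`, and the `view_pipeline` helper are all hypothetical, and the `'***'` masking stands in for whatever `redact_secrets` actually does.

```python
from enum import Enum


class Permission(Enum):
    # Hypothetical values; only the member names come from the patch.
    RESOURCE_VIEW = 'resource:view'
    RESOURCE_MANAGE = 'resource:manage'


# Hypothetical role table standing in for permissions_for_role().
ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    'viewer': frozenset({Permission.RESOURCE_VIEW}),
    'admin': frozenset({Permission.RESOURCE_VIEW, Permission.RESOURCE_MANAGE}),
}

# Hypothetical keys treated as secrets in a pipeline config.
SECRET_FIELDS = frozenset({'api_key', 'token'})


def has_permission(role: str, permission: Permission) -> bool:
    """Return True if the role grants the permission (unknown roles grant nothing)."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


def view_pipeline(role: str, pipeline: dict) -> dict:
    """Serve a read route: require VIEW, and mask secrets unless the role also has MANAGE."""
    if not has_permission(role, Permission.RESOURCE_VIEW):
        raise PermissionError('missing RESOURCE_VIEW')
    if has_permission(role, Permission.RESOURCE_MANAGE):
        return dict(pipeline)
    return {key: ('***' if key in SECRET_FIELDS else value) for key, value in pipeline.items()}
```

For example, `view_pipeline('viewer', {'name': 'demo', 'api_key': 'sk-1'})` returns `{'name': 'demo', 'api_key': '***'}`, and the same call with `'admin'` returns the secret unmasked. The route is gated on the weaker permission and the secret decision is made inside the handler, so one GET endpoint can serve both roles.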
@@ -1,64 +1,234 @@
-"""WebSocket chat routes - supports bidirectional real-time communication"""
+"""Authenticated dashboard WebSocket chat routes."""
+
+from __future__ import annotations

 import asyncio
 import datetime
 import json
 import logging
+import typing
+import uuid

 import quart

+from ....authz import Permission, permissions_for_role, require_permission
+from ....context import PrincipalContext, PrincipalType, RequestContext, WorkspaceContext
 from ... import group
-from ......platform.sources.websocket_manager import ws_connection_manager
+from ......core.task_boundary import run_in_workspace_uow
+from ......platform.sources.websocket_manager import WebSocketScope, ws_connection_manager
+from ......utils import bounded_executor

 logger = logging.getLogger(__name__)
+_AUTH_TIMEOUT_SECONDS = 10.0
+_DUPLEX_DRAIN_TIMEOUT_SECONDS = 0.25
+
+
+def create_scoped_duplex_tasks(
+    receive_coro: typing.Coroutine[typing.Any, typing.Any, None],
+    send_coro: typing.Coroutine[typing.Any, typing.Any, None],
+    workspace_uuid: str,
+) -> tuple[asyncio.Task[None], asyncio.Task[None]]:
+    """Create both socket directions under one trusted Workspace budget."""
+
+    return (
+        asyncio.create_task(
+            bounded_executor.run_in_blocking_work_scope(
+                receive_coro,
+                workspace_uuid,
+            )
+        ),
+        asyncio.create_task(
+            bounded_executor.run_in_blocking_work_scope(
+                send_coro,
+                workspace_uuid,
+            )
+        ),
+    )
+
+
+async def wait_for_duplex_tasks(
+    receive_task: asyncio.Task,
+    send_task: asyncio.Task,
+) -> None:
+    """Stop the peer direction as soon as either socket task terminates."""
+
+    try:
+        done, _ = await asyncio.wait(
+            {receive_task, send_task},
+            return_when=asyncio.FIRST_COMPLETED,
+        )
+        # A receive task may enqueue a terminal authorization/error frame and
+        # then finish. Give the sender a short, bounded drain window instead
+        # of cancelling it before that frame reaches the client.
+        if receive_task in done and not send_task.done():
+            await asyncio.wait(
+                {send_task},
+                timeout=_DUPLEX_DRAIN_TIMEOUT_SECONDS,
+            )
+    finally:
+        for task in (receive_task, send_task):
+            if not task.done():
+                task.cancel()
+        await asyncio.gather(
+            receive_task,
+            send_task,
+            return_exceptions=True,
+        )


@group.group_class('websocket_chat', '/api/v1/pipelines/<pipeline_uuid>/ws')
 class WebSocketChatRouterGroup(group.RouterGroup):
+    async def _authenticate_websocket(self) -> tuple[RequestContext, str]:
+        """Authenticate the first dashboard WebSocket message.
+
+        Browsers cannot attach the normal Authorization/X-Workspace-Id headers
+        to a WebSocket handshake.  The client therefore sends one auth frame
+        immediately after opening the socket; no connection is registered and
+        no runtime object is resolved before this method succeeds.
+        """
+
+        raw_message = await asyncio.wait_for(quart.websocket.receive(), timeout=_AUTH_TIMEOUT_SECONDS)
+        payload = await asyncio.to_thread(json.loads, raw_message)
+        if not isinstance(payload, dict) or payload.get('type') != 'authenticate':
+            raise ValueError('Authentication is required')
+
+        token = str(payload.get('token') or '').strip()
+        workspace_uuid = str(payload.get('workspace_uuid') or '').strip()
+        if not token or not workspace_uuid:
+            raise ValueError('Authentication is required')
+
+        account, _ = await self._authenticate_account(token)
+        account_uuid = getattr(account, 'uuid', None)
+        collaboration_service = getattr(self.ap, 'workspace_collaboration_service', None)
+        if not isinstance(account_uuid, str) or collaboration_service is None:
+            raise ValueError('Workspace authentication is unavailable')
+
+        access = await collaboration_service.resolve_account_workspace(account_uuid, workspace_uuid)
+        request_context = RequestContext(
+            instance_uuid=access.execution.instance_uuid,
+            placement_generation=access.execution.placement_generation,
+            request_id=quart.websocket.headers.get('X-Request-Id') or str(uuid.uuid4()),
+            auth_type=group.AuthType.USER_TOKEN.value,
+            principal=PrincipalContext(
+                principal_type=PrincipalType.ACCOUNT,
+                account_uuid=account_uuid,
+            ),
+            workspace=WorkspaceContext(
+                workspace_uuid=access.workspace.uuid,
+                membership_uuid=access.membership.uuid,
+                role=access.membership.role,
+                permissions=permissions_for_role(access.membership.role),
+                membership_revision=access.membership.projection_revision,
+            ),
+        )
+        require_permission(request_context, Permission.RUNTIME_OPERATE)
+        return request_context, token
+
+    async def _revalidate_websocket_authorization(
+        self,
+        request_context: RequestContext,
+        token: str,
+    ) -> RequestContext:
+        """Recheck revocable account, membership, permission, and placement state."""
+
+        account, _ = await self._authenticate_account(token)
+        account_uuid = getattr(account, 'uuid', None)
+        if account_uuid != request_context.account_uuid:
+            raise ValueError('WebSocket account changed')
+
+        collaboration_service = getattr(self.ap, 'workspace_collaboration_service', None)
+        if collaboration_service is None or not isinstance(account_uuid, str):
+            raise ValueError('Workspace authentication is unavailable')
+        access = await collaboration_service.resolve_account_workspace(
+            account_uuid,
+            request_context.workspace_uuid,
+        )
+        if (
+            access.workspace.uuid != request_context.workspace_uuid
+            or access.membership.uuid != request_context.workspace.membership_uuid
+            or access.membership.projection_revision != request_context.workspace.membership_revision
+            or access.execution.instance_uuid != request_context.instance_uuid
+            or access.execution.placement_generation != request_context.placement_generation
+        ):
+            raise ValueError('WebSocket authorization changed')
+
+        current_context = RequestContext(
+            instance_uuid=access.execution.instance_uuid,
+            placement_generation=access.execution.placement_generation,
+            request_id=request_context.request_id,
+            auth_type=request_context.auth_type,
+            principal=request_context.principal,
+            workspace=WorkspaceContext(
+                workspace_uuid=access.workspace.uuid,
+                membership_uuid=access.membership.uuid,
+                role=access.membership.role,
+                permissions=permissions_for_role(access.membership.role),
+                membership_revision=access.membership.projection_revision,
+            ),
+            entitlement_revision=request_context.entitlement_revision,
+        )
+        require_permission(current_context, Permission.RUNTIME_OPERATE)
+        return current_context
+
+    async def _get_scoped_adapter(self, request_context: RequestContext, pipeline_uuid: str):
+        pipeline = await run_in_workspace_uow(
+            self.ap,
+            request_context.workspace_uuid,
+            lambda: self.ap.pipeline_service.get_pipeline(request_context, pipeline_uuid),
+        )
+        if pipeline is None:
+            return None
+        proxy_bot = await self.ap.platform_mgr.get_websocket_proxy_bot(request_context)
+        return proxy_bot.adapter
+
    async def initialize(self) -> None:
-        # Register the WebSocket route directly on quart_app
        @self.quart_app.websocket(self.path + '/connect')
        async def websocket_connect(pipeline_uuid: str):
-            """
-            Establish a WebSocket connection
+            """Open one authenticated dashboard debug connection."""

-            URL parameters:
-                - pipeline_uuid: pipeline UUID
-                - session_type: session type (person/group)
-            """
+            await quart.websocket.accept()
            try:
-                # Get parameters; in the WebSocket context use quart.websocket.args
-                session_type = quart.websocket.args.get('session_type', 'person')
+                request_context, token = await self._authenticate_websocket()
+            except Exception:
+                await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Unauthorized'}))
+                return

-                if session_type not in ['person', 'group']:
-                    await quart.websocket.send(
-                        json.dumps({'type': 'error', 'message': 'session_type must be person or group'})
-                    )
+            session_type = quart.websocket.args.get('session_type', 'person')
+            if session_type not in ['person', 'group']:
+                await quart.websocket.send(
+                    json.dumps({'type': 'error', 'message': 'session_type must be person or group'})
+                )
+                return
+
+            try:
+                websocket_adapter = await self._get_scoped_adapter(request_context, pipeline_uuid)
+                if websocket_adapter is None:
+                    await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Pipeline not found'}))
                    return

-                # Get the WebSocket adapter
-                websocket_adapter = self.ap.platform_mgr.websocket_proxy_bot.adapter
-
-                if not websocket_adapter:
-                    await quart.websocket.send(json.dumps({'type': 'error', 'message': 'WebSocket adapter not found'}))
-                    return
-
-                # Dashboard pipeline-debug sessions must always run under the
-                # built-in websocket_proxy_bot identity. We deliberately do NOT
-                # resolve a web_page_bot owner here — even if one is bound to
-                # the same pipeline, debug requests must not be attributed to
-                # it. The embed widget path (`/api/v1/embed/<bot>/ws/connect`)
-                # is the one that carries the page-bot identity.
-
-                # Register the connection
                connection = await ws_connection_manager.add_connection(
                    websocket=quart.websocket._get_current_object(),
+                    scope=WebSocketScope.from_context(request_context),
                    pipeline_uuid=pipeline_uuid,
                    session_type=session_type,
                    metadata={'user_agent': quart.websocket.headers.get('User-Agent', '')},
+                    send_queue_size=(
+                        self.ap.instance_config.data.get('system', {})
+                        .get('websocket_retention', {})
+                        .get('send_queue_size', 100)
+                    ),
+                    max_connections=(
+                        self.ap.instance_config.data.get('system', {})
+                        .get('websocket_retention', {})
+                        .get('max_connections', 1024)
+                    ),
+                    max_connections_per_workspace=(
+                        self.ap.instance_config.data.get('system', {})
+                        .get('websocket_retention', {})
+                        .get('max_connections_per_workspace', 32)
+                    ),
                )

-                # Send the connection-established message
                await quart.websocket.send(
                    json.dumps(
                        {
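`wait_for_duplex_tasks` in the hunk above waits for whichever socket direction finishes first. If the receiver finished, it gives the sender a bounded window to flush any terminal frame the receiver queued, then cancels and reaps both tasks. The self-contained sketch below reproduces that pattern, with an `asyncio.Queue` standing in for the connection's send queue. `receiver`, `sender`, the simulated 10 ms write delay, and the `None` close sentinel are illustrative assumptions, not part of the patch.

```python
import asyncio

DRAIN_TIMEOUT = 0.25  # mirrors _DUPLEX_DRAIN_TIMEOUT_SECONDS in the patch


async def wait_for_duplex(receive_task: asyncio.Task, send_task: asyncio.Task) -> None:
    try:
        done, _ = await asyncio.wait({receive_task, send_task}, return_when=asyncio.FIRST_COMPLETED)
        # Let the sender flush frames the receiver queued just before it exited.
        if receive_task in done and not send_task.done():
            await asyncio.wait({send_task}, timeout=DRAIN_TIMEOUT)
    finally:
        for task in (receive_task, send_task):
            if not task.done():
                task.cancel()
        # Reap both tasks so cancellations and errors are never left unobserved.
        await asyncio.gather(receive_task, send_task, return_exceptions=True)


async def receiver(queue: asyncio.Queue) -> None:
    # Simulate an authorization failure: queue a terminal frame, then stop.
    await queue.put({'type': 'error', 'message': 'Unauthorized'})
    await queue.put(None)  # hypothetical close sentinel


async def sender(queue: asyncio.Queue, delivered: list) -> None:
    while True:
        frame = await queue.get()
        if frame is None:
            return
        await asyncio.sleep(0.01)  # simulated network write latency
        delivered.append(frame)


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    delivered: list = []
    receive_task = asyncio.create_task(receiver(queue))
    send_task = asyncio.create_task(sender(queue, delivered))
    await wait_for_duplex(receive_task, send_task)
    return delivered


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The receiver exits almost immediately while the sender is still mid-write. Without the drain window, the `finally` block would cancel the sender during its simulated write and the `Unauthorized` frame would never be delivered. With the window, the frame is delivered and the sender exits on the sentinel well before the timeout.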
@@ -72,182 +242,188 @@ class WebSocketChatRouterGroup(group.RouterGroup):
                )

                logger.debug(
-                    f'WebSocket connection established: {connection.connection_id} '
-                    f'(pipeline={pipeline_uuid}, session_type={session_type})'
+                    f'Dashboard WebSocket connected: {connection.connection_id} '
+                    f'(workspace={connection.workspace_uuid}, pipeline={pipeline_uuid}, '
+                    f'session_type={session_type})'
                )

-                # Create the receive and send tasks
-                receive_task = asyncio.create_task(self._handle_receive(connection, websocket_adapter))
-                send_task = asyncio.create_task(self._handle_send(connection))
-
-                # Wait for the tasks to finish
+                receive_task, send_task = create_scoped_duplex_tasks(
+                    self._handle_receive(
+                        connection,
+                        websocket_adapter,
+                        request_context,
+                        token,
+                    ),
+                    self._handle_send(connection),
+                    request_context.workspace_uuid,
+                )
                try:
-                    await asyncio.gather(receive_task, send_task)
-                except Exception as e:
-                    logger.error(f'WebSocket task execution error: {e}')
+                    await wait_for_duplex_tasks(receive_task, send_task)
+                except Exception as exc:
+                    logger.error(f'WebSocket task execution error: {exc}')
                finally:
-                    # Clean up the connection
                    await ws_connection_manager.remove_connection(connection.connection_id)
-                    logger.debug(f'WebSocket connection cleaned: {connection.connection_id}')

-            except Exception as e:
-                logger.error(f'WebSocket connection error: {e}', exc_info=True)
+            except Exception:
+                logger.error('Dashboard WebSocket connection error', exc_info=True)
                try:
-                    await quart.websocket.send(json.dumps({'type': 'error', 'message': str(e)}))
-                except:
+                    await quart.websocket.send(json.dumps({'type': 'error', 'message': 'Internal server error'}))
+                except Exception:
                    pass

-        @self.route('/messages/<session_type>', methods=['GET'])
-        async def get_messages(pipeline_uuid: str, session_type: str) -> str:
-            """Get message history"""
-            try:
-                if session_type not in ['person', 'group']:
-                    return self.http_status(400, -1, 'session_type must be person or group')
+        @self.route(
+            '/messages/<session_type>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RUNTIME_OPERATE,
+        )
+        async def get_messages(
+            pipeline_uuid: str,
+            session_type: str,
+            request_context: RequestContext,
+        ) -> str:
+            if session_type not in ['person', 'group']:
+                return self.http_status(400, -1, 'session_type must be person or group')

-                websocket_adapter = self.ap.platform_mgr.websocket_proxy_bot.adapter
+            websocket_adapter = await self._get_scoped_adapter(request_context, pipeline_uuid)
+            if websocket_adapter is None:
+                return self.http_status(404, -1, 'Pipeline not found')
+            messages = websocket_adapter.get_websocket_messages(pipeline_uuid, session_type)
+            return self.success(data={'messages': messages})

-                if not websocket_adapter:
-                    return self.http_status(404, -1, 'WebSocket adapter not found')
+        @self.route(
+            '/reset/<session_type>',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RUNTIME_OPERATE,
+        )
+        async def reset_session(
+            pipeline_uuid: str,
+            session_type: str,
+            request_context: RequestContext,
+        ) -> str:
+            if session_type not in ['person', 'group']:
+                return self.http_status(400, -1, 'session_type must be person or group')

-                messages = websocket_adapter.get_websocket_messages(pipeline_uuid, session_type)
+            websocket_adapter = await self._get_scoped_adapter(request_context, pipeline_uuid)
+            if websocket_adapter is None:
+                return self.http_status(404, -1, 'Pipeline not found')
+            websocket_adapter.reset_session(pipeline_uuid, session_type)
+            return self.success(data={'message': 'Session reset successfully'})

-                return self.success(data={'messages': messages})
+        @self.route(
+            '/connections',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RUNTIME_OPERATE,
+        )
+        async def get_connections(pipeline_uuid: str, request_context: RequestContext) -> str:
+            if await self.ap.pipeline_service.get_pipeline(request_context, pipeline_uuid) is None:
+                return self.http_status(404, -1, 'Pipeline not found')

-            except Exception as e:
-                return self.http_status(500, -1, f'Internal server error: {str(e)}')
-
-        @self.route('/reset/<session_type>', methods=['POST'])
-        async def reset_session(pipeline_uuid: str, session_type: str) -> str:
-            """Reset the session"""
-            try:
-                if session_type not in ['person', 'group']:
-                    return self.http_status(400, -1, 'session_type must be person or group')
-
-                websocket_adapter = self.ap.platform_mgr.websocket_proxy_bot.adapter
-
-                if not websocket_adapter:
-                    return self.http_status(404, -1, 'WebSocket adapter not found')
-
-                websocket_adapter.reset_session(pipeline_uuid, session_type)
-
-                return self.success(data={'message': 'Session reset successfully'})
-
-            except Exception as e:
-                return self.http_status(500, -1, f'Internal server error: {str(e)}')
-
-        @self.route('/connections', methods=['GET'])
-        async def get_connections(pipeline_uuid: str) -> str:
-            """Get current connection statistics"""
-            try:
-                stats = ws_connection_manager.get_stats()
-                connections = await ws_connection_manager.get_connections_by_pipeline(pipeline_uuid)
-
-                return self.success(
-                    data={
-                        'stats': stats,
-                        'connections': [
-                            {
-                                'connection_id': conn.connection_id,
-                                'session_type': conn.session_type,
-                                'created_at': conn.created_at.isoformat(),
-                                'last_active': conn.last_active.isoformat(),
-                                'is_active': conn.is_active,
-                            }
-                            for conn in connections
-                        ],
-                    }
-                )
-
-            except Exception as e:
-                return self.http_status(500, -1, f'Internal server error: {str(e)}')
-
-        @self.route('/broadcast', methods=['POST'])
-        async def broadcast_message(pipeline_uuid: str) -> str:
-            """Broadcast a message to all connections (server-initiated push)"""
-            try:
-                data = await quart.request.get_json()
-                message = data.get('message')
-
-                if not message:
-                    return self.http_status(400, -1, 'message is required')
-
-                # Broadcast the message
-                broadcast_data = {
-                    'type': 'broadcast',
-                    'message': message,
-                    'timestamp': datetime.datetime.now().isoformat(),
+            scope = WebSocketScope.from_context(request_context)
+            stats = ws_connection_manager.get_stats(scope=scope)
+            connections = await ws_connection_manager.get_connections_by_pipeline(
+                pipeline_uuid,
+                scope=scope,
+            )
+            return self.success(
+                data={
+                    'stats': stats,
+                    'connections': [
+                        {
+                            'connection_id': connection.connection_id,
+                            'session_type': connection.session_type,
+                            'created_at': connection.created_at.isoformat(),
+                            'last_active': connection.last_active.isoformat(),
+                            'is_active': connection.is_active,
+                        }
+                        for connection in connections
+                    ],
                }
+            )

-                await ws_connection_manager.broadcast_to_pipeline(pipeline_uuid, broadcast_data)
+        @self.route(
+            '/broadcast',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RUNTIME_OPERATE,
+        )
+        async def broadcast_message(pipeline_uuid: str, request_context: RequestContext) -> str:
+            if await self.ap.pipeline_service.get_pipeline(request_context, pipeline_uuid) is None:
+                return self.http_status(404, -1, 'Pipeline not found')

-                return self.success(data={'message': 'Broadcast sent successfully'})
+            data = await quart.request.get_json(silent=True)
+            message = data.get('message') if isinstance(data, dict) else None
+            if not message:
+                return self.http_status(400, -1, 'message is required')

-            except Exception as e:
-                return self.http_status(500, -1, f'Internal server error: {str(e)}')
+            broadcast_data = {
+                'type': 'broadcast',
+                'message': message,
+                'timestamp': datetime.datetime.now().isoformat(),
+            }
+            await ws_connection_manager.broadcast_to_pipeline(
+                pipeline_uuid,
+                broadcast_data,
+                scope=WebSocketScope.from_context(request_context),
+            )
+            return self.success(data={'message': 'Broadcast sent successfully'})

-    async def _handle_receive(self, connection, websocket_adapter):
-        """Task that receives incoming messages."""
+    async def _handle_receive(
+        self,
+        connection,
+        websocket_adapter,
+        request_context: RequestContext,
+        token: str,
+    ):
        try:
            while connection.is_active:
-                # Receive a message
                message = await quart.websocket.receive()
-
-                # Update the last-active timestamp
                await ws_connection_manager.update_activity(connection.connection_id)

                try:
-                    data = json.loads(message)
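+                    data = await asyncio.to_thread(json.loads, message)
+                    if not isinstance(data, dict):
+                        await connection.send_queue.put({'type': 'error', 'message': 'Expected a JSON object'})
+                        continue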
                    message_type = data.get('type', 'message')
-
                    if message_type == 'ping':
-                        # Heartbeat response
                        await connection.send_queue.put(
                            {'type': 'pong', 'timestamp': datetime.datetime.now().isoformat()}
                        )
-
                    elif message_type == 'message':
-                        # Handle the user message
-                        logger.debug(f'Received message: {data} from {connection.connection_id}')
-
-                        # Handle the message (do not wait for a response; responses are sent asynchronously via broadcast)
-                        # owner_bot is intentionally NOT passed: the dashboard
-                        # debug WebSocket must always run under the proxy bot,
-                        # never under a coincidentally-bound web_page_bot.
+                        try:
+                            await self._revalidate_websocket_authorization(request_context, token)
+                        except Exception:
+                            await connection.send_queue.put({'type': 'error', 'message': 'Unauthorized'})
+                            break
                        await websocket_adapter.handle_websocket_message(connection, data)
-
                    elif message_type == 'disconnect':
-                        # Client initiated the disconnect
-                        logger.debug(f'Client disconnected: {connection.connection_id}')
                        break
-
                    else:
-                        logger.warning(f'Unknown message type: {message_type}')
-
+                        logger.warning(f'Unknown WebSocket message type: {message_type}')
                except json.JSONDecodeError:
-                    logger.error(f'Invalid JSON message: {message}')
                    await connection.send_queue.put({'type': 'error', 'message': 'Invalid JSON format'})

-        except Exception as e:
-            logger.error(f'Receive message error: {e}', exc_info=True)
+        except Exception:
+            logger.error('Dashboard WebSocket receive error', exc_info=True)
        finally:
            connection.is_active = False
+            try:
+                connection.send_queue.put_nowait(None)
+            except asyncio.QueueFull:
+                pass

    async def _handle_send(self, connection):
-        """Task that sends outgoing messages."""
        try:
-            while connection.is_active:
-                # Get the next message from the queue
+            while connection.is_active or not connection.send_queue.empty():
                try:
                    message = await asyncio.wait_for(connection.send_queue.get(), timeout=1.0)
-
-                    # Send the message
-                    await quart.websocket.send(json.dumps(message))
-
+                    if message is None:
+                        break
+                    encoded = await asyncio.to_thread(json.dumps, message)
+                    await quart.websocket.send(encoded)
                except asyncio.TimeoutError:
-                    # On timeout, keep looping
                    continue
-
-        except Exception as e:
-            logger.error(f'Send message error: {e}', exc_info=True)
+        except Exception:
+            logger.error('Dashboard WebSocket send error', exc_info=True)
        finally:
            connection.is_active = False
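The reworked send/receive pair shuts down cooperatively: when the receive loop exits it marks the connection inactive and pushes a `None` sentinel, and the sender keeps draining the queue until it reaches that sentinel, so frames queued just before shutdown (such as an `Unauthorized` error) are still flushed. The sketch below reproduces that handshake in isolation. It assumes a hypothetical `Connection` class and replaces the WebSocket with a list of incoming frames and a list of sent strings; it is an illustration, not the LangBot implementation.

```python
import asyncio
import json


class Connection:
    """Hypothetical stand-in: only the activity flag and outbound queue are modelled."""

    def __init__(self) -> None:
        self.is_active = True
        self.send_queue: asyncio.Queue = asyncio.Queue()


async def receive_loop(conn: Connection, incoming: list[str]) -> None:
    """Consume client frames, then wake the sender with a None sentinel."""
    try:
        for frame in incoming:
            data = json.loads(frame)
            if data.get('type') == 'ping':
                await conn.send_queue.put({'type': 'pong'})
            elif data.get('type') == 'disconnect':
                break  # frames after a disconnect are ignored
    finally:
        conn.is_active = False
        try:
            conn.send_queue.put_nowait(None)  # sentinel: stop without waiting for the timeout
        except asyncio.QueueFull:
            pass  # a bounded, full queue still drains; the sender exits once it is empty


async def send_loop(conn: Connection, sent: list[str]) -> None:
    """Drain queued frames until the sentinel, even after the connection is inactive."""
    while conn.is_active or not conn.send_queue.empty():
        try:
            message = await asyncio.wait_for(conn.send_queue.get(), timeout=1.0)
        except asyncio.TimeoutError:
            continue
        if message is None:
            break
        sent.append(json.dumps(message))  # stands in for websocket.send(...)
```

Without the sentinel, a sender blocked in `wait_for` would only notice the inactive flag after its one-second timeout; with it, shutdown is immediate and ordered after any pending frames.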
@@ -1,8 +1,133 @@
-import quart
-import mimetypes
 import asyncio
+import dataclasses
+import mimetypes
+
+import quart
+
+from langbot.pkg.api.http.authz import Permission
+from langbot.pkg.api.http.context import RequestContext
+from langbot.pkg.core.errors import TaskCapacityError
+from langbot.pkg.utils import httpclient, importutil
+
 from ... import group
-from langbot.pkg.utils import importutil
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class _AdapterSessionScope:
+    """Immutable tenant and principal binding for a credential exchange."""
+
+    instance_uuid: str
+    workspace_uuid: str
+    placement_generation: int
+    principal_type: str
+    account_uuid: str | None
+    api_key_uuid: str | None
+
+    @classmethod
+    def from_request_context(cls, request_context: RequestContext) -> '_AdapterSessionScope':
+        principal = request_context.principal
+        return cls(
+            instance_uuid=request_context.instance_uuid,
+            workspace_uuid=request_context.workspace_uuid,
+            placement_generation=request_context.placement_generation,
+            principal_type=principal.principal_type.value,
+            account_uuid=principal.account_uuid,
+            api_key_uuid=principal.api_key_uuid,
+        )
+
+    def matches(self, request_context: RequestContext) -> bool:
+        """Return whether a request is from the exact initiating tenant principal."""
+
+        return self == self.from_request_context(request_context)
+
+
+def _bind_session_scope(session: dict, request_context: RequestContext) -> None:
+    session['scope'] = _AdapterSessionScope.from_request_context(request_context)
+
+
+def _get_owned_session(
+    sessions: dict[str, dict],
+    session_id: str,
+    request_context: RequestContext,
+) -> dict | None:
+    """Resolve a session without revealing sessions owned by another scope."""
+
+    session = sessions.get(session_id)
+    scope = session.get('scope') if session is not None else None
+    if not isinstance(scope, _AdapterSessionScope) or not scope.matches(request_context):
+        return None
+    return session
+
+
+def _pop_owned_session(
+    sessions: dict[str, dict],
+    session_id: str,
+    request_context: RequestContext,
+) -> dict | None:
+    """Remove an owned session without allowing cross-scope cancellation."""
+
+    session = _get_owned_session(sessions, session_id, request_context)
+    if session is None:
+        return None
+    return sessions.pop(session_id, None)
+
+
+_MAX_ADAPTER_SESSIONS = 100
+_MAX_ADAPTER_SESSIONS_PER_WORKSPACE = 10
+
+
+def _start_adapter_session_task(
+    ap,
+    coro,
+    *,
+    adapter: str,
+    session_id: str,
+    request_context: RequestContext,
+) -> asyncio.Task | None:
+    """Attach one credential exchange to tenant admission and app shutdown."""
+
+    try:
+        wrapper = ap.task_mgr.create_user_task(
+            coro,
+            kind='platform-adapter-credential-exchange',
+            name=f'{adapter}-credential-{session_id}',
+            label=f'{adapter} credential exchange',
+            instance_uuid=request_context.instance_uuid,
+            workspace_uuid=request_context.workspace_uuid,
+            placement_generation=request_context.placement_generation,
+        )
+    except TaskCapacityError:
+        coro.close()
+        return None
+    return wrapper.task
+
+
+def _make_room_for_session(
+    sessions: dict[str, dict],
+    request_context: RequestContext,
+) -> None:
+    """Bound credential-exchange sessions globally and per workspace."""
+
+    workspace_uuid = request_context.workspace_uuid
+    owned = [
+        (session_id, session)
+        for session_id, session in sessions.items()
+        if getattr(session.get('scope'), 'workspace_uuid', None) == workspace_uuid
+    ]
+    evict_workspace_session = len(owned) >= _MAX_ADAPTER_SESSIONS_PER_WORKSPACE
+    evict_global_session = len(sessions) >= _MAX_ADAPTER_SESSIONS
+    if not evict_workspace_session and not evict_global_session:
+        return
+
+    candidates = owned if evict_workspace_session else list(sessions.items())
+    session_id, _ = min(
+        candidates,
+        key=lambda item: float(item[1].get('created_at', 0.0)),
+    )
+    session = sessions.pop(session_id, None)
+    task = session.get('task') if session is not None else None
+    if task is not None and not task.done():
+        task.cancel()


 def _decrypt_qqofficial_secret(encrypted_b64: str, key: bytes) -> str:
@@ -84,8 +209,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                if session and session.get('task') and not session['task'].done():
                    session['task'].cancel()

-        @self.route('/lark/create-app', methods=['POST'])
-        async def _() -> str:
+        @self.route('/lark/create-app', methods=['POST'], permission=Permission.RESOURCE_MANAGE)
+        async def _(request_context: RequestContext) -> str:
            """Start Feishu one-click app registration. Returns session_id + QR code URL."""
            import uuid
            import time
@@ -106,6 +231,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                'error': None,
                'created_at': time.time(),
            }
+            _bind_session_scope(session, request_context)
+            _make_room_for_session(_create_app_sessions, request_context)
            _create_app_sessions[session_id] = session

            def on_qr_code(info):
@@ -137,7 +264,16 @@ class AdaptersRouterGroup(group.RouterGroup):
                    session['status'] = 'error'
                    session['error'] = str(e)

-            task = asyncio.create_task(run_registration())
+            task = _start_adapter_session_task(
+                self.ap,
+                run_registration(),
+                adapter='lark',
+                session_id=session_id,
+                request_context=request_context,
+            )
+            if task is None:
+                _create_app_sessions.pop(session_id, None)
+                return self.http_status(429, -1, 'Too many active credential exchanges')
            session['task'] = task

            # Wait for QR code to be ready (max 10 seconds)
@@ -160,10 +296,15 @@ class AdaptersRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/lark/create-app/status/<session_id>', methods=['GET'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/lark/create-app/status/<session_id>',
+            methods=['GET'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Poll registration status."""
-            session = _create_app_sessions.get(session_id)
+            _cleanup_expired_sessions()
+            session = _get_owned_session(_create_app_sessions, session_id, request_context)
            if not session:
                return self.http_status(404, -1, 'Session not found')

@@ -179,10 +320,16 @@ class AdaptersRouterGroup(group.RouterGroup):

            return self.success(data=data)

-        @self.route('/lark/create-app/<session_id>', methods=['DELETE'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/lark/create-app/<session_id>',
+            methods=['DELETE'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Cancel and clean up a registration session."""
-            session = _create_app_sessions.pop(session_id, None)
+            session = _pop_owned_session(_create_app_sessions, session_id, request_context)
+            if session is None:
+                return self.http_status(404, -1, 'Session not found')
            if session and session.get('task') and not session['task'].done():
                session['task'].cancel()
            return self.success(data={})
@@ -206,8 +353,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                if session and session.get('task') and not session['task'].done():
                    session['task'].cancel()

-        @self.route('/weixin/login', methods=['POST'])
-        async def _() -> str:
+        @self.route('/weixin/login', methods=['POST'], permission=Permission.RESOURCE_MANAGE)
+        async def _(request_context: RequestContext) -> str:
            """Start WeChat QR code login. Returns session_id + QR code data URL."""
            import uuid
            import time
@@ -229,6 +376,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                'error': None,
                'created_at': time.time(),
            }
+            _bind_session_scope(session, request_context)
+            _make_room_for_session(_weixin_login_sessions, request_context)
            _weixin_login_sessions[session_id] = session

            client = OpenClawWeixinClient(
@@ -267,7 +416,16 @@ class AdaptersRouterGroup(group.RouterGroup):
                finally:
                    await client.close()

-            task = asyncio.create_task(run_login())
+            task = _start_adapter_session_task(
+                self.ap,
+                run_login(),
+                adapter='weixin',
+                session_id=session_id,
+                request_context=request_context,
+            )
+            if task is None:
+                _weixin_login_sessions.pop(session_id, None)
+                return self.http_status(429, -1, 'Too many active credential exchanges')
            session['task'] = task

            # Wait for QR code to be ready (max 10 seconds)
@@ -290,10 +448,15 @@ class AdaptersRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/weixin/login/status/<session_id>', methods=['GET'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/weixin/login/status/<session_id>',
+            methods=['GET'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Poll WeChat login status."""
-            session = _weixin_login_sessions.get(session_id)
+            _cleanup_expired_weixin_sessions()
+            session = _get_owned_session(_weixin_login_sessions, session_id, request_context)
            if not session:
                return self.http_status(404, -1, 'Session not found')

@@ -317,10 +480,16 @@ class AdaptersRouterGroup(group.RouterGroup):

            return self.success(data=data)

-        @self.route('/weixin/login/<session_id>', methods=['DELETE'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/weixin/login/<session_id>',
+            methods=['DELETE'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Cancel and clean up a WeChat login session."""
-            session = _weixin_login_sessions.pop(session_id, None)
+            session = _pop_owned_session(_weixin_login_sessions, session_id, request_context)
+            if session is None:
+                return self.http_status(404, -1, 'Session not found')
            if session and session.get('task') and not session['task'].done():
                session['task'].cancel()
            return self.success(data={})
@@ -344,8 +513,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                if session and session.get('task') and not session['task'].done():
                    session['task'].cancel()

-        @self.route('/dingtalk/create-app', methods=['POST'])
-        async def _() -> str:
+        @self.route('/dingtalk/create-app', methods=['POST'], permission=Permission.RESOURCE_MANAGE)
+        async def _(request_context: RequestContext) -> str:
            """Start DingTalk one-click app creation via Device Flow. Returns session_id + QR code URL."""
            import uuid
            import time
@@ -368,6 +537,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                'device_code': None,
                'interval': 5,
            }
+            _bind_session_scope(session, request_context)
+            _make_room_for_session(_dingtalk_sessions, request_context)
            _dingtalk_sessions[session_id] = session

            async def run_device_flow():
@@ -380,7 +551,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                            json={'source': 'langbot'},
                        ) as resp:
                            try:
-                                data = await resp.json()
+                                data = await httpclient.read_json_limited(resp)
                            except (aiohttp.ContentTypeError, ValueError):
                                session['status'] = 'error'
                                session['error'] = 'Invalid response from DingTalk service'
@@ -397,7 +568,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                            json={'nonce': nonce},
                        ) as resp:
                            try:
-                                data = await resp.json()
+                                data = await httpclient.read_json_limited(resp)
                            except (aiohttp.ContentTypeError, ValueError):
                                session['status'] = 'error'
                                session['error'] = 'Invalid response from DingTalk service'
@@ -428,7 +599,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                                json={'device_code': device_code},
                            ) as poll_resp:
                                try:
-                                    poll_data = await poll_resp.json()
+                                    poll_data = await httpclient.read_json_limited(poll_resp)
                                except (aiohttp.ContentTypeError, ValueError):
                                    continue
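The adapter flows replace `resp.json()` with `httpclient.read_json_limited(resp)`, whose body is not shown in this diff. Assuming its purpose is to cap how many bytes an upstream service can make the server buffer, a sketch over a generic async byte iterator might look like the following. The signature, the 1 MiB default, and the choice to raise `ValueError` (so the existing `except (aiohttp.ContentTypeError, ValueError)` clauses still apply) are all assumptions. With aiohttp, the iterator could be `resp.content.iter_chunked(65536)`.

```python
import asyncio
import json
from typing import AsyncIterator

DEFAULT_LIMIT = 1024 * 1024  # illustrative 1 MiB cap


async def read_json_limited(chunks: AsyncIterator[bytes], limit: int = DEFAULT_LIMIT):
    """Accumulate at most `limit` bytes, then decode them as JSON.

    Oversized bodies raise ValueError before the whole payload is buffered;
    malformed JSON raises json.JSONDecodeError, which subclasses ValueError.
    """
    buffer = bytearray()
    async for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > limit:
            raise ValueError(f'response body exceeds {limit} bytes')
    return json.loads(bytes(buffer))  # json.loads accepts UTF-8/16/32 bytes
```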

@@ -464,7 +635,16 @@ class AdaptersRouterGroup(group.RouterGroup):
                    session['status'] = 'error'
                    session['error'] = str(e)

-            task = asyncio.create_task(run_device_flow())
+            task = _start_adapter_session_task(
+                self.ap,
+                run_device_flow(),
+                adapter='dingtalk',
+                session_id=session_id,
+                request_context=request_context,
+            )
+            if task is None:
+                _dingtalk_sessions.pop(session_id, None)
+                return self.http_status(429, -1, 'Too many active credential exchanges')
            session['task'] = task

            # Wait for QR code to be ready (max 10 seconds)
@@ -491,11 +671,15 @@ class AdaptersRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/dingtalk/create-app/status/<session_id>', methods=['GET'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/dingtalk/create-app/status/<session_id>',
+            methods=['GET'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Poll DingTalk Device Flow status."""
            _cleanup_expired_dingtalk_sessions()
-            session = _dingtalk_sessions.get(session_id)
+            session = _get_owned_session(_dingtalk_sessions, session_id, request_context)
            if not session:
                return self.http_status(404, -1, 'Session not found')

@@ -511,10 +695,16 @@ class AdaptersRouterGroup(group.RouterGroup):

            return self.success(data=data)

-        @self.route('/dingtalk/create-app/<session_id>', methods=['DELETE'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/dingtalk/create-app/<session_id>',
+            methods=['DELETE'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Cancel and clean up a DingTalk Device Flow session."""
-            session = _dingtalk_sessions.pop(session_id, None)
+            session = _pop_owned_session(_dingtalk_sessions, session_id, request_context)
+            if session is None:
+                return self.http_status(404, -1, 'Session not found')
            if session and session.get('task') and not session['task'].done():
                session['task'].cancel()
            return self.success(data={})
@@ -538,8 +728,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                if session and session.get('task') and not session['task'].done():
                    session['task'].cancel()

-        @self.route('/wecombot/create-bot', methods=['POST'])
-        async def _() -> str:
+        @self.route('/wecombot/create-bot', methods=['POST'], permission=Permission.RESOURCE_MANAGE)
+        async def _(request_context: RequestContext) -> str:
            """Start WeComBot one-click creation via QR code. Returns session_id + QR code URL."""
            import uuid
            import time
@@ -563,6 +753,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                'scode': None,
                'task': None,
            }
+            _bind_session_scope(session, request_context)
+            _make_room_for_session(_wecombot_sessions, request_context)
            _wecombot_sessions[session_id] = session

            async def run_qr_flow():
@@ -574,7 +766,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                            f'{WECOM_QC_GENERATE_URL}?source=langbot&plat=0',
                        ) as resp:
                            try:
-                                data = await resp.json()
+                                data = await httpclient.read_json_limited(resp)
                            except (aiohttp.ContentTypeError, ValueError):
                                session['status'] = 'error'
                                session['error'] = 'Invalid response from WeCom service'
@@ -601,7 +793,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                                f'{WECOM_QC_QUERY_URL}?scode={scode}',
                            ) as poll_resp:
                                try:
-                                    poll_data = await poll_resp.json()
+                                    poll_data = await httpclient.read_json_limited(poll_resp)
                                except (aiohttp.ContentTypeError, ValueError):
                                    continue

@@ -628,7 +820,16 @@ class AdaptersRouterGroup(group.RouterGroup):
                    session['status'] = 'error'
                    session['error'] = str(e)

-            task = asyncio.create_task(run_qr_flow())
+            task = _start_adapter_session_task(
+                self.ap,
+                run_qr_flow(),
+                adapter='wecombot',
+                session_id=session_id,
+                request_context=request_context,
+            )
+            if task is None:
+                _wecombot_sessions.pop(session_id, None)
+                return self.http_status(429, -1, 'Too many active credential exchanges')
            session['task'] = task

            # Wait for QR code to be ready (max 10 seconds)
@@ -655,11 +856,15 @@ class AdaptersRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/wecombot/create-bot/status/<session_id>', methods=['GET'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/wecombot/create-bot/status/<session_id>',
+            methods=['GET'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Poll WeComBot creation status."""
            _cleanup_expired_wecombot_sessions()
-            session = _wecombot_sessions.get(session_id)
+            session = _get_owned_session(_wecombot_sessions, session_id, request_context)
            if not session:
                return self.http_status(404, -1, 'Session not found')

@@ -675,10 +880,16 @@ class AdaptersRouterGroup(group.RouterGroup):

            return self.success(data=data)

-        @self.route('/wecombot/create-bot/<session_id>', methods=['DELETE'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/wecombot/create-bot/<session_id>',
+            methods=['DELETE'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Cancel and clean up a WeComBot creation session."""
-            session = _wecombot_sessions.pop(session_id, None)
+            session = _pop_owned_session(_wecombot_sessions, session_id, request_context)
+            if session is None:
+                return self.http_status(404, -1, 'Session not found')
            if session and session.get('task') and not session['task'].done():
                session['task'].cancel()
            return self.success(data={})
@@ -702,8 +913,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                if session and session.get('task') and not session['task'].done():
                    session['task'].cancel()

-        @self.route('/qqofficial/bind', methods=['POST'])
-        async def _() -> str:
+        @self.route('/qqofficial/bind', methods=['POST'], permission=Permission.RESOURCE_MANAGE)
+        async def _(request_context: RequestContext) -> str:
            """Start QQ Official QR binding. Returns session_id + QR URL.

            Flow: generate a local AES-256 key, register it with
@@ -739,6 +950,8 @@ class AdaptersRouterGroup(group.RouterGroup):
                'bind_key_bytes': bind_key_bytes,
                'interval': 2,
            }
+            _bind_session_scope(session, request_context)
+            _make_room_for_session(_qqofficial_sessions, request_context)
            _qqofficial_sessions[session_id] = session

            async def run_qr_binding():
@@ -752,7 +965,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                            headers={'Accept': 'application/json'},
                        ) as resp:
                            try:
-                                data = await resp.json(content_type=None)
+                                data = await httpclient.read_json_limited(resp)
                            except (aiohttp.ContentTypeError, ValueError):
                                session['status'] = 'error'
                                session['error'] = 'Invalid response from QQ bind service'
@@ -790,7 +1003,7 @@ class AdaptersRouterGroup(group.RouterGroup):
                                headers={'Accept': 'application/json'},
                            ) as poll_resp:
                                try:
-                                    poll_data = await poll_resp.json(content_type=None)
+                                    poll_data = await httpclient.read_json_limited(poll_resp)
                                except (aiohttp.ContentTypeError, ValueError):
                                    continue

@@ -843,7 +1056,16 @@ class AdaptersRouterGroup(group.RouterGroup):
                    session['status'] = 'error'
                    session['error'] = str(e)

-            task = asyncio.create_task(run_qr_binding())
+            task = _start_adapter_session_task(
+                self.ap,
+                run_qr_binding(),
+                adapter='qqofficial',
+                session_id=session_id,
+                request_context=request_context,
+            )
+            if task is None:
+                _qqofficial_sessions.pop(session_id, None)
+                return self.http_status(429, -1, 'Too many active credential exchanges')
            session['task'] = task

            # Wait up to 10s for the QR URL to be ready before responding.
@@ -870,11 +1092,15 @@ class AdaptersRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/qqofficial/bind/status/<session_id>', methods=['GET'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/qqofficial/bind/status/<session_id>',
+            methods=['GET'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Poll QQ Official QR binding status."""
            _cleanup_expired_qqofficial_sessions()
-            session = _qqofficial_sessions.get(session_id)
+            session = _get_owned_session(_qqofficial_sessions, session_id, request_context)
            if not session:
                return self.http_status(404, -1, 'Session not found')

@@ -892,10 +1118,16 @@ class AdaptersRouterGroup(group.RouterGroup):

            return self.success(data=data)

-        @self.route('/qqofficial/bind/<session_id>', methods=['DELETE'])
-        async def _(session_id: str) -> str:
+        @self.route(
+            '/qqofficial/bind/<session_id>',
+            methods=['DELETE'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(session_id: str, request_context: RequestContext) -> str:
            """Cancel and clean up a QQ Official QR binding session."""
-            session = _qqofficial_sessions.pop(session_id, None)
+            session = _pop_owned_session(_qqofficial_sessions, session_id, request_context)
+            if session is None:
+                return self.http_status(404, -1, 'Session not found')
            if session and session.get('task') and not session['task'].done():
                session['task'].cancel()
            return self.success(data={})
@@ -1,45 +1,95 @@
 import quart
+from sqlalchemy.exc import IntegrityError

+from ....authz import Permission, has_permission
+from ....context import RequestContext
 from ... import group


@group.group_class('bots', '/api/v1/platform/bots')
 class BotsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                return self.success(data={'bots': await self.ap.bot_service.get_bots()})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                bot_uuid = await self.ap.bot_service.create_bot(json_data)
-                return self.success(data={'uuid': bot_uuid})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            include_secret = has_permission(request_context, Permission.RESOURCE_MANAGE)
+            return self.success(
+                data={
+                    'bots': await self.ap.bot_service.get_bots(
+                        request_context,
+                        include_secret=include_secret,
+                    )
+                }
+            )

-        @self.route('/<bot_uuid>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(bot_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                bot = await self.ap.bot_service.get_runtime_bot_info(bot_uuid)
-                if bot is None:
-                    return self.http_status(404, -1, 'bot not found')
-                return self.success(data={'bot': bot})
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
-                await self.ap.bot_service.update_bot(bot_uuid, json_data)
-                return self.success()
-            elif quart.request.method == 'DELETE':
-                await self.ap.bot_service.delete_bot(bot_uuid)
-                return self.success()
+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            bot_uuid = await self.ap.bot_service.create_bot(request_context, json_data)
+            return self.success(data={'uuid': bot_uuid})

-        @self.route('/<bot_uuid>/logs', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(bot_uuid: str) -> str:
+        @self.route(
+            '/<bot_uuid>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(bot_uuid: str, request_context: RequestContext) -> str:
+            include_secret = has_permission(request_context, Permission.RESOURCE_MANAGE)
+            bot = await self.ap.bot_service.get_runtime_bot_info(
+                request_context,
+                bot_uuid,
+                include_secret=include_secret,
+            )
+            if bot is None:
+                return self.http_status(404, -1, 'bot not found')
+            return self.success(data={'bot': bot})
+
+        @self.route(
+            '/<bot_uuid>',
+            methods=['PUT', 'DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(bot_uuid: str, request_context: RequestContext) -> str:
+            if quart.request.method == 'PUT':
+                json_data = await quart.request.json
+                await self.ap.bot_service.update_bot(request_context, bot_uuid, json_data)
+            else:
+                await self.ap.bot_service.delete_bot(request_context, bot_uuid)
+            return self.success()
+
+        @self.route(
+            '/<bot_uuid>/logs',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(bot_uuid: str, request_context: RequestContext) -> str:
            json_data = await quart.request.json
            from_index = json_data.get('from_index', -1)
            max_count = json_data.get('max_count', 10)
-            logs, total_count = await self.ap.bot_service.list_event_logs(bot_uuid, from_index, max_count)
+            logs, total_count = await self.ap.bot_service.list_event_logs(
+                request_context, bot_uuid, from_index, max_count
+            )
            return self.success(data={'logs': logs, 'total_count': total_count})

-        @self.route('/<bot_uuid>/send_message', methods=['POST'], auth_type=group.AuthType.API_KEY)
-        async def _(bot_uuid: str) -> str:
+        @self.route(
+            '/<bot_uuid>/send_message',
+            methods=['POST'],
+            auth_type=group.AuthType.API_KEY,
+            permission=Permission.RUNTIME_OPERATE,
+        )
+        async def _(bot_uuid: str, request_context: RequestContext) -> str:
            json_data = await quart.request.json
            target_type = json_data.get('target_type')
            target_id = json_data.get('target_id')
@@ -54,37 +104,51 @@ class BotsRouterGroup(group.RouterGroup):
            if target_type not in ['person', 'group']:
                return self.http_status(400, -1, 'target_type must be either "person" or "group"')

-            try:
-                await self.ap.bot_service.send_message(bot_uuid, target_type, target_id, message_chain_data)
-                return self.success(data={'sent': True})
-            except Exception as e:
-                import traceback
-
-                traceback.print_exc()
-                return self.http_status(500, -1, f'Failed to send message: {str(e)}')
-
-        # ============ Bot Admins ============
-
-        @self.route('/<bot_uuid>/admins', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(bot_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                admins = await self.ap.bot_service.get_bot_admins(bot_uuid)
-                return self.success(data={'admins': admins})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                launcher_type = json_data.get('launcher_type', '').strip()
-                launcher_id = str(json_data.get('launcher_id', '')).strip()
-                if not launcher_type or not launcher_id:
-                    return self.http_status(400, -1, 'launcher_type and launcher_id are required')
-                try:
-                    admin_id = await self.ap.bot_service.add_bot_admin(bot_uuid, launcher_type, launcher_id)
-                    return self.success(data={'id': admin_id})
-                except Exception as e:
-                    return self.http_status(409, -1, str(e))
+            await self.ap.bot_service.send_message(
+                request_context,
+                bot_uuid,
+                target_type,
+                target_id,
+                message_chain_data,
+            )
+            return self.success(data={'sent': True})

        @self.route(
-            '/<bot_uuid>/admins/<int:admin_id>', methods=['DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '/<bot_uuid>/admins',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(bot_uuid: str, admin_id: int) -> str:
-            await self.ap.bot_service.delete_bot_admin(bot_uuid, admin_id)
+        async def _(bot_uuid: str, request_context: RequestContext) -> str:
+            admins = await self.ap.bot_service.get_bot_admins(request_context, bot_uuid)
+            return self.success(data={'admins': admins})
+
+        @self.route(
+            '/<bot_uuid>/admins',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(bot_uuid: str, request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            launcher_type = json_data.get('launcher_type', '').strip()
+            launcher_id = str(json_data.get('launcher_id', '')).strip()
+            if not launcher_type or not launcher_id:
+                return self.http_status(400, -1, 'launcher_type and launcher_id are required')
+            try:
+                admin_id = await self.ap.bot_service.add_bot_admin(
+                    request_context, bot_uuid, launcher_type, launcher_id
+                )
+                return self.success(data={'id': admin_id})
+            except IntegrityError as e:
+                return self.http_status(409, -1, str(e))
+
+        @self.route(
+            '/<bot_uuid>/admins/<int:admin_id>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(bot_uuid: str, admin_id: int, request_context: RequestContext) -> str:
+            await self.ap.bot_service.delete_bot_admin(request_context, bot_uuid, admin_id)
            return self.success()
@@ -1,23 +1,158 @@
 from __future__ import annotations

+import asyncio
 import base64
-import io
+import collections.abc
+import copy
 import quart
 import re
 import httpx
 import uuid
 import os
 import zipfile
-import yaml
 from urllib.parse import urlparse
 import posixpath
 import sqlalchemy

 from .....core import taskmgr
+from .....core.task_boundary import run_in_workspace_uow
 from .....entity.persistence import plugin as persistence_plugin
+from ...authz import Permission
+from ...context import ExecutionContext, RequestContext
 from .. import group
+from .....workspace.errors import WorkspaceNotFoundError
+from .....plugin.github import validate_github_plugin_install_info
+from .....plugin.archive import inspect_plugin_archive_metadata
+from .....utils import httpclient
 from langbot_plugin.runtime.plugin.mgr import PluginInstallSource

+
+_SECRET_MASK = '***'
+_MISSING_SECRET = object()
+_SENSITIVE_CONFIG_NAMES = frozenset(
+    {
+        'api_key',
+        'apikey',
+        'auth',
+        'authorization',
+        'cookie',
+        'credentials',
+        'database_url',
+        'dsn',
+        'key',
+        'proxy_authorization',
+        'set_cookie',
+    }
+)
+_SENSITIVE_CONFIG_TOKENS = frozenset(
+    {
+        'credential',
+        'credentials',
+        'passwd',
+        'password',
+        'secret',
+        'token',
+    }
+)
+_SENSITIVE_KEY_QUALIFIERS = frozenset(
+    {
+        'access',
+        'api',
+        'auth',
+        'bearer',
+        'client',
+        'debug',
+        'encryption',
+        'private',
+        'signing',
+    }
+)
+
+
+def _normalize_config_key(key: object) -> str:
+    value = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', str(key or ''))
+    return re.sub(r'[^a-zA-Z0-9]+', '_', value).strip('_').lower()
+
+
+def _is_sensitive_config_key(key: object) -> bool:
+    normalized = _normalize_config_key(key)
+    if normalized in _SENSITIVE_CONFIG_NAMES:
+        return True
+    tokens = frozenset(token for token in normalized.split('_') if token)
+    if tokens & _SENSITIVE_CONFIG_TOKENS:
+        return True
+    return 'key' in tokens and bool(tokens & _SENSITIVE_KEY_QUALIFIERS)
+
+
+def _mask_secret_structure(value):
+    """Mask every non-empty leaf while preserving container structure."""
+
+    if isinstance(value, dict):
+        return {key: _mask_secret_structure(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [_mask_secret_structure(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(_mask_secret_structure(item) for item in value)
+    if value is None or value == '':
+        return value
+    return _SECRET_MASK
+
+
+def redact_plugin_secrets(value):
+    """Return a recursively redacted copy of plugin-facing data."""
+
+    if isinstance(value, dict):
+        return {
+            key: (_mask_secret_structure(item) if _is_sensitive_config_key(key) else redact_plugin_secrets(item))
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [redact_plugin_secrets(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(redact_plugin_secrets(item) for item in value)
+    return value
+
+
+def restore_plugin_secret_placeholders(value, current_value=_MISSING_SECRET, *, sensitive: bool = False):
+    """Restore masked leaves from the current config before a management write."""
+
+    if sensitive and value == _SECRET_MASK:
+        if current_value is _MISSING_SECRET:
+            raise ValueError('Masked plugin secret has no existing value')
+        return copy.deepcopy(current_value)
+    if isinstance(value, dict):
+        current_mapping = current_value if isinstance(current_value, dict) else {}
+        return {
+            key: restore_plugin_secret_placeholders(
+                item,
+                current_mapping.get(key, _MISSING_SECRET),
+                sensitive=sensitive or _is_sensitive_config_key(key),
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return [
+            restore_plugin_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+                sensitive=sensitive,
+            )
+            for index, item in enumerate(value)
+        ]
+    if isinstance(value, tuple):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return tuple(
+            restore_plugin_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+                sensitive=sensitive,
+            )
+            for index, item in enumerate(value)
+        )
+    return value
+
+
 # Resolve the built-in page SDK JS from the langbot_plugin package
 _PAGE_SDK_PATH = None
 try:
@@ -148,18 +283,78 @@ class PluginsRouterGroup(group.RouterGroup):
            'subdir': subdir,
        }

-    async def _check_extensions_limit(self) -> str | None:
+    async def _check_extensions_limit(self, request_context: RequestContext) -> str | None:
        """Check if extensions limit is reached. Returns error response if limit exceeded, None otherwise."""
+        await self.ap.plugin_connector.require_workspace_context(request_context)
        limitation = self.ap.instance_config.data.get('system', {}).get('limitation', {})
        max_extensions = limitation.get('max_extensions', -1)
        if max_extensions >= 0:
            plugins = await self.ap.plugin_connector.list_plugins()
-            mcp_servers = await self.ap.mcp_service.get_mcp_servers()
+            mcp_servers = await self.ap.mcp_service.get_mcp_servers(request_context)
            total_extensions = len(plugins) + len(mcp_servers)
            if total_extensions >= max_extensions:
                return self.http_status(400, -1, f'Maximum number of extensions ({max_extensions}) reached')
        return None

+    @staticmethod
+    def _task_scope(request_context: RequestContext) -> dict[str, str | int]:
+        return {
+            'instance_uuid': request_context.instance_uuid,
+            'workspace_uuid': request_context.workspace_uuid,
+            'placement_generation': request_context.placement_generation,
+        }
+
+    async def _run_fenced_plugin_operation(
+        self,
+        execution_context: ExecutionContext,
+        operation: collections.abc.Callable[[], collections.abc.Awaitable],
+    ):
+        """Revalidate a captured task context immediately before Runtime I/O."""
+
+        await run_in_workspace_uow(
+            self.ap,
+            execution_context.workspace_uuid,
+            lambda: self.ap.plugin_connector.require_workspace_context(execution_context),
+        )
+        return await operation()
+
+    async def _require_public_plugin_runtime_context(self) -> ExecutionContext:
+        """Resolve public assets only for the OSS singleton Workspace.
+
+        Public image and iframe requests cannot carry the WebUI bearer token.
+        They therefore remain available for the one-Workspace Core deployment,
+        but fail closed instead of guessing a Workspace when multi-Workspace
+        policy is active.
+        """
+
+        workspace_service = getattr(self.ap, 'workspace_service', None)
+        policy = getattr(workspace_service, 'policy', None)
+        if workspace_service is None or policy is None or getattr(policy, 'multi_workspace_enabled', False):
+            raise WorkspaceNotFoundError('Plugin resource not found')
+        binding = await workspace_service.get_local_execution_binding()
+        execution_context = ExecutionContext(
+            instance_uuid=binding.instance_uuid,
+            workspace_uuid=binding.workspace_uuid,
+            placement_generation=binding.placement_generation,
+        )
+        return await self.ap.plugin_connector.require_workspace_context(execution_context)
+
+    async def _get_stored_plugin_config(
+        self,
+        request_context: RequestContext,
+        author: str,
+        plugin_name: str,
+        plugin: dict,
+    ):
+        result = await self.ap.persistence_mgr.execute_async(
+            sqlalchemy.select(persistence_plugin.PluginSetting.config)
+            .where(persistence_plugin.PluginSetting.workspace_uuid == request_context.workspace_uuid)
+            .where(persistence_plugin.PluginSetting.plugin_author == author)
+            .where(persistence_plugin.PluginSetting.plugin_name == plugin_name)
+        )
+        persisted_config = result.scalar_one_or_none()
+        return persisted_config if persisted_config is not None else plugin['plugin_config']
+
    async def initialize(self) -> None:
        @self.route('/_sdk/page-sdk.js', methods=['GET'], auth_type=group.AuthType.NONE)
        async def _() -> quart.Response:
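The upgrade and delete routes capture an `ExecutionContext` at request time. They then re-check it inside the background task, immediately before Runtime I/O. The asyncio sketch below illustrates that fencing idea under one explicit assumption: the check compares the captured `placement_generation` with the Workspace's current one. `PlacementRegistry`, `StalePlacementError`, and `run_fenced` are hypothetical names. The PR's actual check is `plugin_connector.require_workspace_context` run inside `run_in_workspace_uow`, and its internals are not shown here.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContext:
    # Hypothetical stand-in mirroring the field names used by _task_scope.
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int


class StalePlacementError(RuntimeError):
    pass


class PlacementRegistry:
    """Hypothetical: tracks the current placement generation per Workspace."""

    def __init__(self) -> None:
        self._generations: dict[str, int] = {}

    def set_generation(self, workspace_uuid: str, generation: int) -> None:
        self._generations[workspace_uuid] = generation

    def require_current(self, ctx: ExecutionContext) -> ExecutionContext:
        if self._generations.get(ctx.workspace_uuid) != ctx.placement_generation:
            raise StalePlacementError(f'workspace {ctx.workspace_uuid} moved')
        return ctx


async def run_fenced(registry, ctx, operation):
    # Re-check right before the side effect, not only when the task was queued.
    registry.require_current(ctx)
    return await operation()


async def main() -> list[str]:
    registry = PlacementRegistry()
    registry.set_generation('ws-1', 3)
    ctx = ExecutionContext('inst-1', 'ws-1', 3)  # captured at request time
    calls: list[str] = []

    async def upgrade() -> str:
        calls.append('upgrade')
        return 'ok'

    first = await run_fenced(registry, ctx, upgrade)
    registry.set_generation('ws-1', 4)  # Workspace re-placed while a task was queued
    try:
        await run_fenced(registry, ctx, upgrade)
    except StalePlacementError as exc:
        calls.append(f'rejected: {exc}')
    return [first, *calls]


if __name__ == '__main__':
    print(asyncio.run(main()))  # ['ok', 'upgrade', 'rejected: workspace ws-1 moved']
```

As in the routes, the operation is passed as a callable rather than as an already-created coroutine. When the fence rejects a stale context, no coroutine object is ever created, so Python emits no "coroutine was never awaited" warning.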
@@ -170,15 +365,27 @@ class PluginsRouterGroup(group.RouterGroup):
                return quart.Response(content, mimetype='application/javascript')
            return quart.Response('// SDK not found', status=404, mimetype='application/javascript')

-        @self.route('', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            plugins = await self.ap.plugin_connector.list_plugins()

-            return self.success(data={'plugins': plugins})
+            return self.success(data={'plugins': redact_plugin_secrets(plugins)})

-        @self.route('/debug-info', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
+        @self.route(
+            '/debug-info',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Get plugin debug information including debug URL and key"""
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            debug_info = await self.ap.plugin_connector.get_debug_info()

            # Get debug URL from config
@@ -196,77 +403,121 @@ class PluginsRouterGroup(group.RouterGroup):
            '/<author>/<plugin_name>/upgrade',
            methods=['POST'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
        )
-        async def _(author: str, plugin_name: str) -> str:
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> str:
+            execution_context = await self.ap.plugin_connector.require_workspace_context(request_context)
            ctx = taskmgr.TaskContext.new()
            wrapper = self.ap.task_mgr.create_user_task(
-                self.ap.plugin_connector.upgrade_plugin(author, plugin_name, task_context=ctx),
+                self._run_fenced_plugin_operation(
+                    execution_context,
+                    lambda: self.ap.plugin_connector.upgrade_plugin(author, plugin_name, task_context=ctx),
+                ),
                kind='plugin-operation',
                name=f'plugin-upgrade-{plugin_name}',
                label=f'Upgrading plugin {plugin_name}',
                context=ctx,
+                **self._task_scope(request_context),
            )
            return self.success(data={'task_id': wrapper.id})

        @self.route(
            '/<author>/<plugin_name>',
-            methods=['GET', 'DELETE'],
+            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(author: str, plugin_name: str) -> str:
-            if quart.request.method == 'GET':
-                plugin = await self.ap.plugin_connector.get_plugin_info(author, plugin_name)
-                if plugin is None:
-                    return self.http_status(404, -1, 'plugin not found')
-                return self.success(data={'plugin': plugin})
-            elif quart.request.method == 'DELETE':
-                delete_data = quart.request.args.get('delete_data', 'false').lower() == 'true'
-                ctx = taskmgr.TaskContext.new()
-                wrapper = self.ap.task_mgr.create_user_task(
-                    self.ap.plugin_connector.delete_plugin(
-                        author, plugin_name, delete_data=delete_data, task_context=ctx
-                    ),
-                    kind='plugin-operation',
-                    name=f'plugin-remove-{plugin_name}',
-                    label=f'Removing plugin {plugin_name}',
-                    context=ctx,
-                )
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> str:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
+            plugin = await self.ap.plugin_connector.get_plugin_info(author, plugin_name)
+            if plugin is None:
+                return self.http_status(404, -1, 'plugin not found')
+            return self.success(data={'plugin': redact_plugin_secrets(plugin)})

-                return self.success(data={'task_id': wrapper.id})
+        @self.route(
+            '/<author>/<plugin_name>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> str:
+            execution_context = await self.ap.plugin_connector.require_workspace_context(request_context)
+            delete_data = quart.request.args.get('delete_data', 'false').lower() == 'true'
+            ctx = taskmgr.TaskContext.new()
+            wrapper = self.ap.task_mgr.create_user_task(
+                self._run_fenced_plugin_operation(
+                    execution_context,
+                    lambda: self.ap.plugin_connector.delete_plugin(
+                        author,
+                        plugin_name,
+                        delete_data=delete_data,
+                        task_context=ctx,
+                    ),
+                ),
+                kind='plugin-operation',
+                name=f'plugin-remove-{plugin_name}',
+                label=f'Removing plugin {plugin_name}',
+                context=ctx,
+                **self._task_scope(request_context),
+            )
+            return self.success(data={'task_id': wrapper.id})

        @self.route(
            '/<author>/<plugin_name>/config',
-            methods=['GET', 'PUT'],
+            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(author: str, plugin_name: str) -> quart.Response:
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> quart.Response:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            plugin = await self.ap.plugin_connector.get_plugin_info(author, plugin_name)
            if plugin is None:
                return self.http_status(404, -1, 'plugin not found')

-            if quart.request.method == 'GET':
-                result = await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.select(persistence_plugin.PluginSetting.config)
-                    .where(persistence_plugin.PluginSetting.plugin_author == author)
-                    .where(persistence_plugin.PluginSetting.plugin_name == plugin_name)
+            config = await self._get_stored_plugin_config(
+                request_context,
+                author,
+                plugin_name,
+                plugin,
+            )
+            return self.success(data={'config': redact_plugin_secrets(config)})
+
+        @self.route(
+            '/<author>/<plugin_name>/config',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> quart.Response:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
+            plugin = await self.ap.plugin_connector.get_plugin_info(author, plugin_name)
+            if plugin is None:
+                return self.http_status(404, -1, 'plugin not found')
+            current_config = await self._get_stored_plugin_config(
+                request_context,
+                author,
+                plugin_name,
+                plugin,
+            )
+            try:
+                config = restore_plugin_secret_placeholders(
+                    await quart.request.json,
+                    current_config,
                )
-                persisted_config = result.scalar_one_or_none()
-
-                config = persisted_config if persisted_config is not None else plugin['plugin_config']
-                return self.success(data={'config': config})
-            elif quart.request.method == 'PUT':
-                data = await quart.request.json
-
-                await self.ap.plugin_connector.set_plugin_config(author, plugin_name, data)
-
-                return self.success(data={})
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            await self.ap.plugin_connector.require_workspace_context(request_context)
+            await self.ap.plugin_connector.set_plugin_config(author, plugin_name, config)
+            return self.success(data={})

        @self.route(
            '/<author>/<plugin_name>/readme',
            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(author: str, plugin_name: str) -> quart.Response:
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> quart.Response:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            language = quart.request.args.get('language', 'en')
            readme = await self.ap.plugin_connector.get_plugin_readme(author, plugin_name, language=language)
            return self.success(data={'readme': readme})
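The PUT `/<author>/<plugin_name>/config` route relies on `restore_plugin_secret_placeholders`. That function turns the masked values the WebUI echoes back into the stored secrets before `set_plugin_config` is called. The sketch below mirrors its logic, with the list and tuple branches merged. `_is_sensitive_config_key` is replaced by a deliberately simplified stand-in, and the sample configs are hypothetical.

```python
import copy

_SECRET_MASK = '***'
_MISSING_SECRET = object()


def _is_sensitive_config_key(key: object) -> bool:
    # Simplified stand-in (assumption): the classifier in this PR also
    # normalizes camelCase keys and checks token and qualifier sets.
    return str(key).lower() in {'api_key', 'password', 'token'}


def restore_plugin_secret_placeholders(value, current_value=_MISSING_SECRET, *, sensitive: bool = False):
    """Swap masked leaves back to the stored secret before persisting."""
    if sensitive and value == _SECRET_MASK:
        if current_value is _MISSING_SECRET:
            raise ValueError('Masked plugin secret has no existing value')
        return copy.deepcopy(current_value)
    if isinstance(value, dict):
        current_mapping = current_value if isinstance(current_value, dict) else {}
        return {
            key: restore_plugin_secret_placeholders(
                item,
                current_mapping.get(key, _MISSING_SECRET),
                sensitive=sensitive or _is_sensitive_config_key(key),
            )
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
        restored = [
            restore_plugin_secret_placeholders(
                item,
                current_items[i] if i < len(current_items) else _MISSING_SECRET,
                sensitive=sensitive,
            )
            for i, item in enumerate(value)
        ]
        return restored if isinstance(value, list) else tuple(restored)
    return value


if __name__ == '__main__':
    stored = {'model': 'gpt-4o', 'api_key': 'sk-live-123', 'headers': {'token': 'abc'}}
    # What the WebUI sends back after the user edits only 'model'
    submitted = {'model': 'gpt-4o-mini', 'api_key': '***', 'headers': {'token': '***'}}
    print(restore_plugin_secret_placeholders(submitted, stored))
    # {'model': 'gpt-4o-mini', 'api_key': 'sk-live-123', 'headers': {'token': 'abc'}}
    try:
        restore_plugin_secret_placeholders({'password': '***'}, {})
    except ValueError as exc:
        print(exc)  # Masked plugin secret has no existing value
```

This design has one consequence: a secret whose literal value is `***` cannot be submitted under a sensitive key. The function always reads that value as "keep the existing secret." If a masked leaf has no stored counterpart, the function raises `ValueError`, and the route maps that error to HTTP 400.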
@@ -275,8 +526,10 @@ class PluginsRouterGroup(group.RouterGroup):
            '/<author>/<plugin_name>/logs',
            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.AUDIT_VIEW,
        )
-        async def _(author: str, plugin_name: str) -> quart.Response:
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> quart.Response:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            try:
                limit = int(quart.request.args.get('limit', 200))
            except (TypeError, ValueError):
@@ -291,11 +544,12 @@ class PluginsRouterGroup(group.RouterGroup):
            auth_type=group.AuthType.NONE,
        )
        async def _(author: str, plugin_name: str) -> quart.Response:
+            await self._require_public_plugin_runtime_context()
            icon_data = await self.ap.plugin_connector.get_plugin_icon(author, plugin_name)
            icon_base64 = icon_data['plugin_icon_base64']
            mime_type = icon_data['mime_type']

-            icon_data = base64.b64decode(icon_base64)
+            icon_data = await asyncio.to_thread(base64.b64decode, icon_base64)

            return quart.Response(icon_data, mimetype=mime_type)

@@ -305,6 +559,7 @@ class PluginsRouterGroup(group.RouterGroup):
            auth_type=group.AuthType.NONE,
        )
        async def _(author: str, plugin_name: str, filepath: str) -> quart.Response:
+            await self._require_public_plugin_runtime_context()
            asset_path = _normalize_plugin_asset_path(filepath)
            if asset_path is None:
                return quart.Response('Asset not found', status=404)
@@ -312,7 +567,10 @@ class PluginsRouterGroup(group.RouterGroup):
            asset_data = await self.ap.plugin_connector.get_plugin_assets(author, plugin_name, asset_path)
            if not asset_data.get('asset_base64'):
                return quart.Response('Asset not found', status=404)
-            asset_bytes = base64.b64decode(asset_data['asset_base64'])
+            asset_bytes = await asyncio.to_thread(
+                base64.b64decode,
+                asset_data['asset_base64'],
+            )
            mime_type = asset_data['mime_type']
            resp = quart.Response(asset_bytes, mimetype=mime_type)
            # CSP for HTML pages served to sandboxed iframes (opaque origin).
@@ -334,9 +592,11 @@ class PluginsRouterGroup(group.RouterGroup):
            '/<author>/<plugin_name>/page-api',
            methods=['POST'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
        )
-        async def _(author: str, plugin_name: str) -> str:
+        async def _(author: str, plugin_name: str, request_context: RequestContext) -> str:
            """Forward a page API request to the plugin."""
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            data = await quart.request.json
            if not isinstance(data, dict):
                return self.http_status(400, -1, 'invalid request body')
@@ -357,9 +617,15 @@ class PluginsRouterGroup(group.RouterGroup):
                return self.http_status(400, -1, result['error'])
            return self.success(data=result.get('data'))

-        @self.route('/github/releases', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
+        @self.route(
+            '/github/releases',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Get releases from a GitHub repository URL"""
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            data = await quart.request.json
            repo_url = data.get('repo_url', '')

@@ -400,10 +666,11 @@ class PluginsRouterGroup(group.RouterGroup):
                    trust_env=True,
                    follow_redirects=True,
                    timeout=10,
+                    event_hooks=httpclient.httpx_response_limit_hooks(),
                ) as client:
                    response = await client.get(url)
                    response.raise_for_status()
-                    releases = response.json()
+                    releases = await httpclient.parse_json_response(response)

                # Format releases data for frontend
                formatted_releases = []
@@ -427,16 +694,18 @@ class PluginsRouterGroup(group.RouterGroup):
                        'source_subdir': requested_subdir,
                    }
                )
-            except httpx.RequestError as e:
-                return self.http_status(500, -1, f'Failed to fetch releases: {str(e)}')
+            except httpx.RequestError:
+                raise

        @self.route(
            '/github/release-assets',
            methods=['POST'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _() -> str:
+        async def _(request_context: RequestContext) -> str:
            """Get assets from a specific GitHub release"""
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            data = await quart.request.json
            owner = data.get('owner', '')
            repo = data.get('repo', '')
@@ -452,12 +721,13 @@ class PluginsRouterGroup(group.RouterGroup):
                    trust_env=True,
                    follow_redirects=True,
                    timeout=10,
+                    event_hooks=httpclient.httpx_response_limit_hooks(),
                ) as client:
                    response = await client.get(
                        url,
                    )
                    response.raise_for_status()
-                    release = response.json()
+                    release = await httpclient.parse_json_response(response)

                # Format assets data for frontend
                formatted_assets = []
@@ -484,42 +754,61 @@ class PluginsRouterGroup(group.RouterGroup):
                # )

                return self.success(data={'assets': formatted_assets})
-            except httpx.RequestError as e:
-                return self.http_status(500, -1, f'Failed to fetch release assets: {str(e)}')
+            except httpx.RequestError:
+                raise

-        @self.route('/install/github', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
+        @self.route(
+            '/install/github',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Install plugin from GitHub release asset"""
-            limit_error = await self._check_extensions_limit()
+            limit_error = await self._check_extensions_limit(request_context)
            if limit_error is not None:
                return limit_error

-            data = await quart.request.json
-            asset_url = data.get('asset_url', '')
-            owner = data.get('owner', '')
-            repo = data.get('repo', '')
-            release_tag = data.get('release_tag', '')
+            data = await quart.request.json or {}
+            try:
+                install_info = validate_github_plugin_install_info(
+                    {
+                        'asset_url': data.get('asset_url'),
+                        'asset_id': data.get('asset_id'),
+                        'release_id': data.get('release_id'),
+                        'owner': data.get('owner'),
+                        'repo': data.get('repo'),
+                        'release_tag': data.get('release_tag'),
+                        'github_url': f'https://github.com/{data.get("owner", "")}/{data.get("repo", "")}',
+                    }
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))

-            if not asset_url:
-                return self.http_status(400, -1, 'Missing asset_url parameter')
+            owner = install_info['owner']
+            repo = install_info['repo']
+            release_tag = install_info['release_tag']
+
+            execution_context = await self.ap.plugin_connector.require_workspace_context(request_context)

            ctx = taskmgr.TaskContext.new()
            ctx.metadata['plugin_name'] = f'{owner}/{repo}'
            ctx.metadata['install_source'] = 'github'
-            install_info = {
-                'asset_url': asset_url,
-                'owner': owner,
-                'repo': repo,
-                'release_tag': release_tag,
-                'github_url': f'https://github.com/{owner}/{repo}',
-            }

            wrapper = self.ap.task_mgr.create_user_task(
-                self.ap.plugin_connector.install_plugin(PluginInstallSource.GITHUB, install_info, task_context=ctx),
+                self._run_fenced_plugin_operation(
+                    execution_context,
+                    lambda: self.ap.plugin_connector.install_plugin(
+                        PluginInstallSource.GITHUB,
+                        install_info,
+                        task_context=ctx,
+                    ),
+                ),
                kind='plugin-operation',
                name='plugin-install-github',
                label=f'Installing plugin from GitHub {owner}/{repo}@{release_tag}',
                context=ctx,
+                **self._task_scope(request_context),
            )

            return self.success(data={'task_id': wrapper.id})
@@ -528,9 +817,10 @@ class PluginsRouterGroup(group.RouterGroup):
            '/install/marketplace',
            methods=['POST'],
            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
        )
-        async def _() -> str:
-            limit_error = await self._check_extensions_limit()
+        async def _(request_context: RequestContext) -> str:
+            limit_error = await self._check_extensions_limit(request_context)
            if limit_error is not None:
                return limit_error

@@ -538,23 +828,37 @@ class PluginsRouterGroup(group.RouterGroup):

            plugin_author = data.get('plugin_author', '')
            plugin_name = data.get('plugin_name', '')
+            execution_context = await self.ap.plugin_connector.require_workspace_context(request_context)

            ctx = taskmgr.TaskContext.new()
            ctx.metadata['plugin_name'] = f'{plugin_author}/{plugin_name}'
            ctx.metadata['install_source'] = 'marketplace'
            wrapper = self.ap.task_mgr.create_user_task(
-                self.ap.plugin_connector.install_plugin(PluginInstallSource.MARKETPLACE, data, task_context=ctx),
+                self._run_fenced_plugin_operation(
+                    execution_context,
+                    lambda: self.ap.plugin_connector.install_plugin(
+                        PluginInstallSource.MARKETPLACE,
+                        data,
+                        task_context=ctx,
+                    ),
+                ),
                kind='plugin-operation',
                name='plugin-install-marketplace',
                label=f'Installing plugin from marketplace {plugin_author}/{plugin_name}',
                context=ctx,
+                **self._task_scope(request_context),
            )

            return self.success(data={'task_id': wrapper.id})

-        @self.route('/install/local', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            limit_error = await self._check_extensions_limit()
+        @self.route(
+            '/install/local',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            limit_error = await self._check_extensions_limit(request_context)
            if limit_error is not None:
                return limit_error

@@ -563,6 +867,7 @@ class PluginsRouterGroup(group.RouterGroup):
                return self.http_status(400, -1, 'file is required')

            file_bytes = file.read()
+            execution_context = await self.ap.plugin_connector.require_workspace_context(request_context)

            data = {
                'plugin_file': file_bytes,
@@ -572,74 +877,72 @@ class PluginsRouterGroup(group.RouterGroup):
            ctx.metadata['plugin_name'] = file.filename or 'local plugin'
            ctx.metadata['install_source'] = 'local'
            wrapper = self.ap.task_mgr.create_user_task(
-                self.ap.plugin_connector.install_plugin(PluginInstallSource.LOCAL, data, task_context=ctx),
+                self._run_fenced_plugin_operation(
+                    execution_context,
+                    lambda: self.ap.plugin_connector.install_plugin(
+                        PluginInstallSource.LOCAL,
+                        data,
+                        task_context=ctx,
+                    ),
+                ),
                kind='plugin-operation',
                name='plugin-install-local',
                label=f'Installing plugin from local {file.filename}',
                context=ctx,
+                **self._task_scope(request_context),
            )

            return self.success(data={'task_id': wrapper.id})

-        @self.route('/install/local/preview', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
+        @self.route(
+            '/install/local/preview',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            file = (await quart.request.files).get('file')
            if file is None:
                return self.http_status(400, -1, 'file is required')

            file_bytes = file.read()
            try:
-                with zipfile.ZipFile(io.BytesIO(file_bytes)) as zf:
-                    names = [name for name in zf.namelist() if not name.endswith('/')]
-                    manifest_name = next(
-                        (
-                            name
-                            for name in names
-                            if name.replace('\\', '/').strip('/').lower() in ('manifest.yaml', 'manifest.yml')
-                        ),
-                        None,
-                    )
-                    if manifest_name is None:
-                        return self.http_status(400, -1, 'manifest.yaml is required')
+                manifest, requirements, names = await asyncio.to_thread(
+                    inspect_plugin_archive_metadata,
+                    file_bytes,
+                )
+                spec = manifest.get('spec') or {}
+                components = spec.get('components') or {}
+                component_counts = self._count_plugin_components(components, names)
+                component_types = list(component_counts.keys())

-                    manifest = yaml.safe_load(zf.read(manifest_name).decode('utf-8')) or {}
-                    requirements: list[str] = []
-                    requirements_name = next(
-                        (name for name in names if name.replace('\\', '/').strip('/').lower() == 'requirements.txt'),
-                        None,
-                    )
-                    if requirements_name is not None:
-                        requirements = [
-                            line.strip()
-                            for line in zf.read(requirements_name).decode('utf-8', errors='ignore').splitlines()
-                            if line.strip() and not line.strip().startswith('#')
-                        ]
+                return self.success(
+                    data={
+                        'filename': file.filename or 'local plugin',
+                        'size': len(file_bytes),
+                        'manifest': manifest,
+                        'metadata': manifest.get('metadata') or {},
+                        'component_types': component_types,
+                        'component_counts': component_counts,
+                        'requirements': requirements,
+                        'file_count': len(names),
+                    }
+                )
+            except (zipfile.BadZipFile, ValueError) as exc:
+                return self.http_status(400, -1, str(exc) or 'invalid .lbpkg file')
+            except Exception:
+                raise

-                    spec = manifest.get('spec') or {}
-                    components = spec.get('components') or {}
-                    component_counts = self._count_plugin_components(components, names)
-                    component_types = list(component_counts.keys())
-
-                    return self.success(
-                        data={
-                            'filename': file.filename or 'local plugin',
-                            'size': len(file_bytes),
-                            'manifest': manifest,
-                            'metadata': manifest.get('metadata') or {},
-                            'component_types': component_types,
-                            'component_counts': component_counts,
-                            'requirements': requirements,
-                            'file_count': len(names),
-                        }
-                    )
-            except zipfile.BadZipFile:
-                return self.http_status(400, -1, 'invalid .lbpkg file')
-            except Exception as exc:
-                return self.http_status(500, -1, f'Failed to preview plugin package: {exc}')
-
-        @self.route('/config-files', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
+        @self.route(
+            '/config-files',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Upload a file for plugin configuration"""
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            file = (await quart.request.files).get('file')
            if file is None:
                return self.http_status(400, -1, 'file is required')
@@ -650,25 +953,37 @@ class PluginsRouterGroup(group.RouterGroup):
            if len(file_bytes) > MAX_FILE_SIZE:
                return self.http_status(400, -1, 'file size exceeds 10MB limit')

-            # Generate unique file key with original extension
-            original_filename = file.filename
+            original_filename = file.filename or 'config.bin'
            _, ext = os.path.splitext(original_filename)
-            file_key = f'plugin_config_{uuid.uuid4().hex}{ext}'
-
-            # Save file using storage manager
-            await self.ap.storage_mgr.storage_provider.save(file_key, file_bytes)
+            logical_key = f'plugin_config_{uuid.uuid4().hex}{ext}'
+            file_key = await self.ap.storage_mgr.save_scoped(
+                request_context,
+                owner_type='plugin_config',
+                owner=request_context.workspace_uuid,
+                key=logical_key,
+                value=file_bytes,
+            )

            return self.success(data={'file_key': file_key})

-        @self.route('/config-files/<file_key>', methods=['DELETE'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(file_key: str) -> str:
+        @self.route(
+            '/config-files/<path:file_key>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(file_key: str, request_context: RequestContext) -> str:
            """Delete a plugin configuration file"""
-            # Only allow deletion of files with plugin_config_ prefix for security
-            if not file_key.startswith('plugin_config_'):
+            await self.ap.plugin_connector.require_workspace_context(request_context)
+            if not self.ap.storage_mgr.is_scoped_object_key(file_key, expected_owner_type='plugin_config'):
                return self.http_status(400, -1, 'invalid file key')

            try:
-                await self.ap.storage_mgr.storage_provider.delete(file_key)
+                await self.ap.storage_mgr.delete_scoped_object_key(
+                    request_context,
+                    file_key,
+                    expected_owner_type='plugin_config',
+                )
                return self.success(data={'deleted': True})
-            except Exception as e:
-                return self.http_status(500, -1, f'failed to delete file: {str(e)}')
+            except Exception:
+                raise
@@ -1,147 +1,292 @@
 import quart

+from ....authz import Permission, has_permission
+from ....context import RequestContext
 from ... import group


@group.group_class('models/llm', '/api/v1/provider/models/llm')
 class LLMModelsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                provider_uuid = quart.request.args.get('provider_uuid')
-                if provider_uuid:
-                    return self.success(
-                        data={'models': await self.ap.llm_model_service.get_llm_models_by_provider(provider_uuid)}
-                    )
-                return self.success(data={'models': await self.ap.llm_model_service.get_llm_models()})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                model_uuid = await self.ap.llm_model_service.create_llm_model(json_data)
-                return self.success(data={'uuid': model_uuid})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            provider_uuid = quart.request.args.get('provider_uuid')
+            include_secret = has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE)
+            if provider_uuid:
+                models = await self.ap.llm_model_service.get_llm_models_by_provider(
+                    request_context,
+                    provider_uuid,
+                    include_secret=include_secret,
+                )
+            else:
+                models = await self.ap.llm_model_service.get_llm_models(
+                    request_context,
+                    include_secret=include_secret,
+                )
+            return self.success(data={'models': models})

-        @self.route('/<model_uuid>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(model_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                model = await self.ap.llm_model_service.get_llm_model(model_uuid)
+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            try:
+                model_uuid = await self.ap.llm_model_service.create_llm_model(
+                    request_context,
+                    await quart.request.json,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'uuid': model_uuid})

-                if model is None:
-                    return self.http_status(404, -1, 'model not found')
+        @self.route(
+            '/<model_uuid>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            model = await self.ap.llm_model_service.get_llm_model(
+                request_context,
+                model_uuid,
+                include_secret=has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE),
+            )
+            if model is None:
+                return self.http_status(404, -1, 'model not found')
+            return self.success(data={'model': model})

-                return self.success(data={'model': model})
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
+        @self.route(
+            '/<model_uuid>',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            try:
+                await self.ap.llm_model_service.update_llm_model(
+                    request_context,
+                    model_uuid,
+                    await quart.request.json,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success()

-                await self.ap.llm_model_service.update_llm_model(model_uuid, json_data)
-
-                return self.success()
-            elif quart.request.method == 'DELETE':
-                await self.ap.llm_model_service.delete_llm_model(model_uuid)
-
-                return self.success()
-
-        @self.route('/<model_uuid>/test', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(model_uuid: str) -> str:
-            json_data = await quart.request.json
-
-            await self.ap.llm_model_service.test_llm_model(model_uuid, json_data)
+        @self.route(
+            '/<model_uuid>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            await self.ap.llm_model_service.delete_llm_model(request_context, model_uuid)
+            return self.success()

+        @self.route(
+            '/<model_uuid>/test',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            await self.ap.llm_model_service.test_llm_model(request_context, model_uuid, await quart.request.json)
            return self.success()


@group.group_class('models/embedding', '/api/v1/provider/models/embedding')
 class EmbeddingModelsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                provider_uuid = quart.request.args.get('provider_uuid')
-                if provider_uuid:
-                    return self.success(
-                        data={
-                            'models': await self.ap.embedding_models_service.get_embedding_models_by_provider(
-                                provider_uuid
-                            )
-                        }
-                    )
-                return self.success(data={'models': await self.ap.embedding_models_service.get_embedding_models()})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                model_uuid = await self.ap.embedding_models_service.create_embedding_model(json_data)
-                return self.success(data={'uuid': model_uuid})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            provider_uuid = quart.request.args.get('provider_uuid')
+            include_secret = has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE)
+            if provider_uuid:
+                models = await self.ap.embedding_models_service.get_embedding_models_by_provider(
+                    request_context,
+                    provider_uuid,
+                    include_secret=include_secret,
+                )
+            else:
+                models = await self.ap.embedding_models_service.get_embedding_models(
+                    request_context,
+                    include_secret=include_secret,
+                )
+            return self.success(data={'models': models})

-        @self.route('/<model_uuid>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(model_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                model = await self.ap.embedding_models_service.get_embedding_model(model_uuid)
+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            try:
+                model_uuid = await self.ap.embedding_models_service.create_embedding_model(
+                    request_context,
+                    await quart.request.json,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'uuid': model_uuid})

-                if model is None:
-                    return self.http_status(404, -1, 'model not found')
+        @self.route(
+            '/<model_uuid>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            model = await self.ap.embedding_models_service.get_embedding_model(
+                request_context,
+                model_uuid,
+                include_secret=has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE),
+            )
+            if model is None:
+                return self.http_status(404, -1, 'model not found')
+            return self.success(data={'model': model})

-                return self.success(data={'model': model})
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
+        @self.route(
+            '/<model_uuid>',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            try:
+                await self.ap.embedding_models_service.update_embedding_model(
+                    request_context,
+                    model_uuid,
+                    await quart.request.json,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success()

-                await self.ap.embedding_models_service.update_embedding_model(model_uuid, json_data)
-
-                return self.success()
-            elif quart.request.method == 'DELETE':
-                await self.ap.embedding_models_service.delete_embedding_model(model_uuid)
-
-                return self.success()
-
-        @self.route('/<model_uuid>/test', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(model_uuid: str) -> str:
-            json_data = await quart.request.json
-
-            await self.ap.embedding_models_service.test_embedding_model(model_uuid, json_data)
+        @self.route(
+            '/<model_uuid>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            await self.ap.embedding_models_service.delete_embedding_model(request_context, model_uuid)
+            return self.success()

+        @self.route(
+            '/<model_uuid>/test',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            await self.ap.embedding_models_service.test_embedding_model(
+                request_context, model_uuid, await quart.request.json
+            )
            return self.success()


@group.group_class('models/rerank', '/api/v1/provider/models/rerank')
 class RerankModelsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                provider_uuid = quart.request.args.get('provider_uuid')
-                if provider_uuid:
-                    return self.success(
-                        data={
-                            'models': await self.ap.rerank_models_service.get_rerank_models_by_provider(provider_uuid)
-                        }
-                    )
-                return self.success(data={'models': await self.ap.rerank_models_service.get_rerank_models()})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                model_uuid = await self.ap.rerank_models_service.create_rerank_model(json_data)
-                return self.success(data={'uuid': model_uuid})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            provider_uuid = quart.request.args.get('provider_uuid')
+            include_secret = has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE)
+            if provider_uuid:
+                models = await self.ap.rerank_models_service.get_rerank_models_by_provider(
+                    request_context,
+                    provider_uuid,
+                    include_secret=include_secret,
+                )
+            else:
+                models = await self.ap.rerank_models_service.get_rerank_models(
+                    request_context,
+                    include_secret=include_secret,
+                )
+            return self.success(data={'models': models})

-        @self.route('/<model_uuid>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(model_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                model = await self.ap.rerank_models_service.get_rerank_model(model_uuid)
+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            try:
+                model_uuid = await self.ap.rerank_models_service.create_rerank_model(
+                    request_context,
+                    await quart.request.json,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'uuid': model_uuid})

-                if model is None:
-                    return self.http_status(404, -1, 'model not found')
-
-                return self.success(data={'model': model})
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
-
-                await self.ap.rerank_models_service.update_rerank_model(model_uuid, json_data)
-
-                return self.success()
-            elif quart.request.method == 'DELETE':
-                await self.ap.rerank_models_service.delete_rerank_model(model_uuid)
-
-                return self.success()
-
-        @self.route('/<model_uuid>/test', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(model_uuid: str) -> str:
-            json_data = await quart.request.json
-
-            await self.ap.rerank_models_service.test_rerank_model(model_uuid, json_data)
+        @self.route(
+            '/<model_uuid>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            model = await self.ap.rerank_models_service.get_rerank_model(
+                request_context,
+                model_uuid,
+                include_secret=has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE),
+            )
+            if model is None:
+                return self.http_status(404, -1, 'model not found')
+            return self.success(data={'model': model})

+        @self.route(
+            '/<model_uuid>',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            try:
+                await self.ap.rerank_models_service.update_rerank_model(
+                    request_context,
+                    model_uuid,
+                    await quart.request.json,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success()
+
+        @self.route(
+            '/<model_uuid>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            await self.ap.rerank_models_service.delete_rerank_model(request_context, model_uuid)
+            return self.success()
+
+        @self.route(
+            '/<model_uuid>/test',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(model_uuid: str, request_context: RequestContext) -> str:
+            await self.ap.rerank_models_service.test_rerank_model(request_context, model_uuid, await quart.request.json)
            return self.success()
@@ -1,56 +1,102 @@
 import quart

+from ....authz import Permission, has_permission
+from ....context import RequestContext
 from ... import group


@group.group_class('models/providers', '/api/v1/provider/providers')
 class ModelProvidersRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                providers = await self.ap.provider_service.get_providers()
-                # Add model counts
-                for provider in providers:
-                    counts = await self.ap.provider_service.get_provider_model_counts(provider['uuid'])
-                    provider['llm_count'] = counts['llm_count']
-                    provider['embedding_count'] = counts['embedding_count']
-                    provider['rerank_count'] = counts['rerank_count']
-                return self.success(data={'providers': providers})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                provider_uuid = await self.ap.provider_service.create_provider(json_data)
-                return self.success(data={'uuid': provider_uuid})
-
        @self.route(
-            '/<provider_uuid>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(provider_uuid: str) -> str:
-            if quart.request.method == 'GET':
-                provider = await self.ap.provider_service.get_provider(provider_uuid)
-                if provider is None:
-                    return self.http_status(404, -1, 'provider not found')
-                counts = await self.ap.provider_service.get_provider_model_counts(provider_uuid)
+        async def _(request_context: RequestContext) -> str:
+            providers = await self.ap.provider_service.get_providers(
+                request_context,
+                include_secret=has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE),
+            )
+            for provider in providers:
+                counts = await self.ap.provider_service.get_provider_model_counts(request_context, provider['uuid'])
                provider['llm_count'] = counts['llm_count']
                provider['embedding_count'] = counts['embedding_count']
                provider['rerank_count'] = counts['rerank_count']
-                return self.success(data={'provider': provider})
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
-                await self.ap.provider_service.update_provider(provider_uuid, json_data)
-                return self.success()
-            elif quart.request.method == 'DELETE':
-                try:
-                    await self.ap.provider_service.delete_provider(provider_uuid)
-                    return self.success()
-                except ValueError as e:
-                    return self.http_status(400, -1, str(e))
+            return self.success(data={'providers': providers})

-        @self.route('/<provider_uuid>/scan-models', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def _(provider_uuid: str) -> str:
+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            try:
+                provider_uuid = await self.ap.provider_service.create_provider(request_context, json_data)
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'uuid': provider_uuid})
+
+        @self.route(
+            '/<provider_uuid>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(provider_uuid: str, request_context: RequestContext) -> str:
+            provider = await self.ap.provider_service.get_provider(
+                request_context,
+                provider_uuid,
+                include_secret=has_permission(request_context, Permission.PROVIDER_SECRET_MANAGE),
+            )
+            if provider is None:
+                return self.http_status(404, -1, 'provider not found')
+            counts = await self.ap.provider_service.get_provider_model_counts(request_context, provider_uuid)
+            provider['llm_count'] = counts['llm_count']
+            provider['embedding_count'] = counts['embedding_count']
+            provider['rerank_count'] = counts['rerank_count']
+            return self.success(data={'provider': provider})
+
+        @self.route(
+            '/<provider_uuid>',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(provider_uuid: str, request_context: RequestContext) -> str:
+            json_data = await quart.request.json
+            try:
+                await self.ap.provider_service.update_provider(request_context, provider_uuid, json_data)
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success()
+
+        @self.route(
+            '/<provider_uuid>',
+            methods=['DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(provider_uuid: str, request_context: RequestContext) -> str:
+            try:
+                await self.ap.provider_service.delete_provider(request_context, provider_uuid)
+                return self.success()
+            except ValueError as e:
+                return self.http_status(400, -1, str(e))
+
+        @self.route(
+            '/<provider_uuid>/scan-models',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.PROVIDER_SECRET_MANAGE,
+        )
+        async def _(provider_uuid: str, request_context: RequestContext) -> str:
            try:
                model_type = quart.request.args.get('type')
-                result = await self.ap.provider_service.scan_provider_models(provider_uuid, model_type)
+                result = await self.ap.provider_service.scan_provider_models(request_context, provider_uuid, model_type)
                return self.success(data=result)
            except ValueError as e:
                return self.http_status(400, -1, str(e))
@@ -1,103 +1,138 @@
 from __future__ import annotations

 import quart
-import traceback
 from urllib.parse import unquote

-
+from ....authz import Permission
+from ....context import RequestContext
+from ......provider.tools.loaders.mcp_policy import MCPStdioDisabledError
 from ... import group


@group.group_class('mcp', '/api/v1/mcp')
 class MCPRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('/servers', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            """Get the list of MCP servers"""
-            if quart.request.method == 'GET':
-                servers = await self.ap.mcp_service.get_mcp_servers(contain_runtime_info=True)
-
-                return self.success(data={'servers': servers})
-
-            elif quart.request.method == 'POST':
-                data = await quart.request.json
-
-                try:
-                    uuid = await self.ap.mcp_service.create_mcp_server(data)
-                    return self.success(data={'uuid': uuid})
-                except Exception as e:
-                    traceback.print_exc()
-                    return self.http_status(500, -1, f'Failed to create MCP server: {str(e)}')
+        @self.route(
+            '/servers',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            servers = await self.ap.mcp_service.get_mcp_servers(request_context, contain_runtime_info=True)
+            return self.success(data={'servers': servers})

        @self.route(
-            '/servers/<path:server_name>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN
+            '/servers',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
        )
-        async def _(server_name: str) -> str:
-            """Get, update, or delete an MCP server configuration"""
-            server_name = unquote(server_name)
+        async def _(request_context: RequestContext) -> str:
+            data = await quart.request.json
+            try:
+                server_uuid = await self.ap.mcp_service.create_mcp_server(request_context, data)
+            except MCPStdioDisabledError as exc:
+                return self.http_status(403, exc.code, str(exc))
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'uuid': server_uuid})

-            server_data = await self.ap.mcp_service.get_mcp_server_by_name(server_name)
+        @self.route(
+            '/servers/<path:server_name>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(server_name: str, request_context: RequestContext) -> str:
+            server_name = unquote(server_name)
+            server_data = await self.ap.mcp_service.get_mcp_server_by_name(request_context, server_name)
            if server_data is None:
                return self.http_status(404, -1, 'Server not found')
+            return self.success(data={'server': server_data})

-            if quart.request.method == 'GET':
-                return self.success(data={'server': server_data})
-
-            elif quart.request.method == 'PUT':
+        @self.route(
+            '/servers/<path:server_name>',
+            methods=['PUT', 'DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(server_name: str, request_context: RequestContext) -> str:
+            server_name = unquote(server_name)
+            server_data = await self.ap.mcp_service.get_mcp_server_by_name(request_context, server_name)
+            if server_data is None:
+                return self.http_status(404, -1, 'Server not found')
+            if quart.request.method == 'PUT':
                data = await quart.request.json
                try:
-                    await self.ap.mcp_service.update_mcp_server(server_data['uuid'], data)
-                    return self.success()
-                except Exception as e:
-                    return self.http_status(500, -1, f'Failed to update MCP server: {str(e)}')
+                    await self.ap.mcp_service.update_mcp_server(request_context, server_data['uuid'], data)
+                except MCPStdioDisabledError as exc:
+                    return self.http_status(403, exc.code, str(exc))
+                except ValueError as exc:
+                    return self.http_status(400, -1, str(exc))
+            else:
+                await self.ap.mcp_service.delete_mcp_server(request_context, server_data['uuid'])
+            return self.success()

-            elif quart.request.method == 'DELETE':
-                try:
-                    await self.ap.mcp_service.delete_mcp_server(server_data['uuid'])
-                    return self.success()
-                except Exception as e:
-                    return self.http_status(500, -1, f'Failed to delete MCP server: {str(e)}')
-
-        @self.route('/servers/<path:server_name>/test', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(server_name: str) -> str:
+        @self.route(
+            '/servers/<path:server_name>/test',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(server_name: str, request_context: RequestContext) -> str:
            """Test the connection to an MCP server"""
            server_name = unquote(server_name)
            server_data = await quart.request.json
-            task_id = await self.ap.mcp_service.test_mcp_server(server_name=server_name, server_data=server_data)
+            try:
+                task_id = await self.ap.mcp_service.test_mcp_server(
+                    request_context,
+                    server_name=server_name,
+                    server_data=server_data,
+                )
+            except MCPStdioDisabledError as exc:
+                return self.http_status(403, exc.code, str(exc))
            return self.success(data={'task_id': task_id})

-        @self.route('/servers/<path:server_name>/resources', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(server_name: str) -> str:
+        @self.route(
+            '/servers/<path:server_name>/resources',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(server_name: str, request_context: RequestContext) -> str:
            """Get resources from an MCP server"""
            server_name = unquote(server_name)
-            try:
-                resources = await self.ap.mcp_service.get_mcp_server_resources(server_name)
-                templates = await self.ap.mcp_service.get_mcp_server_resource_templates(server_name)
-                runtime_info = await self.ap.mcp_service.get_runtime_info(server_name)
-                return self.success(
-                    data={
-                        'resources': resources,
-                        'resource_templates': templates,
-                        'resource_capabilities': (runtime_info or {}).get('resource_capabilities', {}),
-                    }
-                )
-            except Exception as e:
-                return self.http_status(500, -1, f'Failed to get resources: {str(e)}')
+            resources = await self.ap.mcp_service.get_mcp_server_resources(request_context, server_name)
+            templates = await self.ap.mcp_service.get_mcp_server_resource_templates(request_context, server_name)
+            runtime_info = await self.ap.mcp_service.get_runtime_info(request_context, server_name)
+            return self.success(
+                data={
+                    'resources': resources,
+                    'resource_templates': templates,
+                    'resource_capabilities': (runtime_info or {}).get('resource_capabilities', {}),
+                }
+            )

        @self.route(
-            '/servers/<path:server_name>/resource-templates', methods=['GET'], auth_type=group.AuthType.USER_TOKEN
+            '/servers/<path:server_name>/resource-templates',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _(server_name: str) -> str:
+        async def _(server_name: str, request_context: RequestContext) -> str:
            """Get resource templates from an MCP server"""
            server_name = unquote(server_name)
-            try:
-                templates = await self.ap.mcp_service.get_mcp_server_resource_templates(server_name)
-                return self.success(data={'resource_templates': templates})
-            except Exception as e:
-                return self.http_status(500, -1, f'Failed to get resource templates: {str(e)}')
+            templates = await self.ap.mcp_service.get_mcp_server_resource_templates(request_context, server_name)
+            return self.success(data={'resource_templates': templates})

-        @self.route('/servers/<path:server_name>/logs', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(server_name: str) -> str:
+        @self.route(
+            '/servers/<path:server_name>/logs',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.AUDIT_VIEW,
+        )
+        async def _(server_name: str, request_context: RequestContext) -> str:
            """Get logs from an MCP server"""
            server_name = unquote(server_name)
            try:
@@ -106,24 +141,32 @@ class MCPRouterGroup(group.RouterGroup):
                limit = 200
            limit = min(limit, 500)
            level = quart.request.args.get('level') or None
-            logs = await self.ap.mcp_service.get_mcp_server_logs(server_name, limit=limit, level=level)
+            logs = await self.ap.mcp_service.get_mcp_server_logs(
+                request_context,
+                server_name,
+                limit=limit,
+                level=level,
+            )
            return self.success(data={'logs': logs})

-        @self.route('/servers/<path:server_name>/resources/read', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(server_name: str) -> str:
+        @self.route(
+            '/servers/<path:server_name>/resources/read',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(server_name: str, request_context: RequestContext) -> str:
            """Read a resource from an MCP server"""
            server_name = unquote(server_name)
            data = await quart.request.json
            uri = data.get('uri')
            if not uri:
                return self.http_status(400, -1, 'URI is required')
-            try:
-                envelope = await self.ap.mcp_service.read_mcp_server_resource_envelope(
-                    server_name,
-                    uri,
-                    max_bytes=data.get('max_bytes'),
-                    include_blob=bool(data.get('include_blob', False)),
-                )
-                return self.success(data=envelope)
-            except Exception as e:
-                return self.http_status(500, -1, f'Failed to read resource: {str(e)}')
+            envelope = await self.ap.mcp_service.read_mcp_server_resource_envelope(
+                request_context,
+                server_name,
+                uri,
+                max_bytes=data.get('max_bytes'),
+                include_blob=bool(data.get('include_blob', False)),
+            )
+            return self.success(data=envelope)
@@ -2,21 +2,28 @@ from __future__ import annotations

 import quart

+from ....authz import Permission
+from ....context import RequestContext
 from ... import group


@group.group_class('tools', '/api/v1/tools')
 class ToolsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Get the list of all available tools"""
            pipeline_uuid = quart.request.args.get('pipeline_uuid') or quart.request.args.get('pipeline_id')
            bound_plugins: list[str] | None = None
            bound_mcp_servers: list[str] | None = None

            if pipeline_uuid:
-                pipeline = await self.ap.pipeline_service.get_pipeline(pipeline_uuid)
+                pipeline = await self.ap.pipeline_service.get_pipeline(request_context, pipeline_uuid)
                if pipeline is None:
                    return self.http_status(404, -1, 'pipeline not found')

@@ -35,6 +42,7 @@ class ToolsRouterGroup(group.RouterGroup):
            return self.success(
                data={
                    'tools': await self.ap.tool_mgr.get_tool_catalog(
+                        request_context,
                        bound_plugins,
                        bound_mcp_servers,
                        include_skill_authoring=True,
@@ -42,10 +50,15 @@ class ToolsRouterGroup(group.RouterGroup):
                }
            )

-        @self.route('/<tool_name>', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(tool_name: str) -> str:
+        @self.route(
+            '/<tool_name>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(tool_name: str, request_context: RequestContext) -> str:
            """Get details for a specific tool"""
-            tools = await self.ap.tool_mgr.get_all_tools(include_skill_authoring=True)
+            tools = await self.ap.tool_mgr.get_all_tools(request_context, include_skill_authoring=True)

            for tool in tools:
                if tool.name == tool_name:
@@ -2,8 +2,11 @@ from __future__ import annotations

 import quart

+from langbot.pkg.cloud.entitlements import EntitlementFeatureUnavailableError
 from langbot_plugin.box.errors import BoxError

+from ...authz import Permission
+from ...context import RequestContext
 from .. import group


@@ -12,58 +15,91 @@ class SkillsRouterGroup(group.RouterGroup):
    """Skills management API endpoints."""

    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def list_or_create_skills() -> quart.Response:
-            if quart.request.method == 'GET':
-                try:
-                    skills = await self.ap.skill_service.list_skills()
-                except (ValueError, BoxError) as exc:
-                    return self.http_status(400, -1, str(exc))
-                return self.success(data={'skills': skills})
+        @self.route(
+            '',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def list_skills(request_context: RequestContext) -> quart.Response:
+            try:
+                skills = await self.ap.skill_service.list_skills(request_context)
+            except EntitlementFeatureUnavailableError:
+                # Plans without managed sandbox support have no runnable skills.
+                # Treat that capability absence as an empty collection so the
+                # shared UI can render normally instead of surfacing a 500.
+                return self.success(data={'skills': []})
+            except (ValueError, BoxError) as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'skills': skills})

+        @self.route(
+            '',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def create_skill(request_context: RequestContext) -> quart.Response:
            data = await quart.request.json
            if 'name' not in data or not data['name']:
                return self.http_status(400, -1, 'Missing required field: name')

            try:
-                skill = await self.ap.skill_service.create_skill(data)
+                skill = await self.ap.skill_service.create_skill(request_context, data)
                return self.success(data={'skill': skill})
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))

-        @self.route('/<skill_name>', methods=['GET', 'PUT', 'DELETE'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def get_update_delete_skill(skill_name: str) -> quart.Response:
-            if quart.request.method == 'GET':
-                try:
-                    skill = await self.ap.skill_service.get_skill(skill_name)
-                except (ValueError, BoxError) as exc:
-                    return self.http_status(400, -1, str(exc))
-                if not skill:
-                    return self.http_status(404, -1, 'Skill not found')
-                return self.success(data={'skill': skill})
+        @self.route(
+            '/<skill_name>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def get_skill(skill_name: str, request_context: RequestContext) -> quart.Response:
+            try:
+                skill = await self.ap.skill_service.get_skill(request_context, skill_name)
+            except (ValueError, BoxError) as exc:
+                return self.http_status(400, -1, str(exc))
+            if not skill:
+                return self.http_status(404, -1, 'Skill not found')
+            return self.success(data={'skill': skill})

+        @self.route(
+            '/<skill_name>',
+            methods=['PUT', 'DELETE'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def update_delete_skill(skill_name: str, request_context: RequestContext) -> quart.Response:
            if quart.request.method == 'PUT':
                data = await quart.request.json
                try:
-                    skill = await self.ap.skill_service.update_skill(skill_name, data)
+                    skill = await self.ap.skill_service.update_skill(request_context, skill_name, data)
                    return self.success(data={'skill': skill})
                except (ValueError, BoxError) as exc:
                    return self.http_status(400, -1, str(exc))

            try:
-                await self.ap.skill_service.delete_skill(skill_name)
+                await self.ap.skill_service.delete_skill(request_context, skill_name)
                return self.success()
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))

-        @self.route('/<skill_name>/files', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def list_skill_files(skill_name: str) -> quart.Response:
+        @self.route(
+            '/<skill_name>/files',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def list_skill_files(skill_name: str, request_context: RequestContext) -> quart.Response:
            """List files in skill package directory."""
            path = quart.request.args.get('path', '.').strip()
            include_hidden = quart.request.args.get('include_hidden', 'false').lower() == 'true'

            try:
                result = await self.ap.skill_service.list_skill_files(
+                    request_context,
                    skill_name,
                    path=path,
                    include_hidden=include_hidden,
@@ -73,38 +109,55 @@ class SkillsRouterGroup(group.RouterGroup):
                return self.http_status(400, -1, str(exc))

        @self.route(
-            '/<skill_name>/files/<path:path>', methods=['GET', 'PUT'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY
+            '/<skill_name>/files/<path:path>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def read_or_write_skill_file(skill_name: str, path: str) -> quart.Response:
-            """Read or write a file in skill package."""
-            if quart.request.method == 'GET':
-                try:
-                    result = await self.ap.skill_service.read_skill_file(skill_name, path)
-                    return self.success(data=result)
-                except (ValueError, BoxError) as exc:
-                    return self.http_status(400, -1, str(exc))
+        async def read_skill_file(skill_name: str, path: str, request_context: RequestContext) -> quart.Response:
+            try:
+                result = await self.ap.skill_service.read_skill_file(request_context, skill_name, path)
+                return self.success(data=result)
+            except (ValueError, BoxError) as exc:
+                return self.http_status(400, -1, str(exc))

-            # PUT - write file
+        @self.route(
+            '/<skill_name>/files/<path:path>',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def write_skill_file(skill_name: str, path: str, request_context: RequestContext) -> quart.Response:
            data = await quart.request.json
            content = data.get('content', '')
            if content is None:
                return self.http_status(400, -1, 'Missing required field: content')

            try:
-                result = await self.ap.skill_service.write_skill_file(skill_name, path, content)
+                result = await self.ap.skill_service.write_skill_file(request_context, skill_name, path, content)
                return self.success(data=result)
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))

-        @self.route('/<skill_name>/preview', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def preview_skill(skill_name: str) -> quart.Response:
-            skill = self.ap.skill_mgr.get_skill_by_name(skill_name)
+        @self.route(
+            '/<skill_name>/preview',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def preview_skill(skill_name: str, request_context: RequestContext) -> quart.Response:
+            skill = await self.ap.skill_service.get_skill(request_context, skill_name)
            if not skill:
                return self.http_status(404, -1, 'Skill not found')
            return self.success(data={'instructions': skill.get('instructions', '')})

-        @self.route('/install/github', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def install_skill_from_github() -> quart.Response:
+        @self.route(
+            '/install/github',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def install_skill_from_github(request_context: RequestContext) -> quart.Response:
            data = await quart.request.json
            required_fields = ['asset_url', 'owner', 'repo']
            for field in required_fields:
@@ -115,15 +168,20 @@ class SkillsRouterGroup(group.RouterGroup):
                return self.http_status(400, -1, 'Missing required field: release_tag')

            try:
-                skill = await self.ap.skill_service.install_from_github(data)
+                skill = await self.ap.skill_service.install_from_github(request_context, data)
                return self.success(data={'skills': skill})
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))
-            except Exception as exc:
-                return self.http_status(500, -1, f'Failed to install skill: {exc}')
+            except Exception:
+                raise

-        @self.route('/install/github/preview', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def preview_skill_from_github() -> quart.Response:
+        @self.route(
+            '/install/github/preview',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def preview_skill_from_github(request_context: RequestContext) -> quart.Response:
            data = await quart.request.json
            required_fields = ['asset_url', 'owner', 'repo']
            for field in required_fields:
@@ -134,15 +192,20 @@ class SkillsRouterGroup(group.RouterGroup):
                return self.http_status(400, -1, 'Missing required field: release_tag')

            try:
-                preview = await self.ap.skill_service.preview_install_from_github(data)
+                preview = await self.ap.skill_service.preview_install_from_github(request_context, data)
                return self.success(data={'skills': preview})
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))
-            except Exception as exc:
-                return self.http_status(500, -1, f'Failed to preview skill: {exc}')
+            except Exception:
+                raise

-        @self.route('/install/upload', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def install_skill_from_upload() -> quart.Response:
+        @self.route(
+            '/install/upload',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def install_skill_from_upload(request_context: RequestContext) -> quart.Response:
            file = (await quart.request.files).get('file')
            if file is None:
                return self.http_status(400, -1, 'file is required')
@@ -150,6 +213,7 @@ class SkillsRouterGroup(group.RouterGroup):

            try:
                skill = await self.ap.skill_service.install_from_zip_upload(
+                    request_context,
                    file_bytes=file.read(),
                    filename=file.filename or '',
                    source_paths=form.getlist('source_paths'),
@@ -157,34 +221,45 @@ class SkillsRouterGroup(group.RouterGroup):
                return self.success(data={'skills': skill})
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))
-            except Exception as exc:
-                return self.http_status(500, -1, f'Failed to install skill: {exc}')
+            except Exception:
+                raise

-        @self.route('/install/upload/preview', methods=['POST'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def preview_skill_from_upload() -> quart.Response:
+        @self.route(
+            '/install/upload/preview',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def preview_skill_from_upload(request_context: RequestContext) -> quart.Response:
            file = (await quart.request.files).get('file')
            if file is None:
                return self.http_status(400, -1, 'file is required')

            try:
                preview = await self.ap.skill_service.preview_install_from_zip_upload(
+                    request_context,
                    file_bytes=file.read(),
                    filename=file.filename or '',
                )
                return self.success(data={'skills': preview})
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))
-            except Exception as exc:
-                return self.http_status(500, -1, f'Failed to preview skill: {exc}')
+            except Exception:
+                raise

-        @self.route('/scan', methods=['GET'], auth_type=group.AuthType.USER_TOKEN_OR_API_KEY)
-        async def scan_skill_directory() -> quart.Response:
+        @self.route(
+            '/scan',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN_OR_API_KEY,
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def scan_skill_directory(request_context: RequestContext) -> quart.Response:
            path = quart.request.args.get('path', '').strip()
            if not path:
                return self.http_status(400, -1, 'Missing required parameter: path')

            try:
-                result = await self.ap.skill_service.scan_directory_async(path)
+                result = await self.ap.skill_service.scan_directory_async(request_context, path)
                return self.success(data=result)
            except (ValueError, BoxError) as exc:
                return self.http_status(400, -1, str(exc))
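The skills hunks above replace the combined GET/PUT file handler with two routes. Each route now declares its own permission: reads need `RESOURCE_VIEW`, and writes, installs, previews of installs, and directory scans need `RESOURCE_MANAGE`. The sketch below models that per-method mapping as a plain lookup table. The `Permission` values, `ROUTE_PERMISSIONS`, and `is_allowed` are hypothetical stand-ins, not the project's authz API.

```python
from enum import Enum


class Permission(Enum):
    # Hypothetical stand-ins for the authz permissions named in the diff.
    RESOURCE_VIEW = 'resource:view'
    RESOURCE_MANAGE = 'resource:manage'


# Hypothetical route table mirroring the split: each (path, method) pair
# carries its own permission instead of one handler serving GET and PUT.
ROUTE_PERMISSIONS: dict[tuple[str, str], Permission] = {
    ('/<skill_name>/files/<path:path>', 'GET'): Permission.RESOURCE_VIEW,
    ('/<skill_name>/files/<path:path>', 'PUT'): Permission.RESOURCE_MANAGE,
    ('/<skill_name>/preview', 'GET'): Permission.RESOURCE_VIEW,
    ('/install/github', 'POST'): Permission.RESOURCE_MANAGE,
    ('/scan', 'GET'): Permission.RESOURCE_MANAGE,
}


def is_allowed(path: str, method: str, granted: set[Permission]) -> bool:
    """Return True if the caller's granted permissions cover this route."""
    required = ROUTE_PERMISSIONS.get((path, method.upper()))
    if required is None:
        # Unknown routes are denied rather than silently allowed.
        return False
    return required in granted
```

Splitting the handler means a view-only caller can no longer reach the write path by switching the HTTP method on the same URL.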
@@ -1,19 +1,39 @@
 from .. import group
+from ...authz import Permission
+from ...context import ExecutionContext, RequestContext
+
+
+def collect_basic_stats(ap, request_context: RequestContext) -> dict[str, int]:
+    """Collect runtime counters only from the selected Workspace placement."""
+
+    execution_context = ExecutionContext.from_request(request_context)
+    sessions = [
+        session
+        for session in ap.sess_mgr.session_list
+        if (
+            getattr(session, 'instance_uuid', None) == execution_context.instance_uuid
+            and getattr(session, 'workspace_uuid', None) == execution_context.workspace_uuid
+            and getattr(session, 'placement_generation', None) == execution_context.placement_generation
+        )
+    ]
+    conversation_count = sum(
+        len(session.conversations if session.conversations is not None else []) for session in sessions
+    )
+    return {
+        'active_session_count': len(sessions),
+        'conversation_count': conversation_count,
+        'query_count': ap.query_pool.get_query_count(execution_context),
+    }


@group.group_class('stats', '/api/v1/stats')
 class StatsRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('/basic', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            conv_count = 0
-            for session in self.ap.sess_mgr.session_list:
-                conv_count += len(session.conversations if session.conversations is not None else [])
-
-            return self.success(
-                data={
-                    'active_session_count': len(self.ap.sess_mgr.session_list),
-                    'conversation_count': conv_count,
-                    'query_count': self.ap.query_pool.query_id_counter,
-                }
-            )
+        @self.route(
+            '/basic',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            return self.success(data=collect_basic_stats(self.ap, request_context))
@@ -1,3 +1,4 @@
+import asyncio
 import base64

 import quart
@@ -59,7 +60,14 @@ class SurveyRouterGroup(group.RouterGroup):
                    continue
                try:
                    payload = data_url.split(',', 1)[1]
-                    if len(base64.b64decode(payload, validate=True)) > 1024 * 1024:
+                    if len(payload) > 4 * ((1024 * 1024 + 2) // 3) + 4:
+                        return self.fail(5, 'attachment too large')
+                    decoded = await asyncio.to_thread(
+                        base64.b64decode,
+                        payload,
+                        validate=True,
+                    )
+                    if len(decoded) > 1024 * 1024:
                        return self.fail(5, 'attachment too large')
                except Exception:
                    return self.fail(5, 'attachment too large')
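The survey change rejects oversized attachments before decoding them. Padded base64 encodes every started 3-byte group as 4 characters, so `4 * ((n + 2) // 3)` is the encoded length of `n` bytes, and the diff allows 4 extra characters of slack on top. The length gate is only an upper bound. A payload that decodes to slightly more than 1 MiB can share the same encoded length, so the exact `len(decoded)` check is still needed. The function names below are illustrative.

```python
import base64
import binascii

MAX_ATTACHMENT_BYTES = 1024 * 1024  # 1 MiB, the limit used in the diff


def max_b64_len(n_bytes: int) -> int:
    # Padded base64 emits 4 characters per started 3-byte group.
    return 4 * ((n_bytes + 2) // 3)


def attachment_ok(payload: str) -> bool:
    """Cheap length gate first, then a strict decode and an exact size check."""
    # Same bound as the diff: encoded length of 1 MiB plus 4 chars of slack.
    if len(payload) > max_b64_len(MAX_ATTACHMENT_BYTES) + 4:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        decoded = base64.b64decode(payload, validate=True)
    except binascii.Error:
        return False
    return len(decoded) <= MAX_ATTACHMENT_BYTES
```

In the router, the decode also runs through `asyncio.to_thread`, so a near-limit payload that passes the gate does not block the event loop while it is decoded.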
@@ -5,7 +5,11 @@ import sqlalchemy

 from .. import group
 from .....utils import constants
-from .....entity.persistence.metadata import Metadata
+from .....entity.persistence.metadata import WorkspaceMetadata
+from ...authz import Permission
+from ...context import RequestContext
+from .....provider.tools.loaders.mcp_policy import stdio_mcp_enabled
+from .....workspace.invitation_delivery import InvitationDeliveryService


@group.group_class('system', '/api/v1/system')
@@ -17,17 +21,46 @@ class SystemRouterGroup(group.RouterGroup):
            wizard_status = 'none'
            wizard_progress = None
            try:
-                result = await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.select(Metadata).where(Metadata.key.in_(['wizard_status', 'wizard_progress']))
-                )
-                for row in result:
-                    if row.key == 'wizard_status':
-                        wizard_status = row.value
-                    elif row.key == 'wizard_progress':
-                        try:
-                            wizard_progress = json.loads(row.value)
-                        except (json.JSONDecodeError, TypeError):
-                            wizard_progress = None
+                authorization = quart.request.headers.get('Authorization', '')
+                if authorization.startswith('Bearer '):
+                    account, _ = await self._authenticate_account(authorization.removeprefix('Bearer '))
+                    request_context = await self._resolve_account_context(account, group.AuthType.USER_TOKEN)
+                    if request_context is not None:
+                        tenant_uow = getattr(self.ap.persistence_mgr, 'tenant_uow', None)
+
+                        async def load_workspace_metadata():
+                            return await self.ap.persistence_mgr.execute_async(
+                                sqlalchemy.select(
+                                    WorkspaceMetadata.key,
+                                    WorkspaceMetadata.value,
+                                ).where(
+                                    WorkspaceMetadata.workspace_uuid == request_context.workspace_uuid,
+                                    WorkspaceMetadata.key.in_(['wizard_status', 'wizard_progress']),
+                                )
+                            )
+
+                        cloud_runtime = (
+                            getattr(getattr(self.ap.persistence_mgr, 'mode', None), 'value', None) == 'cloud_runtime'
+                        )
+                        if cloud_runtime:
+                            if not callable(tenant_uow):
+                                raise RuntimeError('Cloud system metadata requires an explicit tenant UoW')
+                            async with tenant_uow(request_context.workspace_uuid):
+                                result = await load_workspace_metadata()
+                        else:
+                            result = await load_workspace_metadata()
+                        # ``execute_async`` deliberately preserves its historical
+                        # AsyncConnection result shape. Selecting the two fields
+                        # explicitly keeps this reader independent of ORM Session
+                        # scalar semantics inside a tenant UoW.
+                        for row in result:
+                            if row.key == 'wizard_status':
+                                wizard_status = row.value
+                            elif row.key == 'wizard_progress':
+                                try:
+                                    wizard_progress = json.loads(row.value)
+                                except (json.JSONDecodeError, TypeError):
+                                    wizard_progress = None
            except Exception:
                pass

@@ -43,6 +76,10 @@ class SystemRouterGroup(group.RouterGroup):
            else:
                outbound_ips = []

+            invitation_delivery_service = getattr(self.ap, 'invitation_delivery_service', None)
+            if invitation_delivery_service is None:
+                invitation_delivery_service = InvitationDeliveryService(self.ap)
+
            return self.success(
                data={
                    'version': constants.semantic_version,
@@ -60,15 +97,24 @@ class SystemRouterGroup(group.RouterGroup):
                    'disable_models_service': self.ap.instance_config.data.get('space', {}).get(
                        'disable_models_service', False
                    ),
+                    # Exposed independently of Box status so the WebUI cannot
+                    # infer stdio permission from sandbox availability.
+                    'mcp_stdio_enabled': stdio_mcp_enabled(self.ap),
                    'limitation': self.ap.instance_config.data.get('system', {}).get('limitation', {}),
                    'outbound_ips': outbound_ips,
+                    'invitation_delivery': invitation_delivery_service.capability(),
                    'wizard_status': wizard_status,
                    'wizard_progress': wizard_progress,
                }
            )

-        @self.route('/wizard/completed', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
+        @self.route(
+            '/wizard/completed',
+            methods=['POST'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.WORKSPACE_UPDATE,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Mark wizard status in metadata table and clear progress.

            Accepts JSON body: { "status": "skipped" | "completed" }
@@ -80,28 +126,48 @@ class SystemRouterGroup(group.RouterGroup):

            try:
                result = await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.select(Metadata).where(Metadata.key == 'wizard_status')
+                    sqlalchemy.select(WorkspaceMetadata).where(
+                        WorkspaceMetadata.workspace_uuid == request_context.workspace_uuid,
+                        WorkspaceMetadata.key == 'wizard_status',
+                    )
                )
                if result.first():
                    await self.ap.persistence_mgr.execute_async(
-                        sqlalchemy.update(Metadata).where(Metadata.key == 'wizard_status').values(value=status)
+                        sqlalchemy.update(WorkspaceMetadata)
+                        .where(
+                            WorkspaceMetadata.workspace_uuid == request_context.workspace_uuid,
+                            WorkspaceMetadata.key == 'wizard_status',
+                        )
+                        .values(value=status)
                    )
                else:
                    await self.ap.persistence_mgr.execute_async(
-                        sqlalchemy.insert(Metadata).values(key='wizard_status', value=status)
+                        sqlalchemy.insert(WorkspaceMetadata).values(
+                            workspace_uuid=request_context.workspace_uuid,
+                            key='wizard_status',
+                            value=status,
+                        )
                    )

                # Clear wizard progress when wizard is completed/skipped
                await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.delete(Metadata).where(Metadata.key == 'wizard_progress')
+                    sqlalchemy.delete(WorkspaceMetadata).where(
+                        WorkspaceMetadata.workspace_uuid == request_context.workspace_uuid,
+                        WorkspaceMetadata.key == 'wizard_progress',
+                    )
                )
-            except Exception as e:
-                return self.http_status(500, 500, f'Failed to update wizard status: {e}')
+            except Exception:
+                raise

            return self.success(data={})

-        @self.route('/wizard/progress', methods=['PUT'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
+        @self.route(
+            '/wizard/progress',
+            methods=['PUT'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.WORKSPACE_UPDATE,
+        )
+        async def _(request_context: RequestContext) -> str:
            """Save wizard progress to metadata table.

            Accepts JSON body with wizard state fields:
@@ -113,23 +179,40 @@ class SystemRouterGroup(group.RouterGroup):

            try:
                result = await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.select(Metadata).where(Metadata.key == 'wizard_progress')
+                    sqlalchemy.select(WorkspaceMetadata).where(
+                        WorkspaceMetadata.workspace_uuid == request_context.workspace_uuid,
+                        WorkspaceMetadata.key == 'wizard_progress',
+                    )
                )
                if result.first():
                    await self.ap.persistence_mgr.execute_async(
-                        sqlalchemy.update(Metadata).where(Metadata.key == 'wizard_progress').values(value=progress_json)
+                        sqlalchemy.update(WorkspaceMetadata)
+                        .where(
+                            WorkspaceMetadata.workspace_uuid == request_context.workspace_uuid,
+                            WorkspaceMetadata.key == 'wizard_progress',
+                        )
+                        .values(value=progress_json)
                    )
                else:
                    await self.ap.persistence_mgr.execute_async(
-                        sqlalchemy.insert(Metadata).values(key='wizard_progress', value=progress_json)
+                        sqlalchemy.insert(WorkspaceMetadata).values(
+                            workspace_uuid=request_context.workspace_uuid,
+                            key='wizard_progress',
+                            value=progress_json,
+                        )
                    )
-            except Exception as e:
-                return self.http_status(500, 500, f'Failed to save wizard progress: {e}')
+            except Exception:
+                raise

            return self.success(data={})

-        @self.route('/tasks', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
+        @self.route(
+            '/tasks',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
            task_type = quart.request.args.get('type')
            task_kind = quart.request.args.get('kind')

@@ -138,30 +221,56 @@ class SystemRouterGroup(group.RouterGroup):
            if task_kind == '':
                task_kind = None

-            return self.success(data=self.ap.task_mgr.get_tasks_dict(task_type, task_kind))
+            return self.success(
+                data=self.ap.task_mgr.get_tasks_dict(
+                    task_type,
+                    task_kind,
+                    instance_uuid=request_context.instance_uuid,
+                    workspace_uuid=request_context.workspace_uuid,
+                    placement_generation=request_context.placement_generation,
+                )
+            )

-        @self.route('/tasks/<task_id>', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(task_id: str) -> str:
-            task = self.ap.task_mgr.get_task_by_id(int(task_id))
+        @self.route(
+            '/tasks/<task_id>',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
+        )
+        async def _(task_id: str, request_context: RequestContext) -> str:
+            task = self.ap.task_mgr.get_task_by_id(
+                int(task_id),
+                instance_uuid=request_context.instance_uuid,
+                workspace_uuid=request_context.workspace_uuid,
+                placement_generation=request_context.placement_generation,
+            )

            if task is None:
                return self.http_status(404, 404, 'Task not found')

            return self.success(data=task.to_dict())

-        @self.route('/storage-analysis', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _() -> str:
-            return self.success(data=await self.ap.maintenance_service.get_storage_analysis())
+        @self.route(
+            '/storage-analysis',
+            methods=['GET'],
+            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.AUDIT_VIEW,
+        )
+        async def _(request_context: RequestContext) -> str:
+            return self.success(data=await self.ap.maintenance_service.get_storage_analysis(request_context))

        @self.route(
            '/debug/plugin/action',
            methods=['POST'],
            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RUNTIME_OPERATE,
        )
-        async def _() -> str:
+        async def _(request_context: RequestContext) -> str:
            if not constants.debug_mode:
                return self.http_status(403, 403, 'Forbidden')

+            await self.ap.plugin_connector.require_workspace_context(request_context)
+
            data = await quart.request.json

            class AnoymousAction:
@@ -174,6 +283,7 @@ class SystemRouterGroup(group.RouterGroup):
                AnoymousAction(data['action']),
                data['data'],
                timeout=data.get('timeout', 10),
+                action_context=self.ap.plugin_connector.handler.require_bound_action_context().without_installation(),
            )

            return self.success(data=resp)
@@ -182,8 +292,10 @@ class SystemRouterGroup(group.RouterGroup):
            '/status/plugin-system',
            methods=['GET'],
            auth_type=group.AuthType.USER_TOKEN,
+            permission=Permission.RESOURCE_VIEW,
        )
-        async def _() -> str:
+        async def _(request_context: RequestContext) -> str:
+            await self.ap.plugin_connector.require_workspace_context(request_context)
            plugin_connector_error = 'ok'
            is_connected = True

@@ -1,14 +1,55 @@
 import quart
 import argon2
 import asyncio
-import traceback
+import uuid
+from urllib.parse import parse_qs, urlsplit

 from .. import group
 from .....entity.errors import account as account_errors
+from ...context import RequestContext
+from .....cloud.launch import SpaceLaunchError
+from ...service.user import ControlPlaneDirectoryRequiredError, PublicRegistrationClosedError


@group.group_class('user', '/api/v1/user')
 class UserRouterGroup(group.RouterGroup):
+    @staticmethod
+    def _origin(value: str) -> tuple[str, str, int | None] | None:
+        parsed = urlsplit(value)
+        if parsed.scheme not in {'http', 'https'} or not parsed.hostname:
+            return None
+        return parsed.scheme, parsed.hostname.casefold(), parsed.port
+
+    def _validate_space_redirect_uri(self, redirect_uri: str, *, bind: bool) -> str:
+        parsed = urlsplit(redirect_uri)
+        if (
+            parsed.scheme not in {'http', 'https'}
+            or not parsed.hostname
+            or parsed.username is not None
+            or parsed.password is not None
+            or parsed.fragment
+            or parsed.path != '/auth/space/callback'
+        ):
+            raise ValueError('Invalid redirect_uri parameter')
+
+        query = parse_qs(parsed.query, keep_blank_values=True)
+        if bind:
+            if query != {'mode': ['bind']}:
+                raise ValueError('Invalid Space binding redirect_uri')
+        elif query:
+            raise ValueError('Invalid Space login redirect_uri')
+
+        redirect_origin = self._origin(redirect_uri)
+        api_config = self.ap.instance_config.data.get('api', {})
+        trusted_origins = {
+            self._origin(str(api_config.get(config_key, '') or '').strip())
+            for config_key in ('webui_url', 'webhook_prefix')
+        }
+        trusted_origins.discard(None)
+        if redirect_origin not in trusted_origins:
+            raise ValueError('Untrusted redirect_uri origin')
+        return redirect_uri
+
    async def initialize(self) -> None:
        @self.route('/init', methods=['GET', 'POST'], auth_type=group.AuthType.NONE)
        async def _() -> str:
@@ -23,12 +64,19 @@ class UserRouterGroup(group.RouterGroup):
            user_email = json_data['user']
            password = json_data['password']

-            await self.ap.user_service.create_user(user_email, password)
+            try:
+                await self.ap.user_service.create_user(user_email, password)
+            except ControlPlaneDirectoryRequiredError as exc:
+                return self.http_status(409, exc.code, str(exc))
+            except PublicRegistrationClosedError:
+                return self.http_status(409, 'registration_closed', 'System already initialized')

            return self.success()

        @self.route('/auth', methods=['POST'], auth_type=group.AuthType.NONE)
        async def _() -> str:
+            if getattr(getattr(self.ap, 'deployment', None), 'mode', 'oss') == 'cloud':
+                return self.http_status(403, 'password_login_disabled', 'Password login is disabled on LangBot Cloud')
            json_data = await quart.request.json

            try:
@@ -40,9 +88,9 @@ class UserRouterGroup(group.RouterGroup):

            return self.success(data={'token': token})

-        @self.route('/check-token', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(user_email: str) -> str:
-            token = await self.ap.user_service.generate_jwt_token(user_email)
+        @self.route('/check-token', methods=['GET'], auth_type=group.AuthType.ACCOUNT_TOKEN)
+        async def _(account) -> str:
+            token = await self.ap.user_service.generate_jwt_token(account)

            return self.success(data={'token': token})

@@ -101,15 +149,50 @@ class UserRouterGroup(group.RouterGroup):
        async def _() -> str:
            """Get Space OAuth authorization URL for redirect"""
            redirect_uri = quart.request.args.get('redirect_uri', '')
-            state = quart.request.args.get('state', '')

            if not redirect_uri:
                return self.fail(1, 'Missing redirect_uri parameter')
+            if 'state' in quart.request.args:
+                return self.fail(1, 'Caller-supplied OAuth state is not allowed')

            try:
+                redirect_uri = self._validate_space_redirect_uri(redirect_uri, bind=False)
+                launch_workspace_uuid = quart.request.args.get('launch_workspace_uuid')
+                if launch_workspace_uuid:
+                    if not getattr(getattr(self.ap, 'deployment', None), 'multi_workspace_enabled', False):
+                        return self.fail(1, 'Space launch requires Cloud mode')
+                    try:
+                        uuid.UUID(launch_workspace_uuid)
+                    except ValueError:
+                        return self.fail(1, 'Invalid launch Workspace')
+                    state = await self.ap.user_service.issue_space_oauth_state(
+                        'login',
+                        launch_workspace_uuid=launch_workspace_uuid,
+                    )
+                else:
+                    state = await self.ap.user_service.issue_space_oauth_state('login')
                authorize_url = self.ap.space_service.get_oauth_authorize_url(redirect_uri, state)
                return self.success(data={'authorize_url': authorize_url})
-            except Exception as e:
+            except ValueError as e:
+                return self.fail(1, str(e))
+
+        @self.route('/space/bind-authorize-url', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
+        async def _(request_context: RequestContext) -> str:
+            """Issue an account-bound, one-time Space OAuth redirect."""
+            redirect_uri = quart.request.args.get('redirect_uri', '')
+            if not redirect_uri:
+                return self.fail(1, 'Missing redirect_uri parameter')
+            if not request_context.account_uuid:
+                return self.http_status(403, 'account_required', 'An Account is required')
+            try:
+                redirect_uri = self._validate_space_redirect_uri(redirect_uri, bind=True)
+                state = await self.ap.user_service.issue_space_oauth_state(
+                    'bind',
+                    account_uuid=request_context.account_uuid,
+                )
+                authorize_url = self.ap.space_service.get_oauth_authorize_url(redirect_uri, state)
+                return self.success(data={'authorize_url': authorize_url})
+            except ValueError as e:
                return self.fail(1, str(e))

        @self.route('/space/callback', methods=['POST'], auth_type=group.AuthType.NONE)
@@ -117,11 +200,23 @@ class UserRouterGroup(group.RouterGroup):
            """Handle OAuth callback - exchange code for tokens and authenticate"""
            json_data = await quart.request.json
            code = json_data.get('code')
+            state = json_data.get('state')
+            launch_assertion = json_data.get('launch_assertion')
+            workspace_uuid = json_data.get('workspace_uuid')
+
+            if launch_assertion:
+                return await self._handle_space_direct_launch(
+                    str(launch_assertion),
+                    str(workspace_uuid or '') or None,
+                )

            if not code:
                return self.fail(1, 'Missing authorization code')
+            if not state:
+                return self.fail(1, 'Missing state parameter')

            try:
+                consumed_state = await self.ap.user_service.consume_space_oauth_state_details(state, 'login')
                # Exchange code for tokens
                token_data = await self.ap.space_service.exchange_oauth_code(code)
                access_token = token_data.get('access_token')
@@ -136,61 +231,80 @@ class UserRouterGroup(group.RouterGroup):
                    access_token, refresh_token, expires_in
                )

+                launch_workspace_uuid = consumed_state.launch_workspace_uuid
+                if launch_workspace_uuid:
+                    try:
+                        access = await self.ap.workspace_collaboration_service.resolve_account_workspace(
+                            user_obj.uuid,
+                            launch_workspace_uuid,
+                        )
+                    except Exception:
+                        self.ap.logger.warning('Rejected Space OAuth launch for unauthorized Workspace')
+                        return self.fail(1, 'Space OAuth failed')
+                    return self.success(
+                        data={
+                            'token': jwt_token,
+                            'user': user_obj.user,
+                            'workspace_uuid': access.workspace.uuid,
+                        }
+                    )
+
                return self.success(
                    data={
                        'token': jwt_token,
                        'user': user_obj.user,
                    }
                )
+            except ControlPlaneDirectoryRequiredError as e:
+                return self.http_status(409, e.code, str(e))
            except account_errors.AccountEmailMismatchError as e:
-                return self.fail(3, str(e))
-            except ValueError as e:
-                traceback.print_exc()
-                self.ap.logger.warning(f'Space OAuth callback failed: {e}')
-                return self.fail(1, str(e))
-            except Exception as e:
-                traceback.print_exc()
-                return self.fail(2, f'OAuth callback failed: {str(e)}')
-
-        @self.route('/info', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(user_email: str) -> str:
-            """Get current user information including account type"""
-            user_obj = await self.ap.user_service.get_user_by_email(user_email)
-
-            if user_obj is None:
-                return self.http_status(404, -1, 'User not found')
+                return self.fail(getattr(e, 'code', 3), str(e))
+            except ValueError:
+                self.ap.logger.exception('Space OAuth callback failed')
+                return self.fail(1, 'Space OAuth failed')
+            except Exception:
+                raise

+        @self.route('/info', methods=['GET'], auth_type=group.AuthType.ACCOUNT_TOKEN)
+        async def _(account) -> str:
+            """Get current Account information without re-querying under Workspace RLS."""
            return self.success(
                data={
-                    'user': user_obj.user,
-                    'account_type': user_obj.account_type,
-                    'has_password': bool(user_obj.password and user_obj.password.strip()),
+                    'account_uuid': account.uuid,
+                    'user': account.user,
+                    'account_type': account.account_type,
+                    'has_password': bool(account.password and account.password.strip()),
                }
            )

        @self.route('/space-credits', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def _(user_email: str) -> str:
-            """Get Space credits balance for current user"""
-            credits = await self.ap.space_service.get_credits(user_email)
-            return self.success(data={'credits': credits})
+        async def _(request_context: RequestContext) -> str:
+            """Get Space credits using only the selected Workspace owner's credentials."""
+            access = await self.ap.workspace_collaboration_service.resolve_account_workspace(
+                request_context.account_uuid,
+                request_context.workspace_uuid,
+            )
+            owner = await self.ap.user_service.get_workspace_owner(access.workspace.uuid)
+            owner_space_bound = bool(owner and owner.space_account_uuid)
+            credits = await self.ap.space_service.get_credits(owner.user) if owner_space_bound else None
+            return self.success(
+                data={
+                    'credits': credits,
+                    'owner_space_bound': owner_space_bound,
+                    'is_workspace_owner': access.membership.role == 'owner',
+                }
+            )

        @self.route('/account-info', methods=['GET'], auth_type=group.AuthType.NONE)
        async def _() -> str:
-            """Get account info for login page (account type and has_password)"""
+            """Return instance login capabilities without disclosing an account."""
            if not await self.ap.user_service.is_initialized():
                return self.success(data={'initialized': False})

-            user_obj = await self.ap.user_service.get_first_user()
-            if user_obj is None:
-                return self.success(data={'initialized': False})
-
-            return self.success(
-                data={
-                    'initialized': True,
-                    'account_type': user_obj.account_type,
-                    'has_password': bool(user_obj.password and user_obj.password.strip()),
-                }
-            )
+            capabilities = await self.ap.user_service.get_login_capabilities()
+            if getattr(getattr(self.ap, 'deployment', None), 'mode', 'oss') == 'cloud':
+                capabilities['password_login_enabled'] = False
+            return self.success(data={'initialized': True, **capabilities})

        @self.route('/set-password', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
        async def _(user_email: str) -> str:
@@ -233,7 +347,7 @@ class UserRouterGroup(group.RouterGroup):

            json_data = await quart.request.json
            code = json_data.get('code')
-            state = json_data.get('state')  # JWT token passed as state
+            state = json_data.get('state')

            if not code:
                return self.http_status(400, -1, 'Missing authorization code')
@@ -241,13 +355,10 @@ class UserRouterGroup(group.RouterGroup):
            if not state:
                return self.http_status(400, -1, 'Missing state parameter')

-            # Verify state is a valid JWT token
            try:
-                user_email = await self.ap.user_service.verify_jwt_token(state)
+                user_obj = await self.ap.user_service.consume_space_oauth_state(state, 'bind')
            except Exception:
                return self.http_status(401, -1, 'Invalid or expired state')
-
-            user_obj = await self.ap.user_service.get_user_by_email(user_email)
            if user_obj is None:
                return self.http_status(404, -1, 'User not found')

@@ -255,8 +366,8 @@ class UserRouterGroup(group.RouterGroup):
                return self.http_status(400, -1, 'Only local accounts can bind to Space')

            try:
-                updated_user = await self.ap.user_service.bind_space_account(user_email, code)
-                jwt_token = await self.ap.user_service.generate_jwt_token(updated_user.user)
+                updated_user = await self.ap.user_service.bind_space_account(user_obj.user, code)
+                jwt_token = await self.ap.user_service.generate_jwt_token(updated_user)
                return self.success(
                    data={
                        'token': jwt_token,
@@ -264,7 +375,46 @@ class UserRouterGroup(group.RouterGroup):
                        'account_type': updated_user.account_type,
                    }
                )
-            except ValueError as e:
-                return self.http_status(400, -1, str(e))
-            except Exception as e:
-                return self.http_status(500, -1, f'Failed to bind Space account: {str(e)}')
+            except account_errors.AccountEmailMismatchError:
+                return self.http_status(
+                    409,
+                    'space_account_email_mismatch',
+                    'Bind the LangBot Account with the same email as this local Account',
+                )
+            except ValueError:
+                return self.http_status(400, -1, 'Space account binding failed')
+            except Exception:
+                raise
+
+    async def _handle_space_direct_launch(
+        self,
+        launch_assertion: str,
+        workspace_uuid: str | None,
+    ) -> str:
+        try:
+            launch = await self.ap.space_launch_service.consume_assertion(
+                launch_assertion,
+                expected_workspace_uuid=workspace_uuid,
+            )
+            account = await self.ap.user_service.get_user_by_uuid(launch['account_uuid'])
+            if account is None:
+                raise SpaceLaunchError('Launch Account is not projected into Core')
+            self.ap.user_service._require_active_account(account)
+            access = await self.ap.workspace_collaboration_service.resolve_account_workspace(
+                account.uuid,
+                launch['workspace_uuid'],
+            )
+            token = await self.ap.user_service.generate_jwt_token(account)
+            return self.success(
+                data={
+                    'token': token,
+                    'user': account.user,
+                    'workspace_uuid': access.workspace.uuid,
+                }
+            )
+        except SpaceLaunchError:
+            self.ap.logger.warning('Rejected Space direct-launch assertion')
+            return self.fail(1, 'Space launch failed')
+        except Exception:
+            self.ap.logger.exception('Space direct launch failed')
+            return self.fail(1, 'Space launch failed')
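The login and bind callbacks above no longer accept a JWT as the OAuth `state`. Instead, `issue_space_oauth_state` creates a one-time state bound to a purpose (`'login'` or `'bind'`), optionally carrying an Account or a launch Workspace, and the callbacks consume it with the expected purpose. The `UserService` internals are not part of this diff. Below is a minimal in-memory sketch of such a store, assuming a random URL-safe token, a fixed TTL, and single-use consumption. `OAuthStateStore`, `IssuedState`, `InvalidOAuthState`, and the 600-second TTL are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass
from typing import Callable


class InvalidOAuthState(ValueError):
    """Unknown, expired, already-used, or wrong-purpose state (hypothetical error type)."""


@dataclass(frozen=True)
class IssuedState:
    purpose: str
    account_uuid: str | None
    launch_workspace_uuid: str | None
    expires_at: float


class OAuthStateStore:
    """Hypothetical in-memory store; a real service would persist issued states."""

    def __init__(self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._states: dict[str, IssuedState] = {}

    def issue(
        self,
        purpose: str,
        *,
        account_uuid: str | None = None,
        launch_workspace_uuid: str | None = None,
    ) -> str:
        # A bind state must name the Account that requested it.
        if purpose == 'bind' and not account_uuid:
            raise ValueError('a bind state requires an account_uuid')
        token = secrets.token_urlsafe(32)
        self._states[token] = IssuedState(purpose, account_uuid, launch_workspace_uuid, self._clock() + self._ttl)
        return token

    def consume(self, token: str, expected_purpose: str) -> IssuedState:
        # pop() burns the token before validation, so a failed or replayed attempt cannot retry it.
        issued = self._states.pop(token, None)
        if issued is None:
            raise InvalidOAuthState('unknown or already used state')
        if issued.purpose != expected_purpose:
            raise InvalidOAuthState('state was issued for a different purpose')
        if self._clock() >= issued.expires_at:
            raise InvalidOAuthState('state expired')
        return issued
```

The purpose check is what stops a login state from being replayed into `/space/bind-callback`. Because `InvalidOAuthState` subclasses `ValueError` in this sketch, the login callback's existing `except ValueError` branch would turn a rejected state into the generic `'Space OAuth failed'` response; whether the real service raises `ValueError` is an assumption.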
@@ -1,49 +1,80 @@
+from __future__ import annotations
+
 import quart

+from ...authz import Permission, has_permission
+from ...context import RequestContext
 from .. import group


@group.group_class('webhook_mgmt', '/api/v1/webhooks')
 class WebhookManagementRouterGroup(group.RouterGroup):
    async def initialize(self) -> None:
-        @self.route('', methods=['GET', 'POST'])
-        async def _() -> str:
-            if quart.request.method == 'GET':
-                webhooks = await self.ap.webhook_service.get_webhooks()
-                return self.success(data={'webhooks': webhooks})
-            elif quart.request.method == 'POST':
-                json_data = await quart.request.json
-                name = json_data.get('name', '')
-                url = json_data.get('url', '')
-                description = json_data.get('description', '')
-                enabled = json_data.get('enabled', True)
+        @self.route('', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def _(request_context: RequestContext) -> str:
+            webhooks = await self.ap.webhook_service.get_webhooks(
+                request_context,
+                include_secret=has_permission(request_context, Permission.RESOURCE_MANAGE),
+            )
+            return self.success(data={'webhooks': webhooks})

-                if not name:
-                    return self.http_status(400, -1, 'Name is required')
-                if not url:
-                    return self.http_status(400, -1, 'URL is required')
+        @self.route('', methods=['POST'], permission=Permission.RESOURCE_MANAGE)
+        async def _(request_context: RequestContext) -> str:
+            json_data = await quart.request.get_json(silent=True) or {}
+            name = json_data.get('name', '')
+            url = json_data.get('url', '')
+            description = json_data.get('description', '')
+            enabled = json_data.get('enabled', True)

-                webhook = await self.ap.webhook_service.create_webhook(name, url, description, enabled)
-                return self.success(data={'webhook': webhook})
+            if not name:
+                return self.http_status(400, -1, 'Name is required')
+            if not url:
+                return self.http_status(400, -1, 'URL is required')

-        @self.route('/<int:webhook_id>', methods=['GET', 'PUT', 'DELETE'])
-        async def _(webhook_id: int) -> str:
-            if quart.request.method == 'GET':
-                webhook = await self.ap.webhook_service.get_webhook(webhook_id)
-                if webhook is None:
+            try:
+                webhook = await self.ap.webhook_service.create_webhook(
+                    request_context,
+                    name,
+                    url,
+                    description,
+                    enabled,
+                )
+            except ValueError as exc:
+                return self.http_status(400, -1, str(exc))
+            return self.success(data={'webhook': webhook})
+
+        @self.route('/<int:webhook_id>', methods=['GET'], permission=Permission.RESOURCE_VIEW)
+        async def _(webhook_id: int, request_context: RequestContext) -> str:
+            webhook = await self.ap.webhook_service.get_webhook(
+                request_context,
+                webhook_id,
+                include_secret=has_permission(request_context, Permission.RESOURCE_MANAGE),
+            )
+            if webhook is None:
+                return self.http_status(404, -1, 'Webhook not found')
+            return self.success(data={'webhook': webhook})
+
+        @self.route(
+            '/<int:webhook_id>',
+            methods=['PUT', 'DELETE'],
+            permission=Permission.RESOURCE_MANAGE,
+        )
+        async def _(webhook_id: int, request_context: RequestContext) -> str:
+            if quart.request.method == 'PUT':
+                json_data = await quart.request.get_json(silent=True) or {}
+                updated = await self.ap.webhook_service.update_webhook(
+                    request_context,
+                    webhook_id,
+                    json_data.get('name'),
+                    json_data.get('url'),
+                    json_data.get('description'),
+                    json_data.get('enabled'),
+                )
+                if not updated:
                    return self.http_status(404, -1, 'Webhook not found')
-                return self.success(data={'webhook': webhook})
-
-            elif quart.request.method == 'PUT':
-                json_data = await quart.request.json
-                name = json_data.get('name')
-                url = json_data.get('url')
-                description = json_data.get('description')
-                enabled = json_data.get('enabled')
-
-                await self.ap.webhook_service.update_webhook(webhook_id, name, url, description, enabled)
                return self.success()

-            elif quart.request.method == 'DELETE':
-                await self.ap.webhook_service.delete_webhook(webhook_id)
-                return self.success()
+            deleted = await self.ap.webhook_service.delete_webhook(request_context, webhook_id)
+            if not deleted:
+                return self.http_status(404, -1, 'Webhook not found')
+            return self.success()
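The webhook management routes above split read access (`Permission.RESOURCE_VIEW`) from mutation (`Permission.RESOURCE_MANAGE`). They also ask the service to include the webhook secret only when the caller could manage the resource as well. The real `authz` module and `webhook_service` are outside this diff. The sketch below models the same rule with a hypothetical role table and serializer: the permission string values, the `'viewer'`/`'editor'` roles, `ROLE_PERMISSIONS`, `serialize_webhook`, and the `secret` field name are all assumptions.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass


class Permission(str, enum.Enum):
    # Hypothetical values; only the member names appear in the diff.
    RESOURCE_VIEW = 'resource:view'
    RESOURCE_MANAGE = 'resource:manage'


# Hypothetical role table standing in for authz.permissions_for_role().
ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    'viewer': frozenset({Permission.RESOURCE_VIEW}),
    'editor': frozenset({Permission.RESOURCE_VIEW, Permission.RESOURCE_MANAGE}),
}


def has_permission(role: str, permission: Permission) -> bool:
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


@dataclass
class Webhook:
    id: int
    name: str
    url: str
    secret: str


def serialize_webhook(webhook: Webhook, *, include_secret: bool) -> dict[str, object]:
    payload: dict[str, object] = {'id': webhook.id, 'name': webhook.name, 'url': webhook.url}
    if include_secret:
        payload['secret'] = webhook.secret
    return payload


def list_webhooks(role: str, webhooks: list[Webhook]) -> list[dict[str, object]]:
    # Mirrors the GET route: view permission gates the call, manage permission gates the secret.
    if not has_permission(role, Permission.RESOURCE_VIEW):
        raise PermissionError('resource view permission is required')
    include_secret = has_permission(role, Permission.RESOURCE_MANAGE)
    return [serialize_webhook(w, include_secret=include_secret) for w in webhooks]
```

This sketch omits the key entirely rather than masking it, so viewer payloads carry no secret-derived data. Whether the real service omits or masks the field is not visible in this diff.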
@@ -4,6 +4,7 @@ import quart
 import traceback

 from .. import group
+from .....utils import bounded_executor


@group.group_class('webhooks', '/bots')
@@ -30,7 +31,10 @@ class WebhookRouterGroup(group.RouterGroup):
            Response returned by the adapter
        """
        try:
-            runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(bot_uuid)
+            # Public ingress never accepts X-Workspace-Id.  The opaque bot UUID
+            # is resolved against the already-bound runtime resource, which
+            # carries the trusted Workspace and placement generation.
+            runtime_bot = await self.ap.platform_mgr.resolve_public_bot(bot_uuid)

            if not runtime_bot:
                return quart.jsonify({'error': 'Bot not found'}), 404
@@ -41,14 +45,40 @@ class WebhookRouterGroup(group.RouterGroup):
            if not hasattr(runtime_bot.adapter, 'handle_unified_webhook'):
                return quart.jsonify({'error': 'Adapter does not support unified webhook'}), 501

-            response = await runtime_bot.adapter.handle_unified_webhook(
-                bot_uuid=bot_uuid,
-                path=path,
-                request=quart.request,
-            )
+            async def dispatch():
+                await self.ap.workspace_service.get_execution_binding(
+                    runtime_bot.workspace_uuid,
+                    expected_generation=runtime_bot.placement_generation,
+                )
+                return await runtime_bot.adapter.handle_unified_webhook(
+                    bot_uuid=bot_uuid,
+                    path=path,
+                    request=quart.request,
+                )
+
+            with bounded_executor.blocking_work_scope(runtime_bot.workspace_uuid):
+                persistence_mgr = self.ap.persistence_mgr
+                cloud_runtime = getattr(getattr(persistence_mgr, 'mode', None), 'value', None) == 'cloud_runtime'
+                if cloud_runtime:
+                    tenant_scope = getattr(persistence_mgr, 'tenant_scope', None)
+                    if not callable(tenant_scope):
+                        raise RuntimeError('Cloud webhook dispatch requires an explicit tenant scope')
+                    async with tenant_scope(runtime_bot.workspace_uuid):
+                        response = await dispatch()
+                else:
+                    response = await dispatch()

            return response

-        except Exception as e:
-            self.ap.logger.error(f'Webhook dispatch error for bot {bot_uuid}: {traceback.format_exc()}')
-            return quart.jsonify({'error': str(e)}), 500
+        except bounded_executor.BlockingWorkCapacityError as exc:
+            return self.http_status(
+                429,
+                'blocking_work_capacity_exceeded',
+                str(exc),
+            )
+        except Exception:
+            request_id = self.request_id()
+            self.ap.logger.error(
+                f'Webhook dispatch error request_id={request_id} bot={bot_uuid}: {traceback.format_exc()}'
+            )
+            return self.internal_error_response(request_id)
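The public webhook path above takes the Workspace from the bound runtime bot instead of a request header. In `cloud_runtime` mode it refuses to dispatch unless the persistence manager exposes a callable `tenant_scope`, and it maps `BlockingWorkCapacityError` to HTTP 429. Neither `bounded_executor` nor the real `tenant_scope` appears in this diff, so the following asyncio sketch models only those two behaviors under stated assumptions. A `ContextVar` holds the active Workspace, and a per-Workspace counter caps concurrent dispatches. The placement-generation check is not modeled. `current_workspace`, `CapacityError`, `WorkspaceLimiter`, `tenant_scope`, `handle_webhook`, and `demo` are hypothetical stand-ins.

```python
from __future__ import annotations

import asyncio
import contextlib
import contextvars
from collections import Counter
from typing import AsyncIterator, Awaitable, Callable, Iterator

# Hypothetical stand-in for the tenant binding that persistence code would read.
current_workspace = contextvars.ContextVar('current_workspace', default=None)


class CapacityError(RuntimeError):
    """Raised when a Workspace already has its maximum number of dispatches in flight."""


class WorkspaceLimiter:
    """Caps concurrent dispatches per Workspace so one tenant cannot starve the others."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._in_flight: Counter[str] = Counter()

    @contextlib.contextmanager
    def scope(self, workspace_uuid: str) -> Iterator[None]:
        if self._in_flight[workspace_uuid] >= self._limit:
            raise CapacityError(f'Workspace {workspace_uuid} is at capacity')
        self._in_flight[workspace_uuid] += 1
        try:
            yield
        finally:
            self._in_flight[workspace_uuid] -= 1


@contextlib.asynccontextmanager
async def tenant_scope(workspace_uuid: str) -> AsyncIterator[None]:
    # Bind the tenant for everything awaited inside the block, then restore the previous value.
    token = current_workspace.set(workspace_uuid)
    try:
        yield
    finally:
        current_workspace.reset(token)


async def handle_webhook(
    workspace_uuid: str,
    limiter: WorkspaceLimiter,
    dispatch: Callable[[], Awaitable[str]],
    scope: Callable[[str], contextlib.AbstractAsyncContextManager] | None,
    *,
    cloud_runtime: bool,
) -> tuple[int, str]:
    try:
        with limiter.scope(workspace_uuid):
            if not cloud_runtime:
                return 200, await dispatch()
            # Fail closed: Cloud dispatch without an explicit tenant scope is a bug, not a fallback.
            if not callable(scope):
                raise RuntimeError('Cloud webhook dispatch requires an explicit tenant scope')
            async with scope(workspace_uuid):
                return 200, await dispatch()
    except CapacityError as exc:
        return 429, str(exc)
    except Exception:
        # Hide internal details from the public caller, as the route does.
        return 500, 'internal error'


async def demo() -> list[tuple[int, str]]:
    limiter = WorkspaceLimiter(limit=1)

    async def dispatch() -> str:
        # Reports which Workspace, if any, is bound while the adapter runs.
        return current_workspace.get() or 'unscoped'

    return [
        await handle_webhook('ws-1', limiter, dispatch, tenant_scope, cloud_runtime=True),
        await handle_webhook('ws-1', limiter, dispatch, None, cloud_runtime=False),
        await handle_webhook('ws-1', limiter, dispatch, None, cloud_runtime=True),
    ]


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

The `ContextVar` is reset in `finally`, so a failed dispatch cannot leak one tenant's binding into the next request handled by the same task. Raising instead of silently running unscoped mirrors the route's `RuntimeError` when `tenant_scope` is missing.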
@@ -0,0 +1,363 @@
+from __future__ import annotations
+
+import typing
+
+import quart
+
+from ...authz import Permission, permissions_for_role
+from ...context import RequestContext
+from ...service.user import AccountExistsLoginRequiredError, ControlPlaneDirectoryRequiredError
+from .....entity.persistence.workspace import Workspace, WorkspaceInvitation, WorkspaceMembership
+from .....entity.persistence.workspace import WorkspaceSource
+from .....workspace.collaboration import WorkspaceMemberView
+from .....workspace.errors import WorkspaceNotFoundError
+from .....workspace.invitation_delivery import InvitationDeliveryService
+from .. import group
+
+
+def _workspace_payload(workspace: Workspace) -> dict[str, typing.Any]:
+    return {
+        'uuid': workspace.uuid,
+        'instance_uuid': workspace.instance_uuid,
+        'name': workspace.name,
+        'slug': workspace.slug,
+        'type': workspace.type,
+        'status': workspace.status,
+        'source': workspace.source,
+    }
+
+
+def _membership_payload(
+    membership: WorkspaceMembership,
+    *,
+    email: str,
+) -> dict[str, typing.Any]:
+    return {
+        'uuid': membership.uuid,
+        'workspace_uuid': membership.workspace_uuid,
+        'account_uuid': membership.account_uuid,
+        'email': email,
+        'role': membership.role,
+        'status': membership.status,
+        'joined_at': membership.joined_at.isoformat() if membership.joined_at else None,
+        'created_at': membership.created_at.isoformat() if membership.created_at else None,
+    }
+
+
+def _invitation_payload(invitation: WorkspaceInvitation) -> dict[str, typing.Any]:
+    """Serialize an invitation without its bearer-secret hash."""
+
+    return {
+        'uuid': invitation.uuid,
+        'workspace_uuid': invitation.workspace_uuid,
+        'normalized_email': invitation.normalized_email,
+        'role': invitation.role,
+        'status': invitation.status,
+        'expires_at': invitation.expires_at.isoformat(),
+        'created_at': invitation.created_at.isoformat() if invitation.created_at else None,
+    }
+
+
+@group.group_class('workspaces', '/api/v1/workspaces')
+class WorkspacesRouterGroup(group.RouterGroup):
+    async def _run_in_workspace_uow(
+        self, workspace_uuid: str, operation: typing.Callable[[], typing.Awaitable[typing.Any]]
+    ):
+        """Bind collaboration persistence to the selected tenant in Cloud."""
+        cloud_runtime = getattr(getattr(self.ap.persistence_mgr, 'mode', None), 'value', None) == 'cloud_runtime'
+        if cloud_runtime:
+            async with self.ap.persistence_mgr.tenant_uow(workspace_uuid):
+                return await operation()
+        return await operation()
+
+    async def initialize(self) -> None:
+        @self.route('/bootstrap', methods=['GET'], auth_type=group.AuthType.ACCOUNT_TOKEN)
+        async def _(account) -> typing.Any:
+            """List the active Workspaces available to an authenticated Account.
+
+            This account-only endpoint intentionally runs before Workspace
+            selection. It never accepts a selector as authority and does not
+            choose a default Workspace for a multi-membership Account.
+            """
+
+            accesses = await self.ap.workspace_collaboration_service.list_account_workspaces(account.uuid)
+            resolver = getattr(self.ap, 'entitlement_resolver', None)
+            workspaces: list[dict[str, typing.Any]] = []
+            for access in accesses:
+                plan_name: str | None = None
+                if access.workspace.source == WorkspaceSource.CLOUD_PROJECTION.value and resolver is not None:
+                    entitlement = await resolver.resolve(
+                        access.workspace.uuid,
+                        minimum_revision=access.membership.projection_revision,
+                    )
+                    plan_name = entitlement.plan_name
+                workspaces.append(
+                    {
+                        'workspace': _workspace_payload(access.workspace),
+                        'membership': _membership_payload(access.membership, email=account.user),
+                        'permissions': sorted(permissions_for_role(access.membership.role)),
+                        'placement_generation': access.execution.placement_generation,
+                        'plan_name': plan_name,
+                    }
+                )
+            return self.success(data={'workspaces': workspaces})
+
+        @self.route('', methods=['GET'], auth_type=group.AuthType.ACCOUNT_TOKEN)
+        async def _(account) -> typing.Any:
+            accesses = await self.ap.workspace_collaboration_service.list_account_workspaces(account.uuid)
+            return self.success(data={'workspaces': [_workspace_payload(access.workspace) for access in accesses]})
+
+        @self.route('', methods=['POST'], permission=Permission.WORKSPACE_VIEW)
+        async def _(request_context: RequestContext) -> typing.Any:
+            if self.ap.workspace_service.policy.multi_workspace_enabled:
+                return self.http_status(
+                    409,
+                    'control_plane_required',
+                    'Cloud Workspaces are created by the SaaS control plane',
+                )
+            return self.http_status(403, 'edition_limit', 'This edition supports one Workspace per instance')
+
+        @self.route('/current', methods=['GET'], permission=Permission.WORKSPACE_VIEW)
+        async def _(request_context: RequestContext) -> typing.Any:
+            membership = quart.g.workspace_membership
+            account = await self.ap.user_service.get_user_by_uuid(request_context.account_uuid)
+            if account is None:
+                return self.http_status(401, 'invalid_authentication', 'Account not found')
+            workspace = await self.ap.workspace_service.get_workspace(request_context.workspace_uuid)
+            plan_name: str | None = None
+            resolver = getattr(self.ap, 'entitlement_resolver', None)
+            if workspace.source == WorkspaceSource.CLOUD_PROJECTION.value and resolver is not None:
+                entitlement = await resolver.resolve(
+                    workspace.uuid,
+                    minimum_revision=request_context.entitlement_revision,
+                )
+                plan_name = entitlement.plan_name
+            return self.success(
+                data={
+                    'workspace': _workspace_payload(workspace),
+                    'membership': _membership_payload(membership, email=account.user),
+                    'permissions': sorted(request_context.workspace.permissions),
+                    'placement_generation': request_context.placement_generation,
+                    'plan_name': plan_name,
+                }
+            )
+
+        @self.route('/<workspace_uuid>', methods=['GET'], permission=Permission.WORKSPACE_VIEW)
+        async def _(workspace_uuid: str, request_context: RequestContext) -> typing.Any:
+            self._require_current_workspace(workspace_uuid, request_context)
+            workspace = await self.ap.workspace_service.get_workspace(workspace_uuid)
+            return self.success(data={'workspace': _workspace_payload(workspace)})
+
+        @self.route('/<workspace_uuid>/members', methods=['GET'], permission=Permission.MEMBER_VIEW)
+        async def _(workspace_uuid: str, request_context: RequestContext) -> typing.Any:
+            self._require_current_workspace(workspace_uuid, request_context)
+
+            async def list_members():
+                return await self.ap.workspace_collaboration_service.list_members(
+                    workspace_uuid, quart.g.workspace_membership
+                )
+
+            members = await self._run_in_workspace_uow(workspace_uuid, list_members)
+            return self.success(data={'members': [self._member_view_payload(item) for item in members]})
+
+        @self.route(
+            '/<workspace_uuid>/invitations',
+            methods=['GET', 'POST'],
+            permission=Permission.MEMBER_INVITE,
+        )
+        async def _(workspace_uuid: str, request_context: RequestContext) -> typing.Any:
+            self._require_current_workspace(workspace_uuid, request_context)
+            if quart.request.method == 'GET':
+
+                async def list_invitations():
+                    return await self.ap.workspace_collaboration_service.list_invitations(
+                        workspace_uuid, quart.g.workspace_membership
+                    )
+
+                invitations = await self._run_in_workspace_uow(workspace_uuid, list_invitations)
+                return self.success(data={'invitations': [_invitation_payload(item) for item in invitations]})
+
+            data = await quart.request.get_json(silent=True) or {}
+
+            async def create_invitation():
+                return await self.ap.workspace_collaboration_service.create_invitation(
+                    workspace_uuid,
+                    quart.g.workspace_membership,
+                    str(data.get('email', '')),
+                    str(data.get('role', 'viewer')),
+                )
+
+            created = await self._run_in_workspace_uow(workspace_uuid, create_invitation)
+            delivery_service = self._invitation_delivery_service()
+            link = delivery_service.build_invitation_link(created.token)
+            workspace = await self.ap.workspace_service.get_workspace(workspace_uuid)
+            delivery = await delivery_service.deliver_invitation(
+                recipient_email=created.invitation.normalized_email,
+                workspace_name=workspace.name,
+                invitation_link=link,
+            )
+            return self.success(
+                data={
+                    'invitation': _invitation_payload(created.invitation),
+                    'token': created.token,
+                    'link': link,
+                    'delivery': delivery.to_public_dict(),
+                }
+            )
+
+        @self.route(
+            '/<workspace_uuid>/invitations/<invitation_uuid>',
+            methods=['DELETE'],
+            permission=Permission.MEMBER_INVITE,
+        )
+        async def _(
+            workspace_uuid: str,
+            invitation_uuid: str,
+            request_context: RequestContext,
+        ) -> typing.Any:
+            self._require_current_workspace(workspace_uuid, request_context)
+
+            async def revoke_invitation():
+                return await self.ap.workspace_collaboration_service.revoke_invitation(
+                    workspace_uuid, invitation_uuid, quart.g.workspace_membership
+                )
+
+            invitation = await self._run_in_workspace_uow(workspace_uuid, revoke_invitation)
+            return self.success(data={'invitation': _invitation_payload(invitation)})
+
+        @self.route(
+            '/<workspace_uuid>/members/<account_uuid>',
+            methods=['PATCH', 'DELETE'],
+            permission=Permission.MEMBER_UPDATE_ROLE,
+        )
+        async def _(
+            workspace_uuid: str,
+            account_uuid: str,
+            request_context: RequestContext,
+        ) -> typing.Any:
+            self._require_current_workspace(workspace_uuid, request_context)
+            if quart.request.method == 'DELETE':
+                if Permission.MEMBER_REMOVE.value not in request_context.workspace.permissions:
+                    return self.http_status(403, 'permission_denied', 'Member removal permission is required')
+
+                async def remove_member():
+                    return await self.ap.workspace_collaboration_service.remove_member(
+                        workspace_uuid, account_uuid, quart.g.workspace_membership
+                    )
+
+                member = await self._run_in_workspace_uow(workspace_uuid, remove_member)
+                return self.success(data={'account_uuid': member.account_uuid})
+
+            data = await quart.request.get_json(silent=True) or {}
+
+            async def update_member_role():
+                return await self.ap.workspace_collaboration_service.update_member_role(
+                    workspace_uuid,
+                    account_uuid,
+                    str(data.get('role', '')),
+                    quart.g.workspace_membership,
+                )
+
+            member = await self._run_in_workspace_uow(workspace_uuid, update_member_role)
+            account = await self.ap.user_service.get_user_by_uuid(member.account_uuid)
+            return self.success(
+                data={
+                    'member': _membership_payload(
+                        member,
+                        email=account.user if account is not None else '',
+                    )
+                }
+            )
+
+    @staticmethod
+    def _require_current_workspace(workspace_uuid: str, request_context: RequestContext) -> None:
+        if workspace_uuid != request_context.workspace_uuid:
+            raise WorkspaceNotFoundError('Workspace not found')
+
+    def _invitation_delivery_service(self) -> InvitationDeliveryService:
+        service = getattr(self.ap, 'invitation_delivery_service', None)
+        if service is None:
+            service = InvitationDeliveryService(self.ap)
+            self.ap.invitation_delivery_service = service
+        return service
+
+    @staticmethod
+    def _member_view_payload(view: WorkspaceMemberView) -> dict[str, typing.Any]:
+        return _membership_payload(view.membership, email=view.email)
+
+
+@group.group_class('invitations', '/api/v1/invitations')
+class InvitationsRouterGroup(group.RouterGroup):
+    async def initialize(self) -> None:
+        @self.route('/inspect', methods=['POST'], auth_type=group.AuthType.NONE)
+        async def _() -> typing.Any:
+            data = await quart.request.get_json(silent=True) or {}
+            invitation, workspace = await self.ap.workspace_collaboration_service.inspect_invitation(
+                str(data.get('token', ''))
+            )
+            return self.success(
+                data={
+                    'invitation': _invitation_payload(invitation),
+                    'workspace': _workspace_payload(workspace),
+                }
+            )
+
+        @self.route('/accept', methods=['POST'], auth_type=group.AuthType.NONE)
+        async def _() -> typing.Any:
+            data = await quart.request.get_json(silent=True) or {}
+            invitation_token = str(data.get('token', ''))
+            if not invitation_token:
+                return self.http_status(400, 'invitation_invalid', 'Invitation token is required')
+
+            authorization = quart.request.headers.get('Authorization', '')
+            if authorization.startswith('Bearer '):
+                if getattr(getattr(self.ap, 'deployment', None), 'mode', 'oss') != 'cloud':
+                    return self.http_status(
+                        409,
+                        'invitation_logout_required',
+                        'Sign out before creating the invited local Account',
+                    )
+                try:
+                    account = await self.ap.user_service.get_authenticated_account(
+                        authorization.removeprefix('Bearer ')
+                    )
+                    if isinstance(account, str):
+                        account = await self.ap.user_service.get_user_by_email(account)
+                except Exception as exc:
+                    return self._auth_error_response(exc)
+                if account is None:
+                    return self.http_status(401, 'invalid_authentication', 'Account not found')
+                membership = await self.ap.workspace_collaboration_service.accept_invitation(
+                    invitation_token,
+                    account.uuid,
+                )
+                token = await self.ap.user_service.generate_jwt_token(account)
+                return self.success(data={'token': token, 'workspace_uuid': membership.workspace_uuid})
+
+            registration = data.get('registration')
+            if getattr(getattr(self.ap, 'deployment', None), 'mode', 'oss') == 'cloud':
+                return self.http_status(
+                    401,
+                    'account_exists_login_required',
+                    'Log in with your LangBot Account to accept this invitation',
+                )
+            if not isinstance(registration, dict):
+                return self.http_status(
+                    401,
+                    'account_exists_login_required',
+                    'Sign in or provide registration details to accept this invitation',
+                )
+            password = registration.get('password')
+            if not isinstance(password, str) or len(password) < 8:
+                return self.http_status(400, 'invalid_password', 'Password must contain at least 8 characters')
+            try:
+                _, membership = await self.ap.user_service.register_invited_account(
+                    invitation_token,
+                    str(registration.get('email', '')),
+                    password,
+                )
+            except ControlPlaneDirectoryRequiredError as exc:
+                return self.http_status(409, exc.code, str(exc))
+            except AccountExistsLoginRequiredError as exc:
+                return self.http_status(409, exc.code, str(exc))
+            return self.success(data={'workspace_uuid': membership.workspace_uuid, 'login_required': True})
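The `/accept` handler above branches on four inputs: whether a token was sent, whether a bearer header is present, the deployment mode, and whether the body carries registration details. The sketch below condenses that branching into a pure function so the outcomes read like a table. It is illustrative only. `accept_outcome` and the `'joined'` / `'registered_login_required'` labels are hypothetical, and failures raised inside the services (invalid or expired invitation, existing Account, directory errors) are left out.

```python
from typing import Any


def accept_outcome(mode: str, has_token: bool, has_bearer: bool, registration: Any) -> tuple[int, str]:
    """Hypothetical summary of the /accept branches (service errors omitted)."""
    if not has_token:
        return 400, 'invitation_invalid'
    if has_bearer:
        if mode != 'cloud':
            # Outside cloud mode the invitation creates a new local Account,
            # so an existing session must be signed out first.
            return 409, 'invitation_logout_required'
        # Cloud: the authenticated Account joins and receives a fresh JWT.
        return 200, 'joined'
    if mode == 'cloud':
        # Cloud never registers an Account through the invitation request.
        return 401, 'account_exists_login_required'
    if not isinstance(registration, dict):
        return 401, 'account_exists_login_required'
    password = registration.get('password')
    if not isinstance(password, str) or len(password) < 8:
        return 400, 'invalid_password'
    # OSS: the Account is registered, and the client must then log in.
    return 200, 'registered_login_required'


if __name__ == '__main__':
    for args in [('oss', True, True, None), ('oss', True, False, {'password': 'correct-horse'})]:
        print(args[:3], accept_outcome(*args))
```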
@@ -2,6 +2,7 @@ from __future__ import annotations

 import asyncio
 import os
+import typing

 import quart
 import quart_cors
@@ -27,6 +28,37 @@ importutil.import_modules_in_pkg(groups_knowledge)
 importutil.import_modules_in_pkg(groups_resources)


+class BoundedJSONRequest(quart.Request):
+    """Parse bounded HTTP JSON bodies outside the shared event loop."""
+
+    async def get_json(
+        self,
+        force: bool = False,
+        silent: bool = False,
+        cache: bool = True,
+    ) -> typing.Any:
+        # Keep Quart's cache and error semantics; only move the decoding of a
+        # (potentially 10 MiB) JSON body off the event loop. The RouterGroup
+        # establishes a trusted Workspace blocking-work scope before handlers run.
+        if cache and self._cached_json[silent] is not Ellipsis:
+            return self._cached_json[silent]
+        if not (force or self.is_json):
+            return None
+
+        data = await self.get_data(cache=cache, as_text=False)
+        try:
+            result = await asyncio.to_thread(self.json_module.loads, data)
+        except ValueError as error:
+            if silent:
+                result = None
+            else:
+                result = self.on_json_loading_failed(error)
+
+        if cache:
+            self._cached_json[silent] = result
+        return result
+
+
 class HTTPController:
    ap: app.Application

@@ -35,6 +67,7 @@ class HTTPController:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap
        self.quart_app = quart.Quart(__name__)
+        self.quart_app.request_class = BoundedJSONRequest
        quart_cors.cors(self.quart_app, allow_origin='*')

        # Set maximum content length to prevent large file uploads
@@ -103,6 +136,7 @@ class HTTPController:
        config.accesslog = '-'
        config.bind = [f'{host}:{port}']
        config.errorlog = config.accesslog
+        config.websocket_max_message_size = group.MAX_FILE_SIZE

        asgi_app = self.quart_app
        if self.mcp_mount is not None:
@@ -113,7 +147,16 @@ class HTTPController:
    async def register_routes(self) -> None:
        @self.quart_app.route('/healthz')
        async def healthz():
-            return {'code': 0, 'msg': 'ok'}
+            get_resource_stats = getattr(
+                self.ap,
+                'get_runtime_resource_stats',
+                None,
+            )
+            return {
+                'code': 0,
+                'msg': 'ok',
+                'resources': (get_resource_stats() if callable(get_resource_stats) else {}),
+            }

        for g in group.preregistered_groups:
            ginst = g(self.ap, self.quart_app)
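`BoundedJSONRequest.get_json` in this file keeps Quart's per-`silent` cache and moves only the `loads` call onto a worker thread. A framework-free sketch of the same idea, assuming the body is already buffered: `CachedJSONBody` is a hypothetical stand-in for Quart's request object, while `asyncio.to_thread` and `json.loads` are the real standard-library calls. Where Quart would call `on_json_loading_failed`, the sketch simply re-raises.

```python
import asyncio
import json
from typing import Any


class CachedJSONBody:
    """Hypothetical stand-in for a request whose body is already buffered."""

    def __init__(self, body: bytes) -> None:
        self._body = body
        # Ellipsis means "not parsed yet"; results are cached per ``silent``.
        self._cached_json: dict[bool, Any] = {False: Ellipsis, True: Ellipsis}

    async def get_json(self, silent: bool = False, cache: bool = True) -> Any:
        if cache and self._cached_json[silent] is not Ellipsis:
            return self._cached_json[silent]
        try:
            # json.loads is CPU-bound; running it in the default thread pool
            # keeps a large body from stalling other coroutines on the loop.
            result = await asyncio.to_thread(json.loads, self._body)
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError subclass ValueError.
            if not silent:
                raise
            result = None
        if cache:
            self._cached_json[silent] = result
        return result


async def main() -> None:
    ok = CachedJSONBody(b'{"token": "abc"}')
    print(await ok.get_json())  # {'token': 'abc'}
    bad = CachedJSONBody(b'{not json')
    print(await bad.get_json(silent=True))  # None


if __name__ == '__main__':
    asyncio.run(main())
```

The GIL still serializes pure-Python work, so the thread hop does not make decoding faster; it only stops one large body from blocking every other request on the loop while it is parsed.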
@@ -1,97 +1,304 @@
 from __future__ import annotations

+import dataclasses
+import datetime
+import hashlib
 import secrets
+import typing
+import uuid
+
 import sqlalchemy

-from ....core import app
 from ....entity.persistence import apikey
+from ....workspace.errors import WorkspaceNotFoundError
+from ..authz import Permission, PermissionDeniedError
+from .tenant import TenantContext, require_workspace_uuid, scope_statement
+
+if typing.TYPE_CHECKING:
+    from ....core.app import Application
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class ApiKeyIdentity:
+    """Trusted Workspace identity derived from an API-key secret."""
+
+    instance_uuid: str
+    workspace_uuid: str
+    placement_generation: int
+    api_key_uuid: str
+    permissions: frozenset[str]


 class ApiKeyService:
-    ap: app.Application
+    """Manage hashed, Workspace-bound API keys."""

-    def __init__(self, ap: app.Application) -> None:
+    def __init__(self, ap: Application) -> None:
        self.ap = ap

-    async def get_api_keys(self) -> list[dict]:
-        """Get all API keys"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(apikey.ApiKey))
+    @staticmethod
+    def _hash_secret(secret: str) -> str:
+        return hashlib.sha256(secret.encode('utf-8')).hexdigest()

-        keys = result.all()
-        return [self.ap.persistence_mgr.serialize_model(apikey.ApiKey, key) for key in keys]
+    @staticmethod
+    def _utcnow() -> datetime.datetime:
+        return datetime.datetime.now(datetime.UTC).replace(tzinfo=None)

-    async def create_api_key(self, name: str, description: str = '') -> dict:
-        """Create a new API key"""
-        # Generate a secure random API key
-        key = f'lbk_{secrets.token_urlsafe(32)}'
+    @staticmethod
+    def _normalize_scopes(
+        scopes: typing.Iterable[str] | None,
+        *,
+        default: typing.Iterable[str] = (),
+    ) -> list[str]:
+        requested = list(default if scopes is None else scopes)
+        valid = {permission.value for permission in Permission}
+        normalized: list[str] = []
+        for scope in requested:
+            if not isinstance(scope, str):
+                raise ValueError('API key scopes must be strings')
+            value = scope.strip()
+            if value not in valid:
+                raise ValueError(f'Unknown API key scope: {value}')
+            if value not in normalized:
+                normalized.append(value)
+        return normalized

-        key_data = {'name': name, 'key': key, 'description': description}
+    def _serialize(self, row: typing.Any) -> dict[str, typing.Any]:
+        value = self.ap.persistence_mgr.serialize_model(apikey.ApiKey, row)
+        value.pop('key_hash', None)
+        # The secret is deliberately unrecoverable after creation.
+        value['secret_available'] = False
+        return value

-        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(apikey.ApiKey).values(**key_data))
-
-        # Retrieve the created key
+    async def get_api_keys(self, context: TenantContext) -> list[dict[str, typing.Any]]:
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.key == key)
+            scope_statement(
+                sqlalchemy.select(apikey.ApiKey).order_by(apikey.ApiKey.created_at, apikey.ApiKey.id),
+                apikey.ApiKey,
+                context,
+            )
        )
-        created_key = result.first()
+        return [self._serialize(key) for key in result.all()]

-        return self.ap.persistence_mgr.serialize_model(apikey.ApiKey, created_key)
+    async def create_api_key(
+        self,
+        context: TenantContext,
+        name: str,
+        description: str = '',
+        *,
+        scopes: typing.Iterable[str] | None = None,
+        expires_at: datetime.datetime | None = None,
+    ) -> dict[str, typing.Any]:
+        workspace_uuid = require_workspace_uuid(context)
+        normalized_name = name.strip()
+        if not normalized_name:
+            raise ValueError('Name is required')
+        if expires_at is not None:
+            if expires_at.tzinfo is not None:
+                expires_at = expires_at.astimezone(datetime.UTC).replace(tzinfo=None)
+            if expires_at <= self._utcnow():
+                raise ValueError('API key expiry must be in the future')

-    async def get_api_key(self, key_id: int) -> dict | None:
-        """Get a specific API key by ID"""
+        default_scopes = getattr(getattr(context, 'workspace', None), 'permissions', frozenset())
+        normalized_scopes = self._normalize_scopes(scopes, default=default_scopes)
+        allowed_scopes = frozenset(default_scopes)
+        unauthorized_scopes = sorted(set(normalized_scopes) - allowed_scopes)
+        if unauthorized_scopes:
+            # API-key management delegates the caller's authority; it must not
+            # become a path for minting a stronger principal.
+            raise PermissionDeniedError(unauthorized_scopes[0])
+        secret = f'lbk_{secrets.token_urlsafe(32)}'
+        key_uuid = str(uuid.uuid4())
+        await self.ap.persistence_mgr.execute_async(
+            sqlalchemy.insert(apikey.ApiKey).values(
+                uuid=key_uuid,
+                workspace_uuid=workspace_uuid,
+                created_by_account_uuid=getattr(context, 'account_uuid', None),
+                name=normalized_name,
+                key_hash=self._hash_secret(secret),
+                scopes=normalized_scopes,
+                status=apikey.ApiKeyStatus.ACTIVE.value,
+                expires_at=expires_at,
+                description=description.strip(),
+            )
+        )
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.id == key_id)
+            scope_statement(
+                sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.uuid == key_uuid),
+                apikey.ApiKey,
+                workspace_uuid,
+            )
        )
+        created = result.first()
+        if created is None:
+            raise RuntimeError('Created API key could not be loaded')
+        value = self._serialize(created)
+        value['key'] = secret
+        value['secret_available'] = True
+        return value

+    async def get_api_key(self, context: TenantContext, key_id: int) -> dict[str, typing.Any] | None:
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.id == key_id),
+                apikey.ApiKey,
+                context,
+            )
+        )
        key = result.first()
+        return None if key is None else self._serialize(key)

-        if key is None:
+    async def authenticate_api_key(self, secret: str) -> ApiKeyIdentity | None:
+        """Authenticate a secret and derive its Workspace without trusting headers."""
+
+        if not isinstance(secret, str) or not secret.strip():
            return None

-        return self.ap.persistence_mgr.serialize_model(apikey.ApiKey, key)
+        global_secret = self.ap.instance_config.data.get('api', {}).get('global_api_key', '')
+        if global_secret and secrets.compare_digest(secret, global_secret):
+            workspace_service = getattr(self.ap, 'workspace_service', None)
+            if workspace_service is None or workspace_service.policy.multi_workspace_enabled:
+                return None
+            binding = await workspace_service.get_local_execution_binding()
+            return ApiKeyIdentity(
+                instance_uuid=binding.instance_uuid,
+                workspace_uuid=binding.workspace_uuid,
+                placement_generation=binding.placement_generation,
+                api_key_uuid='global-oss-api-key',
+                permissions=frozenset(permission.value for permission in Permission),
+            )

-    async def verify_api_key(self, key: str) -> bool:
-        """Verify if an API key is valid.
+        if not secret.startswith('lbk_'):
+            return None
+        secret_hash = self._hash_secret(secret)
+        current_session = getattr(self.ap.persistence_mgr, 'current_session', lambda: None)
+        discovery_uow = getattr(self.ap.persistence_mgr, 'api_key_discovery_uow', None)
+        if current_session() is None and callable(discovery_uow):
+            async with discovery_uow(secret_hash) as discovery:
+                key = await discovery.session.scalar(
+                    sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.key_hash == secret_hash)
+                )
+        else:
+            result = await self.ap.persistence_mgr.execute_async(
+                sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.key_hash == secret_hash)
+            )
+            key = result.first()
+        if key is None:
+            return None
+        discovered_workspace_uuid = key.workspace_uuid
+        discovered_key_id = key.id
+        now = self._utcnow()

-        A key is accepted if it matches the global API key configured in
-        ``config.yaml`` (``api.global_api_key``) — which requires no login
-        session and no database record — or if it matches a key created via
-        the web UI (stored in the database, prefixed with ``lbk_``).
-        """
-        if not isinstance(key, str) or not key:
-            return False
+        async def bind_and_record_use() -> tuple[typing.Any, typing.Any] | None:
+            # Re-read inside the tenant transaction. A revoke/expiry racing
+            # discovery must not result in an authenticated identity.
+            active_session = current_session()
+            if active_session is not None:
+                scoped_key = await active_session.scalar(
+                    sqlalchemy.select(apikey.ApiKey).where(
+                        apikey.ApiKey.id == discovered_key_id,
+                        apikey.ApiKey.workspace_uuid == discovered_workspace_uuid,
+                        apikey.ApiKey.key_hash == secret_hash,
+                    )
+                )
+            else:  # compatibility for isolated service tests
+                scoped_result = await self.ap.persistence_mgr.execute_async(
+                    sqlalchemy.select(apikey.ApiKey).where(
+                        apikey.ApiKey.id == discovered_key_id,
+                        apikey.ApiKey.workspace_uuid == discovered_workspace_uuid,
+                        apikey.ApiKey.key_hash == secret_hash,
+                    )
+                )
+                scoped_key = scoped_result.first()
+            if scoped_key is None or scoped_key.status != apikey.ApiKeyStatus.ACTIVE.value:
+                return None
+            if scoped_key.expires_at is not None and scoped_key.expires_at <= now:
+                return None

-        # 1. Global API key from config.yaml (no DB lookup, no login state).
-        #    Note: config completion only backfills top-level keys, so existing
-        #    installs may not have this key — access it defensively.
-        global_api_key = self.ap.instance_config.data.get('api', {}).get('global_api_key', '')
-        if global_api_key and secrets.compare_digest(key, global_api_key):
-            return True
+            binding = await self.ap.workspace_service.get_execution_binding(discovered_workspace_uuid)
+            updated = await self.ap.persistence_mgr.execute_async(
+                sqlalchemy.update(apikey.ApiKey)
+                .where(
+                    apikey.ApiKey.id == scoped_key.id,
+                    apikey.ApiKey.workspace_uuid == discovered_workspace_uuid,
+                    apikey.ApiKey.key_hash == secret_hash,
+                    apikey.ApiKey.status == apikey.ApiKeyStatus.ACTIVE.value,
+                )
+                .values(last_used_at=now)
+                .returning(apikey.ApiKey.id)
+            )
+            # Authentication and revocation race on this atomic predicate. If
+            # revoke won, no active row is returned and the stale object read
+            # above must never become an authenticated identity.
+            if updated.scalar_one_or_none() is None:
+                return None
+            return binding, scoped_key

-        # 2. Web-UI-created keys are stored in the database and prefixed lbk_.
-        if not key.startswith('lbk_'):
-            return False
-
-        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(apikey.ApiKey).where(apikey.ApiKey.key == key)
+        tenant_uow = getattr(self.ap.persistence_mgr, 'tenant_uow', None)
+        if current_session() is None and callable(tenant_uow):
+            async with tenant_uow(discovered_workspace_uuid):
+                bound = await bind_and_record_use()
+        else:
+            bound = await bind_and_record_use()
+        if bound is None:
+            return None
+        binding, scoped_key = bound
+        raw_scopes = list(scoped_key.scopes or [])
+        permissions = (
+            frozenset(permission.value for permission in Permission)
+            if '*' in raw_scopes
+            else frozenset(self._normalize_scopes(raw_scopes))
+        )
+        return ApiKeyIdentity(
+            instance_uuid=binding.instance_uuid,
+            workspace_uuid=binding.workspace_uuid,
+            placement_generation=binding.placement_generation,
+            api_key_uuid=scoped_key.uuid,
+            permissions=permissions,
        )

-        key_obj = result.first()
-        return key_obj is not None
+    async def verify_api_key(self, secret: str) -> bool:
+        try:
+            return await self.authenticate_api_key(secret) is not None
+        except Exception:
+            return False

-    async def delete_api_key(self, key_id: int) -> None:
-        """Delete an API key"""
-        await self.ap.persistence_mgr.execute_async(sqlalchemy.delete(apikey.ApiKey).where(apikey.ApiKey.id == key_id))
-
-    async def update_api_key(self, key_id: int, name: str = None, description: str = None) -> None:
-        """Update an API key's metadata (name, description)"""
-        update_data = {}
-        if name is not None:
-            update_data['name'] = name
-        if description is not None:
-            update_data['description'] = description
-
-        if update_data:
-            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.update(apikey.ApiKey).where(apikey.ApiKey.id == key_id).values(**update_data)
+    async def delete_api_key(self, context: TenantContext, key_id: int) -> None:
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(apikey.ApiKey)
+                .where(apikey.ApiKey.id == key_id)
+                .values(status=apikey.ApiKeyStatus.REVOKED.value),
+                apikey.ApiKey,
+                context,
            )
+        )
+        if getattr(result, 'rowcount', 0) == 0:
+            raise WorkspaceNotFoundError('API key not found')
+
+    async def update_api_key(
+        self,
+        context: TenantContext,
+        key_id: int,
+        name: str | None = None,
+        description: str | None = None,
+    ) -> None:
+        update_data: dict[str, typing.Any] = {}
+        if name is not None:
+            normalized_name = name.strip()
+            if not normalized_name:
+                raise ValueError('Name is required')
+            update_data['name'] = normalized_name
+        if description is not None:
+            update_data['description'] = description.strip()
+        if not update_data:
+            return
+
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(apikey.ApiKey).where(apikey.ApiKey.id == key_id).values(**update_data),
+                apikey.ApiKey,
+                context,
+            )
+        )
+        if getattr(result, 'rowcount', 0) == 0:
+            raise WorkspaceNotFoundError('API key not found')
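The key lifecycle in `ApiKeyService` works as follows: the `lbk_` secret is shown once, only its SHA-256 digest is persisted, scopes the creator does not hold are refused, and each authentication is gated on a guarded `status = 'active'` update. This can be illustrated with the standard library. This is a simplified sketch against an in-memory SQLite table, not the service's schema. The `api_keys` table, its columns, the `bot.read`/`bot.write` scope names, and the `create_key`/`authenticate`/`revoke` helpers are hypothetical. Unlike the service, it requires explicit scopes instead of defaulting to the caller's permissions, and it checks `rowcount` instead of `RETURNING` so it does not depend on the SQLite version.

```python
import hashlib
import secrets
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone


def hash_secret(secret: str) -> str:
    return hashlib.sha256(secret.encode('utf-8')).hexdigest()


def hours_from_now(hours: float) -> str:
    # Naive-UTC ISO strings, so lexical order matches chronological order.
    moment = datetime.now(timezone.utc) + timedelta(hours=hours)
    return moment.replace(tzinfo=None).isoformat()


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE api_keys (uuid TEXT PRIMARY KEY, workspace_uuid TEXT NOT NULL, '
        'key_hash TEXT UNIQUE NOT NULL, scopes TEXT NOT NULL, status TEXT NOT NULL, '
        'expires_at TEXT, last_used_at TEXT)'
    )
    return conn


def create_key(conn, workspace_uuid, caller_permissions, scopes, expires_at=None):
    # Delegation only: a key may not carry a scope its creator lacks.
    missing = sorted(set(scopes) - set(caller_permissions))
    if missing:
        raise PermissionError(missing[0])
    key_uuid, secret = str(uuid.uuid4()), f'lbk_{secrets.token_urlsafe(32)}'
    conn.execute(
        "INSERT INTO api_keys VALUES (?, ?, ?, ?, 'active', ?, NULL)",
        (key_uuid, workspace_uuid, hash_secret(secret), ','.join(scopes), expires_at),
    )
    return key_uuid, secret  # the plaintext secret is never stored


def authenticate(conn, secret):
    if not secret.startswith('lbk_'):
        return None
    digest, now = hash_secret(secret), hours_from_now(0)
    # One conditional UPDATE records use and proves the key is still active
    # and unexpired; a revoke that commits first leaves rowcount at 0.
    cur = conn.execute(
        "UPDATE api_keys SET last_used_at = ? WHERE key_hash = ? AND status = 'active' "
        'AND (expires_at IS NULL OR expires_at > ?)',
        (now, digest, now),
    )
    if cur.rowcount != 1:
        return None
    workspace_uuid, scopes = conn.execute(
        'SELECT workspace_uuid, scopes FROM api_keys WHERE key_hash = ?', (digest,)
    ).fetchone()
    return workspace_uuid, frozenset(filter(None, scopes.split(',')))


def revoke(conn, workspace_uuid, key_uuid):
    # Soft revoke, scoped to the caller's workspace.
    cur = conn.execute(
        "UPDATE api_keys SET status = 'revoked' WHERE uuid = ? AND workspace_uuid = ?",
        (key_uuid, workspace_uuid),
    )
    if cur.rowcount == 0:
        raise LookupError('API key not found')


if __name__ == '__main__':
    conn = connect()
    key_uuid, secret = create_key(conn, 'ws-1', {'bot.read', 'bot.write'}, ['bot.read'])
    print(authenticate(conn, secret))  # ('ws-1', frozenset({'bot.read'}))
    revoke(conn, 'ws-1', key_uuid)
    print(authenticate(conn, secret))  # None
```

The service orders the steps differently. It re-reads the row inside a tenant unit of work, resolves the execution binding, and then runs the guarded `UPDATE ... RETURNING`. The guarantee is the same, though: the row must still be active at the moment its use is recorded.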
@@ -2,11 +2,12 @@ from __future__ import annotations

 import uuid
 import sqlalchemy
-import typing

 from ....core import app
 from ....entity.persistence import bot as persistence_bot
 from ....entity.persistence import pipeline as persistence_pipeline
+from ....workspace.errors import WorkspaceNotFoundError
+from .tenant import TenantContext, require_workspace_uuid, scope_statement


 class BotService:
@@ -17,9 +18,11 @@ class BotService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_bots(self, include_secret: bool = True) -> list[dict]:
+    async def get_bots(self, context: TenantContext, include_secret: bool = False) -> list[dict]:
        """Get all bots"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(persistence_bot.Bot))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(sqlalchemy.select(persistence_bot.Bot), persistence_bot.Bot, context)
+        )

        bots = result.all()

@@ -29,10 +32,14 @@ class BotService:

        return [self.ap.persistence_mgr.serialize_model(persistence_bot.Bot, bot, masked_columns) for bot in bots]

-    async def get_bot(self, bot_uuid: str, include_secret: bool = True) -> dict | None:
+    async def get_bot(self, context: TenantContext, bot_uuid: str, include_secret: bool = False) -> dict | None:
        """Get bot"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_bot.Bot).where(persistence_bot.Bot.uuid == bot_uuid)
+            scope_statement(
+                sqlalchemy.select(persistence_bot.Bot).where(persistence_bot.Bot.uuid == bot_uuid),
+                persistence_bot.Bot,
+                context,
+            )
        )

        bot = result.first()
@@ -46,15 +53,20 @@ class BotService:

        return self.ap.persistence_mgr.serialize_model(persistence_bot.Bot, bot, masked_columns)

-    async def get_runtime_bot_info(self, bot_uuid: str, include_secret: bool = True) -> dict:
+    async def get_runtime_bot_info(
+        self,
+        context: TenantContext,
+        bot_uuid: str,
+        include_secret: bool = False,
+    ) -> dict:
        """Get bot runtime info"""
-        persistence_bot = await self.get_bot(bot_uuid, include_secret)
+        persistence_bot = await self.get_bot(context, bot_uuid, include_secret)
        if persistence_bot is None:
-            raise Exception('Bot not found')
+            raise WorkspaceNotFoundError('Bot not found')

        adapter_runtime_values = {}

-        runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(bot_uuid)
+        runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(context, bot_uuid)
        if runtime_bot is not None:
            adapter_runtime_values['bot_account_id'] = runtime_bot.adapter.bot_account_id

@@ -86,22 +98,29 @@ class BotService:

        return persistence_bot

-    async def create_bot(self, bot_data: dict) -> str:
+    async def create_bot(self, context: TenantContext, bot_data: dict) -> str:
        """Create bot"""
+        workspace_uuid = require_workspace_uuid(context)
        # Check limitation
        limitation = self.ap.instance_config.data.get('system', {}).get('limitation', {})
        max_bots = limitation.get('max_bots', -1)
        if max_bots >= 0:
-            existing_bots = await self.get_bots()
+            existing_bots = await self.get_bots(context)
            if len(existing_bots) >= max_bots:
                raise ValueError(f'Maximum number of bots ({max_bots}) reached')

        # TODO: validate the config format
+        bot_data = bot_data.copy()
        bot_data['uuid'] = str(uuid.uuid4())
+        bot_data['workspace_uuid'] = workspace_uuid

        # bind the most recently updated pipeline if any exist
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_pipeline.LegacyPipeline)
+            scope_statement(
+                sqlalchemy.select(persistence_pipeline.LegacyPipeline),
+                persistence_pipeline.LegacyPipeline,
+                context,
+            )
            .order_by(persistence_pipeline.LegacyPipeline.updated_at.desc())
            .limit(1)
        )
@@ -112,61 +131,84 @@ class BotService:

        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(persistence_bot.Bot).values(bot_data))

-        bot = await self.get_bot(bot_data['uuid'])
+        bot = await self.get_bot(context, bot_data['uuid'], include_secret=True)

-        await self.ap.platform_mgr.load_bot(bot)
+        await self.ap.platform_mgr.load_bot(context, bot)

        return bot_data['uuid']

-    async def update_bot(self, bot_uuid: str, bot_data: dict) -> None:
+    async def update_bot(self, context: TenantContext, bot_uuid: str, bot_data: dict) -> None:
        """Update bot"""
+        workspace_uuid = require_workspace_uuid(context)
        update_data = bot_data.copy()

-        if 'uuid' in update_data:
-            del update_data['uuid']
+        update_data.pop('uuid', None)
+        update_data.pop('workspace_uuid', None)

        # set use_pipeline_name
        if 'use_pipeline_uuid' in update_data:
            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
-                    persistence_pipeline.LegacyPipeline.uuid == update_data['use_pipeline_uuid']
+                scope_statement(
+                    sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
+                        persistence_pipeline.LegacyPipeline.uuid == update_data['use_pipeline_uuid']
+                    ),
+                    persistence_pipeline.LegacyPipeline,
+                    workspace_uuid,
                )
            )
            pipeline = result.first()
            if pipeline is not None:
                update_data['use_pipeline_name'] = pipeline.name
            else:
-                raise Exception('Pipeline not found')
+                raise WorkspaceNotFoundError('Pipeline not found')

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_bot.Bot).values(update_data).where(persistence_bot.Bot.uuid == bot_uuid)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_bot.Bot).values(update_data).where(persistence_bot.Bot.uuid == bot_uuid),
+                persistence_bot.Bot,
+                workspace_uuid,
+            )
        )
-        await self.ap.platform_mgr.remove_bot(bot_uuid)
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Bot not found')
+        await self.ap.platform_mgr.remove_bot(context, bot_uuid)

        # select from db
-        bot = await self.get_bot(bot_uuid)
+        bot = await self.get_bot(context, bot_uuid, include_secret=True)

-        runtime_bot = await self.ap.platform_mgr.load_bot(bot)
+        runtime_bot = await self.ap.platform_mgr.load_bot(context, bot)

        if runtime_bot.enable:
            await runtime_bot.run()

        # reset every conversation that uses this bot
        for session in self.ap.sess_mgr.session_list:
-            if session.using_conversation is not None and session.using_conversation.bot_uuid == bot_uuid:
+            if (
+                session.using_conversation is not None
+                and session.using_conversation.bot_uuid == bot_uuid
+                and getattr(session, 'workspace_uuid', workspace_uuid) == workspace_uuid
+            ):
                session.using_conversation = None

-    async def delete_bot(self, bot_uuid: str) -> None:
+    async def delete_bot(self, context: TenantContext, bot_uuid: str) -> None:
        """Delete bot"""
-        await self.ap.platform_mgr.remove_bot(bot_uuid)
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_bot.Bot).where(persistence_bot.Bot.uuid == bot_uuid)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(persistence_bot.Bot).where(persistence_bot.Bot.uuid == bot_uuid),
+                persistence_bot.Bot,
+                context,
+            )
        )
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Bot not found')
+        await self.ap.platform_mgr.remove_bot(context, bot_uuid)

    async def list_event_logs(
-        self, bot_uuid: str, from_index: int, max_count: int
-    ) -> typing.Tuple[list[dict], int, int, int]:
-        runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(bot_uuid)
+        self, context: TenantContext, bot_uuid: str, from_index: int, max_count: int
+    ) -> tuple[list[dict], int]:
+        if await self.get_bot(context, bot_uuid, include_secret=False) is None:
+            raise WorkspaceNotFoundError('Bot not found')
+        runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(context, bot_uuid)
        if runtime_bot is None:
            raise Exception('Bot not found')

@@ -174,7 +216,14 @@ class BotService:

        return [log.to_json() for log in logs], total_count

-    async def send_message(self, bot_uuid: str, target_type: str, target_id: str, message_chain_data: dict) -> None:
+    async def send_message(
+        self,
+        context: TenantContext,
+        bot_uuid: str,
+        target_type: str,
+        target_id: str,
+        message_chain_data: dict,
+    ) -> None:
        """Send a message to a specific target via a bot

        Args:
@@ -183,11 +232,14 @@ class BotService:
            target_id: The ID of the target
            message_chain_data: The message chain data in dict format
        """
+        if await self.get_bot(context, bot_uuid, include_secret=False) is None:
+            raise WorkspaceNotFoundError('Bot not found')
+
        # Import here to avoid circular imports
        import langbot_plugin.api.entities.builtin.platform.message as platform_message

        # Get runtime bot
-        runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(bot_uuid)
+        runtime_bot = await self.ap.platform_mgr.get_bot_by_uuid(context, bot_uuid)
        if runtime_bot is None:
            raise Exception(f'Bot not found: {bot_uuid}')

@@ -202,19 +254,29 @@ class BotService:

    # ============ Bot Admins ============

-    async def get_bot_admins(self, bot_uuid: str) -> list[dict]:
+    async def get_bot_admins(self, context: TenantContext, bot_uuid: str) -> list[dict]:
        from ....entity.persistence import bot as persistence_bot

+        if await self.get_bot(context, bot_uuid, include_secret=False) is None:
+            raise WorkspaceNotFoundError('Bot not found')
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_bot.BotAdmin).where(persistence_bot.BotAdmin.bot_uuid == bot_uuid)
+            scope_statement(
+                sqlalchemy.select(persistence_bot.BotAdmin).where(persistence_bot.BotAdmin.bot_uuid == bot_uuid),
+                persistence_bot.BotAdmin,
+                context,
+            )
        )
        return [{'id': r.id, 'launcher_type': r.launcher_type, 'launcher_id': r.launcher_id} for r in result.all()]

-    async def add_bot_admin(self, bot_uuid: str, launcher_type: str, launcher_id: str) -> int:
+    async def add_bot_admin(self, context: TenantContext, bot_uuid: str, launcher_type: str, launcher_id: str) -> int:
        from ....entity.persistence import bot as persistence_bot

+        workspace_uuid = require_workspace_uuid(context)
+        if await self.get_bot(context, bot_uuid, include_secret=False) is None:
+            raise WorkspaceNotFoundError('Bot not found')
        result = await self.ap.persistence_mgr.execute_async(
            sqlalchemy.insert(persistence_bot.BotAdmin).values(
+                workspace_uuid=workspace_uuid,
                bot_uuid=bot_uuid,
                launcher_type=launcher_type,
                launcher_id=launcher_id,
@@ -222,12 +284,18 @@ class BotService:
        )
        return result.inserted_primary_key[0]

-    async def delete_bot_admin(self, bot_uuid: str, admin_id: int) -> None:
+    async def delete_bot_admin(self, context: TenantContext, bot_uuid: str, admin_id: int) -> None:
        from ....entity.persistence import bot as persistence_bot

+        if await self.get_bot(context, bot_uuid, include_secret=False) is None:
+            raise WorkspaceNotFoundError('Bot not found')
        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_bot.BotAdmin).where(
-                persistence_bot.BotAdmin.bot_uuid == bot_uuid,
-                persistence_bot.BotAdmin.id == admin_id,
+            scope_statement(
+                sqlalchemy.delete(persistence_bot.BotAdmin).where(
+                    persistence_bot.BotAdmin.bot_uuid == bot_uuid,
+                    persistence_bot.BotAdmin.id == admin_id,
+                ),
+                persistence_bot.BotAdmin,
+                context,
            )
        )
@@ -2,8 +2,13 @@ from __future__ import annotations

 import sqlalchemy

+from ....api.http.authz import WorkspaceRequiredError
+from ....api.http.context import ExecutionContext, RequestContext
 from ....core import app
 from ....entity.persistence import rag as persistence_rag
+from ....workspace.errors import WorkspaceNotFoundError
+from .secrets import redact_secrets, restore_secret_placeholders
+from .tenant import TenantContext, require_workspace_uuid


 class KnowledgeService:
@@ -14,34 +19,69 @@ class KnowledgeService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_knowledge_bases(self) -> list[dict]:
+    @staticmethod
+    def _execution_context(context: RequestContext | ExecutionContext) -> ExecutionContext:
+        if isinstance(context, RequestContext):
+            return ExecutionContext.from_request(context)
+        if isinstance(context, ExecutionContext):
+            return context
+        raise WorkspaceRequiredError('RequestContext or ExecutionContext is required')
+
+    async def get_knowledge_bases(self, context: TenantContext, *, include_secret: bool = False) -> list[dict]:
        """Get all knowledge bases"""
-        return await self.ap.rag_mgr.get_all_knowledge_base_details()
+        require_workspace_uuid(context)
+        knowledge_bases = await self.ap.rag_mgr.get_all_knowledge_base_details(context)
+        return knowledge_bases if include_secret else [redact_secrets(base) for base in knowledge_bases]

-    async def get_knowledge_base(self, kb_uuid: str) -> dict | None:
+    async def get_knowledge_base(
+        self,
+        context: TenantContext,
+        kb_uuid: str,
+        *,
+        include_secret: bool = False,
+    ) -> dict | None:
        """Get a knowledge base"""
-        return await self.ap.rag_mgr.get_knowledge_base_details(kb_uuid)
+        require_workspace_uuid(context)
+        knowledge_base = await self.ap.rag_mgr.get_knowledge_base_details(context, kb_uuid)
+        if knowledge_base is None or include_secret:
+            return knowledge_base
+        return redact_secrets(knowledge_base)

-    async def create_knowledge_base(self, kb_data: dict) -> str:
+    async def create_knowledge_base(
+        self,
+        context: RequestContext | ExecutionContext,
+        kb_data: dict,
+    ) -> str:
        """Create a knowledge base"""
+        require_workspace_uuid(context)
        # In the new architecture, we delegate entirely to RAGManager, which uses plugins.
        # Legacy internal KB creation is removed.
+        limitation = (
+            getattr(getattr(self.ap, 'instance_config', None), 'data', {}).get('system', {}).get('limitation', {})
+        )
+        max_knowledge_bases = limitation.get('max_knowledge_bases', -1)
+        if max_knowledge_bases >= 0:
+            knowledge_bases = await self.ap.rag_mgr.get_all_knowledge_base_details(context)
+            if len(knowledge_bases) >= max_knowledge_bases:
+                raise ValueError(f'Maximum number of knowledge bases ({max_knowledge_bases}) reached')

        knowledge_engine_plugin_id = kb_data.get('knowledge_engine_plugin_id')
        if not knowledge_engine_plugin_id:
            raise ValueError('knowledge_engine_plugin_id is required')

-        creation_settings = kb_data.get('creation_settings', {})
+        creation_settings = restore_secret_placeholders(kb_data.get('creation_settings', {}))
        retrieval_settings = kb_data.get('retrieval_settings', {})

        # Validate required fields based on plugin's creation_schema and retrieval_schema
        await self._validate_schema_required_fields(
+            context,
            knowledge_engine_plugin_id,
            creation_settings,
            retrieval_settings,
        )

        kb = await self.ap.rag_mgr.create_knowledge_base(
+            context,
            name=kb_data.get('name', 'Untitled'),
            knowledge_engine_plugin_id=knowledge_engine_plugin_id,
            creation_settings=creation_settings,
@@ -52,6 +92,7 @@ class KnowledgeService:

    async def _validate_schema_required_fields(
        self,
+        context: RequestContext | ExecutionContext,
        plugin_id: str,
        creation_settings: dict,
        retrieval_settings: dict,
@@ -69,7 +110,11 @@ class KnowledgeService:
        Raises:
            ValueError: If any required field is missing or empty.
        """
+        if not self.ap.plugin_connector.is_enable_plugin:
+            return
+
        # Validate creation_schema
+        await self.ap.plugin_connector.require_workspace_context(context)
        try:
            creation_schema = await self.ap.plugin_connector.get_rag_creation_schema(plugin_id)
            self._check_required_fields(creation_schema, creation_settings, 'creation_settings')
@@ -79,6 +124,7 @@ class KnowledgeService:
            self.ap.logger.warning(f'Failed to get creation_schema for validation: {e}')

        # Validate retrieval_schema
+        await self.ap.plugin_connector.require_workspace_context(context)
        try:
            retrieval_schema = await self.ap.plugin_connector.get_rag_retrieval_schema(plugin_id)
            self._check_required_fields(retrieval_schema, retrieval_settings, 'retrieval_settings')
@@ -151,8 +197,16 @@ class KnowledgeService:
                )
                raise ValueError(f'{field_label} is required ({context}.{field_name})')

-    async def update_knowledge_base(self, kb_uuid: str, kb_data: dict) -> None:
+    async def update_knowledge_base(
+        self,
+        context: RequestContext | ExecutionContext,
+        kb_uuid: str,
+        kb_data: dict,
+    ) -> None:
        """Update a knowledge base"""
+        workspace_uuid = require_workspace_uuid(context)
+        if await self.get_knowledge_base(context, kb_uuid) is None:
+            raise WorkspaceNotFoundError('Knowledge base not found')
        # Filter to only mutable fields
        filtered_data = {k: v for k, v in kb_data.items() if k in persistence_rag.KnowledgeBase.MUTABLE_FIELDS}

@@ -162,17 +216,18 @@ class KnowledgeService:
        await self.ap.persistence_mgr.execute_async(
            sqlalchemy.update(persistence_rag.KnowledgeBase)
            .values(filtered_data)
+            .where(persistence_rag.KnowledgeBase.workspace_uuid == workspace_uuid)
            .where(persistence_rag.KnowledgeBase.uuid == kb_uuid)
        )
-        await self.ap.rag_mgr.remove_knowledge_base_from_runtime(kb_uuid)
+        await self.ap.rag_mgr.remove_knowledge_base_from_runtime(context, kb_uuid)

-        kb = await self.get_knowledge_base(kb_uuid)
+        kb = await self.get_knowledge_base(context, kb_uuid, include_secret=True)
        if kb is None:
-            raise Exception('Knowledge base not found after update')
+            raise WorkspaceNotFoundError('Knowledge base not found')

-        await self.ap.rag_mgr.load_knowledge_base(kb)
+        await self.ap.rag_mgr.load_knowledge_base(context, kb)

-    async def _check_doc_capability(self, kb_uuid: str, operation: str) -> None:
+    async def _check_doc_capability(self, context: TenantContext, kb_uuid: str, operation: str) -> None:
        """Check if the KB's Knowledge Engine supports document operations.

        Args:
@@ -182,104 +237,145 @@ class KnowledgeService:
        Raises:
            Exception: If the KB does not support doc_ingestion.
        """
-        kb_info = await self.ap.rag_mgr.get_knowledge_base_details(kb_uuid)
+        kb_info = await self.ap.rag_mgr.get_knowledge_base_details(context, kb_uuid)
        if not kb_info:
-            raise Exception('Knowledge base not found')
+            raise WorkspaceNotFoundError('Knowledge base not found')
        capabilities = kb_info.get('knowledge_engine', {}).get('capabilities', [])
        if 'doc_ingestion' not in capabilities:
            raise Exception(f'This knowledge base does not support {operation}')

-    async def store_file(self, kb_uuid: str, file_id: str, parser_plugin_id: str | None = None) -> str:
+    async def store_file(
+        self,
+        context: RequestContext | ExecutionContext,
+        kb_uuid: str,
+        file_id: str,
+        parser_plugin_id: str | None = None,
+    ) -> str:
        """Store a file"""
-        runtime_kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(kb_uuid)
+        execution_context = self._execution_context(context)
+        runtime_kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(execution_context, kb_uuid)
        if runtime_kb is None:
-            raise Exception('Knowledge base not found')
+            raise WorkspaceNotFoundError('Knowledge base not found')

-        await self._check_doc_capability(kb_uuid, 'document upload')
+        await self._check_doc_capability(context, kb_uuid, 'document upload')

-        result = await runtime_kb.store_file(file_id, parser_plugin_id=parser_plugin_id)
+        result = await runtime_kb.store_file(execution_context, file_id, parser_plugin_id=parser_plugin_id)

        # Update the KB's updated_at timestamp
        await self.ap.persistence_mgr.execute_async(
            sqlalchemy.update(persistence_rag.KnowledgeBase)
            .values(updated_at=sqlalchemy.func.now())
+            .where(persistence_rag.KnowledgeBase.workspace_uuid == execution_context.workspace_uuid)
            .where(persistence_rag.KnowledgeBase.uuid == kb_uuid)
        )

        return result

    async def retrieve_knowledge_base(
-        self, kb_uuid: str, query: str, retrieval_settings: dict | None = None
+        self,
+        context: RequestContext | ExecutionContext,
+        kb_uuid: str,
+        query: str,
+        retrieval_settings: dict | None = None,
    ) -> list[dict]:
        """Retrieve from a knowledge base"""
-        runtime_kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(kb_uuid)
+        execution_context = self._execution_context(context)
+        runtime_kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(execution_context, kb_uuid)
        if runtime_kb is None:
-            raise Exception('Knowledge base not found')
+            raise WorkspaceNotFoundError('Knowledge base not found')

        # Pass retrieval_settings
-        results = await runtime_kb.retrieve(query, settings=retrieval_settings)
+        results = await runtime_kb.retrieve(execution_context, query, settings=retrieval_settings)

        return [result.model_dump() for result in results]

-    async def get_files_by_knowledge_base(self, kb_uuid: str) -> list[dict]:
+    async def get_files_by_knowledge_base(self, context: TenantContext, kb_uuid: str) -> list[dict]:
        """Get the files of a knowledge base"""
+        workspace_uuid = require_workspace_uuid(context)
+        if await self.get_knowledge_base(context, kb_uuid) is None:
+            raise WorkspaceNotFoundError('Knowledge base not found')
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_rag.File).where(persistence_rag.File.kb_id == kb_uuid)
+            sqlalchemy.select(persistence_rag.File)
+            .where(persistence_rag.File.workspace_uuid == workspace_uuid)
+            .where(persistence_rag.File.kb_id == kb_uuid)
        )
        files = result.all()
        return [self.ap.persistence_mgr.serialize_model(persistence_rag.File, file) for file in files]

-    async def delete_file(self, kb_uuid: str, file_id: str) -> None:
+    async def delete_file(
+        self,
+        context: RequestContext | ExecutionContext,
+        kb_uuid: str,
+        file_id: str,
+    ) -> None:
        """Delete a file"""
-        runtime_kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(kb_uuid)
+        execution_context = self._execution_context(context)
+        runtime_kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(execution_context, kb_uuid)
        if runtime_kb is None:
-            raise Exception('Knowledge base not found')
+            raise WorkspaceNotFoundError('Knowledge base not found')

-        await self._check_doc_capability(kb_uuid, 'document deletion')
+        await self._check_doc_capability(context, kb_uuid, 'document deletion')

-        await runtime_kb.delete_file(file_id)
+        await runtime_kb.delete_file(execution_context, file_id)

        # Update the KB's updated_at timestamp
        await self.ap.persistence_mgr.execute_async(
            sqlalchemy.update(persistence_rag.KnowledgeBase)
            .values(updated_at=sqlalchemy.func.now())
+            .where(persistence_rag.KnowledgeBase.workspace_uuid == execution_context.workspace_uuid)
            .where(persistence_rag.KnowledgeBase.uuid == kb_uuid)
        )

-    async def delete_knowledge_base(self, kb_uuid: str) -> None:
+    async def delete_knowledge_base(
+        self,
+        context: RequestContext | ExecutionContext,
+        kb_uuid: str,
+    ) -> None:
        """Delete a knowledge base"""
-        # Delete from DB first to commit the deletion, then clean up runtime/plugin (best-effort)
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_rag.KnowledgeBase).where(persistence_rag.KnowledgeBase.uuid == kb_uuid)
-        )
+        workspace_uuid = require_workspace_uuid(context)
+        if await self.get_knowledge_base(context, kb_uuid) is None:
+            raise WorkspaceNotFoundError('Knowledge base not found')

        # delete files
        # NOTE: Chunk cleanup is for legacy (pre-plugin) KBs that stored chunks locally.
        # For plugin-based Knowledge Engines, the Chunk table is not populated, so this is a no-op.
        files = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_rag.File).where(persistence_rag.File.kb_id == kb_uuid)
+            sqlalchemy.select(persistence_rag.File)
+            .where(persistence_rag.File.workspace_uuid == workspace_uuid)
+            .where(persistence_rag.File.kb_id == kb_uuid)
        )
        for file in files:
            # delete chunks
            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.delete(persistence_rag.Chunk).where(persistence_rag.Chunk.file_id == file.uuid)
+                sqlalchemy.delete(persistence_rag.Chunk)
+                .where(persistence_rag.Chunk.workspace_uuid == workspace_uuid)
+                .where(persistence_rag.Chunk.file_id == file.uuid)
            )
            # delete file
            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.delete(persistence_rag.File).where(persistence_rag.File.uuid == file.uuid)
+                sqlalchemy.delete(persistence_rag.File)
+                .where(persistence_rag.File.workspace_uuid == workspace_uuid)
+                .where(persistence_rag.File.uuid == file.uuid)
            )

-        # Remove from runtime and notify plugin (best-effort, DB is already cleaned up)
-        await self.ap.rag_mgr.delete_knowledge_base(kb_uuid)
+        # Remove from runtime and notify plugin before deleting the owning row.
+        await self.ap.rag_mgr.delete_knowledge_base(context, kb_uuid)
+        await self.ap.persistence_mgr.execute_async(
+            sqlalchemy.delete(persistence_rag.KnowledgeBase)
+            .where(persistence_rag.KnowledgeBase.workspace_uuid == workspace_uuid)
+            .where(persistence_rag.KnowledgeBase.uuid == kb_uuid)
+        )

    # ================= Knowledge Engine Discovery =================

-    async def list_knowledge_engines(self) -> list[dict]:
+    async def list_knowledge_engines(self, context: TenantContext) -> list[dict]:
        """List all available Knowledge Engines from plugins."""
+        require_workspace_uuid(context)
        engines = []

        if not self.ap.plugin_connector.is_enable_plugin:
            return engines
+        await self.ap.plugin_connector.require_workspace_context(context)

        # Get KnowledgeEngine plugins
        try:
@@ -290,10 +386,12 @@ class KnowledgeService:

        return engines

-    async def list_parsers(self, mime_type: str | None = None) -> list[dict]:
+    async def list_parsers(self, context: TenantContext, mime_type: str | None = None) -> list[dict]:
        """List available parsers, optionally filtered by MIME type."""
+        require_workspace_uuid(context)
        if not self.ap.plugin_connector.is_enable_plugin:
            return []
+        await self.ap.plugin_connector.require_workspace_context(context)
        try:
            parsers = await self.ap.plugin_connector.list_parsers()
            if mime_type:
@@ -303,16 +401,24 @@ class KnowledgeService:
            self.ap.logger.warning(f'Failed to list parsers: {e}')
            return []

-    async def get_engine_creation_schema(self, plugin_id: str) -> dict:
+    async def get_engine_creation_schema(self, context: TenantContext, plugin_id: str) -> dict:
        """Get creation settings schema for a specific Knowledge Engine."""
+        require_workspace_uuid(context)
+        if not self.ap.plugin_connector.is_enable_plugin:
+            return {}
+        await self.ap.plugin_connector.require_workspace_context(context)
        try:
            return await self.ap.plugin_connector.get_rag_creation_schema(plugin_id)
        except Exception as e:
            self.ap.logger.warning(f'Failed to get creation schema for {plugin_id}: {e}')
            return {}

-    async def get_engine_retrieval_schema(self, plugin_id: str) -> dict:
+    async def get_engine_retrieval_schema(self, context: TenantContext, plugin_id: str) -> dict:
        """Get retrieval settings schema for a specific Knowledge Engine."""
+        require_workspace_uuid(context)
+        if not self.ap.plugin_connector.is_enable_plugin:
+            return {}
+        await self.ap.plugin_connector.require_workspace_context(context)
        try:
            return await self.ap.plugin_connector.get_rag_retrieval_schema(plugin_id)
        except Exception as e:
@@ -1,6 +1,8 @@
 from __future__ import annotations

+import asyncio
 import datetime
+import functools
 import os
 import re
 from pathlib import Path
@@ -11,11 +13,36 @@ import sqlalchemy
 from ....core import app
 from ....entity.persistence import bstorage as persistence_bstorage
 from ....entity.persistence import monitoring as persistence_monitoring
+from ..authz import WorkspaceRequiredError
+from ..context import ExecutionContext
+from .tenant import TenantContext, require_workspace_uuid


 LOG_FILE_PATTERN = re.compile(r'^langbot-(\d{4}-\d{2}-\d{2})\.log(?:\.\d+)?$')
 DEFAULT_UPLOAD_FILE_RETENTION_DAYS = 7
 DEFAULT_LOG_RETENTION_DAYS = 3
+DEFAULT_MAX_FILES_PER_RUN = 1000
+HARD_MAX_FILES_PER_RUN = 10000
+UPLOAD_OWNER_TYPES = ('upload_image', 'upload_document', 'upload')
+
+
+def _workspace_scope(method):
+    """Bind maintenance work to a Workspace without spanning external I/O."""
+
+    @functools.wraps(method)
+    async def wrapped(self, context, *args, **kwargs):
+        workspace_uuid = require_workspace_uuid(context)
+        persistence_mgr = getattr(self.ap, 'persistence_mgr', None)
+        tenant_scope = getattr(persistence_mgr, 'tenant_scope', None)
+        cloud_runtime = getattr(getattr(persistence_mgr, 'mode', None), 'value', None) == 'cloud_runtime'
+        if cloud_runtime:
+            if not callable(tenant_scope):
+                raise RuntimeError('Cloud maintenance requires an explicit tenant scope')
+            async with tenant_scope(workspace_uuid):
+                return await method(self, context, *args, **kwargs)
+        return await method(self, context, *args, **kwargs)
+
+    return wrapped


 class MaintenanceService:
@@ -26,7 +53,22 @@ class MaintenanceService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def cleanup_expired_files(self) -> dict[str, int]:
+    def _max_files_per_run(self) -> int:
+        cleanup_cfg = (
+            getattr(getattr(self.ap, 'instance_config', None), 'data', {}).get('storage', {}).get('cleanup', {})
+        )
+        value = self._positive_int(
+            cleanup_cfg.get('max_files_per_run', DEFAULT_MAX_FILES_PER_RUN),
+            DEFAULT_MAX_FILES_PER_RUN,
+            'storage.cleanup.max_files_per_run',
+        )
+        return min(value, HARD_MAX_FILES_PER_RUN)
+
+    @_workspace_scope
+    async def cleanup_expired_files(self, context: ExecutionContext) -> dict[str, int]:
+        if not isinstance(context, ExecutionContext):
+            raise WorkspaceRequiredError('Storage cleanup requires an ExecutionContext')
+        require_workspace_uuid(context)
        cleanup_cfg = self.ap.instance_config.data.get('storage', {}).get('cleanup', {})
        upload_retention_days = self._positive_int(
            cleanup_cfg.get('uploaded_file_retention_days'),
@@ -40,11 +82,17 @@ class MaintenanceService:
        )

        return {
-            'uploaded_files': await self._cleanup_expired_uploaded_files(upload_retention_days),
-            'log_files': self._cleanup_expired_log_files(log_retention_days),
+            'uploaded_files': await self._cleanup_expired_uploaded_files(context, upload_retention_days),
+            'log_files': await asyncio.to_thread(
+                self._cleanup_expired_log_files,
+                log_retention_days,
+            )
+            if await self._is_oss_singleton(context)
+            else 0,
        }

-    async def get_storage_analysis(self) -> dict[str, Any]:
+    async def get_storage_analysis(self, context: TenantContext) -> dict[str, Any]:
+        require_workspace_uuid(context)
        cleanup_cfg = self.ap.instance_config.data.get('storage', {}).get('cleanup', {})
        upload_retention_days = self._positive_int(
            cleanup_cfg.get('uploaded_file_retention_days'),
@@ -62,32 +110,34 @@ class MaintenanceService:
        database_path = (
            Path(database_cfg.get('sqlite', {}).get('path', 'data/langbot.db')) if database_type == 'sqlite' else None
        )
-        roots: list[tuple[str, Path | None]] = [
-            ('database', database_path),
-            ('logs', Path('data/logs')),
-            ('storage', Path('data/storage')),
-            ('vector_store', Path('data/chroma')),
-            ('plugins', Path('data/plugins')),
-            ('mcp', Path('data/mcp')),
-            ('temp', Path('data/temp')),
-        ]
+        is_oss_singleton = await self._is_oss_singleton(context)
+        if is_oss_singleton:
+            roots: list[tuple[str, Path | None]] = [
+                ('database', database_path),
+                ('logs', Path('data/logs')),
+                ('storage', Path('data/storage')),
+                ('vector_store', Path('data/chroma')),
+                ('plugins', Path('data/plugins')),
+                ('mcp', Path('data/mcp')),
+                ('temp', Path('data/temp')),
+            ]
+        else:
+            scoped_storage_path = Path('data/storage') / self.ap.storage_mgr.scoped_prefix(context)
+            roots = [('storage', scoped_storage_path)]

-        sections = []
-        for key, path in roots:
-            sections.append(
-                {
-                    'key': key,
-                    'path': str(path) if path else '',
-                    'exists': path.exists() if path else False,
-                    'size_bytes': self._path_size(path) if path else 0,
-                    'file_count': self._file_count(path) if path else 0,
-                }
+        sections = await asyncio.to_thread(self._collect_sections, roots)
+
+        monitoring_counts = await self._monitoring_counts(context)
+        binary_storage = await self._binary_storage_stats(context)
+        upload_candidates = await self._expired_uploaded_candidates(context, upload_retention_days)
+        log_candidates = (
+            await asyncio.to_thread(
+                self._expired_log_candidates,
+                log_retention_days,
            )
-
-        monitoring_counts = await self._monitoring_counts()
-        binary_storage = await self._binary_storage_stats()
-        upload_candidates = await self._expired_uploaded_candidates(upload_retention_days)
-        log_candidates = self._expired_log_candidates(log_retention_days)
+            if is_oss_singleton
+            else []
+        )

        return {
            'generated_at': datetime.datetime.now(datetime.timezone.utc).isoformat(),
@@ -105,70 +155,156 @@ class MaintenanceService:
                'uploaded_files': upload_candidates,
                'log_files': log_candidates,
            },
-            'tasks': self.ap.task_mgr.get_stats() if self.ap.task_mgr else {},
+            'tasks': self.ap.task_mgr.get_stats() if is_oss_singleton and self.ap.task_mgr else {},
        }

-    async def _cleanup_expired_uploaded_files(self, retention_days: int) -> int:
+    def _collect_sections(
+        self,
+        roots: list[tuple[str, Path | None]],
+    ) -> list[dict[str, Any]]:
+        sections = []
+        for key, path in roots:
+            sections.append(
+                {
+                    'key': key,
+                    'path': str(path) if path else '',
+                    'exists': path.exists() if path else False,
+                    'size_bytes': self._path_size(path) if path else 0,
+                    'file_count': self._file_count(path) if path else 0,
+                }
+            )
+        return sections
+
+    async def _is_oss_singleton(self, context: TenantContext) -> bool:
+        try:
+            await self.ap.workspace_service.get_local_execution_binding(
+                require_workspace_uuid(context),
+                expected_generation=getattr(context, 'placement_generation', None),
+            )
+        except Exception:
+            return False
+        return True
+
+    async def _cleanup_expired_uploaded_files(
+        self,
+        context: ExecutionContext,
+        retention_days: int,
+    ) -> int:
        provider = self.ap.storage_mgr.storage_provider
        provider_name = provider.__class__.__name__
        if provider_name == 'LocalStorageProvider':
-            candidates = self._expired_local_upload_candidates(retention_days, include_paths=True)
-            deleted = 0
-            for item in candidates:
-                try:
-                    os.remove(item['path'])
-                    deleted += 1
-                except FileNotFoundError:
-                    pass
-                except Exception as e:
-                    self.ap.logger.warning(f'Failed to delete expired uploaded file {item["key"]}: {e}')
-            return deleted
+            candidates = await asyncio.to_thread(
+                self._expired_local_upload_candidates,
+                context,
+                retention_days,
+                include_paths=True,
+            )
+            return await asyncio.to_thread(
+                self._delete_local_candidates,
+                candidates,
+            )

        if provider_name == 'S3StorageProvider':
-            return await self._cleanup_expired_s3_uploaded_files(retention_days)
+            return await self._cleanup_expired_s3_uploaded_files(context, retention_days)

        return 0

-    async def _expired_uploaded_candidates(self, retention_days: int) -> list[dict[str, Any]]:
+    async def _expired_uploaded_candidates(
+        self,
+        context: TenantContext,
+        retention_days: int,
+    ) -> list[dict[str, Any]]:
        provider_name = self.ap.storage_mgr.storage_provider.__class__.__name__
        if provider_name == 'LocalStorageProvider':
-            return self._expired_local_upload_candidates(retention_days)
+            return await asyncio.to_thread(
+                self._expired_local_upload_candidates,
+                context,
+                retention_days,
+            )
        if provider_name == 'S3StorageProvider':
-            return await self._expired_s3_upload_candidates(retention_days)
+            return await self._expired_s3_upload_candidates(context, retention_days)
        return []

-    async def _cleanup_expired_s3_uploaded_files(self, retention_days: int) -> int:
+    async def _cleanup_expired_s3_uploaded_files(
+        self,
+        context: ExecutionContext,
+        retention_days: int,
+    ) -> int:
        provider = self.ap.storage_mgr.storage_provider
-        candidates = await self._expired_s3_upload_candidates(retention_days)
+        candidates = await self._expired_s3_upload_candidates(context, retention_days)
        deleted = 0
        for item in candidates:
            await provider.delete(item['key'])
            deleted += 1
        return deleted

-    async def _expired_s3_upload_candidates(self, retention_days: int) -> list[dict[str, Any]]:
+    async def _expired_s3_upload_candidates(
+        self,
+        context: TenantContext,
+        retention_days: int,
+    ) -> list[dict[str, Any]]:
+        provider = self.ap.storage_mgr.storage_provider
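+        # Prefer the provider's own I/O executor when it exposes one; otherwise
+        # fall back to asyncio's default thread pool.  boto3 calls are blocking.
+        run_io = getattr(provider, '_run_io', None)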
+        if callable(run_io):
+            return await run_io(
+                self._expired_s3_upload_candidates_sync,
+                context,
+                retention_days,
+            )
+        return await asyncio.to_thread(
+            self._expired_s3_upload_candidates_sync,
+            context,
+            retention_days,
+        )
+
+    def _expired_s3_upload_candidates_sync(
+        self,
+        context: TenantContext,
+        retention_days: int,
+    ) -> list[dict[str, Any]]:
        provider = self.ap.storage_mgr.storage_provider
        cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=retention_days)
        candidates = []
+        max_candidates = self._max_files_per_run()
        paginator = provider.s3_client.get_paginator('list_objects_v2')

-        for page in paginator.paginate(Bucket=provider.bucket_name):
-            for obj in page.get('Contents', []):
-                key = obj.get('Key', '')
-                last_modified = obj.get('LastModified')
-                if not self._is_uploaded_file_key(key):
-                    continue
-                if last_modified and last_modified < cutoff:
-                    candidates.append(
-                        {
-                            'key': key,
-                            'size_bytes': obj.get('Size', 0),
-                            'modified_at': last_modified.isoformat(),
-                        }
-                    )
+        seen_prefixes: set[str] = set()
+        for owner_type in UPLOAD_OWNER_TYPES:
+            prefix = self.ap.storage_mgr.scoped_prefix(context, owner_type=owner_type)
+            if prefix in seen_prefixes:
+                continue
+            seen_prefixes.add(prefix)
+            for page in paginator.paginate(Bucket=provider.bucket_name, Prefix=prefix):
+                for obj in page.get('Contents', []):
+                    key = obj.get('Key', '')
+                    last_modified = obj.get('LastModified')
+                    if not self._is_uploaded_file_key(context, key):
+                        continue
+                    if last_modified and last_modified < cutoff:
+                        candidates.append(
+                            {
+                                'key': key,
+                                'size_bytes': obj.get('Size', 0),
+                                'modified_at': last_modified.isoformat(),
+                            }
+                        )
+                        if len(candidates) >= max_candidates:
+                            return candidates

        return candidates
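The S3 scan lists only the tenant's scoped prefixes and skips any prefix that several owner types share. It also stops as soon as the per-run cap is reached. The sketch below reproduces that loop against an in-memory stand-in. `FakePaginator` is hypothetical and only mimics the shape of boto3's `list_objects_v2` pages (`Contents` entries carrying `Key` and `LastModified`):

```python
import datetime
from typing import Any, Iterable, Iterator


class FakePaginator:
    """Hypothetical stand-in for a boto3 list_objects_v2 paginator."""

    def __init__(self, objects: list[dict[str, Any]], page_size: int = 2) -> None:
        self.objects = sorted(objects, key=lambda o: o['Key'])
        self.page_size = page_size

    def paginate(self, Bucket: str, Prefix: str = '') -> Iterator[dict[str, Any]]:
        # Bucket is accepted only to match the boto3 keyword shape.
        matching = [o for o in self.objects if o['Key'].startswith(Prefix)]
        for start in range(0, len(matching), self.page_size):
            yield {'Contents': matching[start:start + self.page_size]}


def expired_keys(
    paginator: FakePaginator,
    bucket: str,
    prefixes: Iterable[str],
    cutoff: datetime.datetime,
    max_candidates: int,
) -> list[str]:
    candidates: list[str] = []
    seen: set[str] = set()
    for prefix in prefixes:
        if prefix in seen:  # two owner types may resolve to one prefix
            continue
        seen.add(prefix)
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get('Contents', []):
                modified = obj.get('LastModified')
                if modified and modified < cutoff:
                    candidates.append(obj['Key'])
                    if len(candidates) >= max_candidates:
                        return candidates  # stop listing once the cap is hit
    return candidates
```

Returning early bounds both the memory used and the number of list requests made per maintenance run. Whatever remains is picked up by the next run.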

+    def _delete_local_candidates(self, candidates: list[dict[str, Any]]) -> int:
+        deleted = 0
+        for item in candidates:
+            try:
+                os.remove(item['path'])
+                deleted += 1
+            except FileNotFoundError:
+                pass
+            except Exception as e:
+                self.ap.logger.warning(f'Failed to delete expired uploaded file {item["key"]}: {e}')
+        return deleted
+
    def _cleanup_expired_log_files(self, retention_days: int) -> int:
        deleted = 0
        for item in self._expired_log_candidates(retention_days, include_paths=True):
@@ -182,28 +318,42 @@ class MaintenanceService:
        return deleted

    def _expired_local_upload_candidates(
-        self, retention_days: int, include_paths: bool = False
+        self,
+        context: TenantContext,
+        retention_days: int,
+        include_paths: bool = False,
    ) -> list[dict[str, Any]]:
        storage_root = Path('data/storage')
-        if not storage_root.exists():
-            return []
-
        cutoff = datetime.datetime.now().timestamp() - retention_days * 86400
        candidates = []
-        for entry in storage_root.iterdir():
-            if not entry.is_file() or not self._is_uploaded_file_key(entry.name):
+        max_candidates = self._max_files_per_run()
+        seen_roots: set[Path] = set()
+        for owner_type in UPLOAD_OWNER_TYPES:
+            scoped_root = storage_root / self.ap.storage_mgr.scoped_prefix(context, owner_type=owner_type)
+            if scoped_root in seen_roots:
                continue
-            stat = entry.stat()
-            if stat.st_mtime >= cutoff:
+            seen_roots.add(scoped_root)
+            if not scoped_root.exists():
                continue
-            item = {
-                'key': entry.name,
-                'size_bytes': stat.st_size,
-                'modified_at': datetime.datetime.fromtimestamp(stat.st_mtime, datetime.timezone.utc).isoformat(),
-            }
-            if include_paths:
-                item['path'] = str(entry)
-            candidates.append(item)
+            for entry in scoped_root.rglob('*'):
+                if not entry.is_file():
+                    continue
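+                try:
+                    stat = entry.stat()
+                except FileNotFoundError:
+                    # Removed concurrently between rglob() and stat(); nothing left to clean.
+                    continue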
+                if stat.st_mtime >= cutoff:
+                    continue
+                item = {
+                    'key': entry.relative_to(storage_root).as_posix(),
+                    'size_bytes': stat.st_size,
+                    'modified_at': datetime.datetime.fromtimestamp(
+                        stat.st_mtime,
+                        datetime.timezone.utc,
+                    ).isoformat(),
+                }
+                if include_paths:
+                    item['path'] = str(entry)
+                candidates.append(item)
+                if len(candidates) >= max_candidates:
+                    return candidates
        return candidates
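Local uploads now live under scoped directories, so candidates are found with `rglob` and keyed by their POSIX path relative to `data/storage`. Below is a sketch of that scan. The directory layout is hypothetical: `ws1/user` stands in for whatever `scoped_prefix` actually returns.

```python
import datetime
import os
import tempfile
import time
from pathlib import Path
from typing import Any


def scoped_upload_candidates(
    storage_root: Path,
    scoped_prefixes: list[str],
    cutoff: float,
    max_candidates: int,
) -> list[dict[str, Any]]:
    """Report expired files under each distinct scoped directory, keyed by storage path."""
    candidates: list[dict[str, Any]] = []
    seen_roots: set[Path] = set()
    for prefix in scoped_prefixes:
        scoped_root = storage_root / prefix
        if scoped_root in seen_roots:
            continue  # several owner types may share one directory
        seen_roots.add(scoped_root)
        if not scoped_root.exists():
            continue
        for entry in scoped_root.rglob('*'):
            if not entry.is_file():
                continue
            stat = entry.stat()
            if stat.st_mtime >= cutoff:
                continue
            candidates.append(
                {
                    # Keys are relative to the storage root and always use '/',
                    # so they look like object-store keys on every OS.
                    'key': entry.relative_to(storage_root).as_posix(),
                    'size_bytes': stat.st_size,
                    'modified_at': datetime.datetime.fromtimestamp(
                        stat.st_mtime, datetime.timezone.utc
                    ).isoformat(),
                }
            )
            if len(candidates) >= max_candidates:
                return candidates
    return candidates


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        doc = root / 'ws1' / 'user' / 'nested' / 'doc.txt'
        doc.parent.mkdir(parents=True)
        doc.write_bytes(b'abc')
        stale = time.time() - 10 * 86400
        os.utime(doc, (stale, stale))
        found = scoped_upload_candidates(root, ['ws1/user', 'ws1/user'], time.time() - 86400, 100)
        print([item['key'] for item in found])  # ['ws1/user/nested/doc.txt']
```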

    def _expired_log_candidates(self, retention_days: int, include_paths: bool = False) -> list[dict[str, Any]]:
@@ -236,33 +386,51 @@ class MaintenanceService:
            candidates.append(item)
        return candidates

-    def _is_uploaded_file_key(self, key: str) -> bool:
-        return '/' not in key and not key.startswith('plugin_config_')
+    def _is_uploaded_file_key(self, context: TenantContext, key: str) -> bool:
+        return any(
+            key.startswith(self.ap.storage_mgr.scoped_prefix(context, owner_type=owner_type))
+            and self.ap.storage_mgr.is_scoped_object_key(key, expected_owner_type=owner_type)
+            for owner_type in UPLOAD_OWNER_TYPES
+        )

-    async def _monitoring_counts(self) -> dict[str, int]:
+    async def _monitoring_counts(self, context: TenantContext) -> dict[str, int]:
+        workspace_uuid = require_workspace_uuid(context)
        tables = {
-            'messages': persistence_monitoring.MonitoringMessage.id,
-            'llm_calls': persistence_monitoring.MonitoringLLMCall.id,
-            'tool_calls': persistence_monitoring.MonitoringToolCall.id,
-            'embedding_calls': persistence_monitoring.MonitoringEmbeddingCall.id,
-            'errors': persistence_monitoring.MonitoringError.id,
-            'sessions': persistence_monitoring.MonitoringSession.session_id,
-            'feedback': persistence_monitoring.MonitoringFeedback.id,
+            'messages': (persistence_monitoring.MonitoringMessage, persistence_monitoring.MonitoringMessage.id),
+            'llm_calls': (persistence_monitoring.MonitoringLLMCall, persistence_monitoring.MonitoringLLMCall.id),
+            'tool_calls': (persistence_monitoring.MonitoringToolCall, persistence_monitoring.MonitoringToolCall.id),
+            'embedding_calls': (
+                persistence_monitoring.MonitoringEmbeddingCall,
+                persistence_monitoring.MonitoringEmbeddingCall.id,
+            ),
+            'errors': (persistence_monitoring.MonitoringError, persistence_monitoring.MonitoringError.id),
+            'sessions': (
+                persistence_monitoring.MonitoringSession,
+                persistence_monitoring.MonitoringSession.session_id,
+            ),
+            'feedback': (persistence_monitoring.MonitoringFeedback, persistence_monitoring.MonitoringFeedback.id),
        }
        counts: dict[str, int] = {}
-        for key, column in tables.items():
-            result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(sqlalchemy.func.count(column)))
+        for key, (model, column) in tables.items():
+            result = await self.ap.persistence_mgr.execute_async(
+                sqlalchemy.select(sqlalchemy.func.count(column)).where(model.workspace_uuid == workspace_uuid)
+            )
            counts[key] = result.scalar() or 0
        return counts
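Each monitoring count is now filtered by the caller's `workspace_uuid`. The original builds `select(func.count(column)).where(model.workspace_uuid == ...)` with SQLAlchemy. This sketch swaps in the standard-library `sqlite3` module and two hypothetical tables to show the same per-workspace filtering:

```python
import sqlite3

# Hypothetical schema for this sketch: every monitoring table carries the
# workspace_uuid of the tenant that owns the row.
COUNTED_COLUMNS = {'messages': 'id', 'sessions': 'session_id'}


def monitoring_counts(conn: sqlite3.Connection, workspace_uuid: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    for table, column in COUNTED_COLUMNS.items():
        # Identifiers come from the fixed mapping above, never from user input;
        # only the workspace value is bound as a query parameter.
        (count,) = conn.execute(
            f'SELECT COUNT({column}) FROM {table} WHERE workspace_uuid = ?',
            (workspace_uuid,),
        ).fetchone()
        counts[table] = count or 0
    return counts


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE messages (id INTEGER, workspace_uuid TEXT)')
    conn.execute('CREATE TABLE sessions (session_id TEXT, workspace_uuid TEXT)')
    conn.executemany('INSERT INTO messages VALUES (?, ?)', [(1, 'ws-a'), (2, 'ws-a'), (3, 'ws-b')])
    conn.execute("INSERT INTO sessions VALUES ('s1', 'ws-b')")
    print(monitoring_counts(conn, 'ws-a'))  # {'messages': 2, 'sessions': 0}
```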

-    async def _binary_storage_stats(self) -> dict[str, Any]:
+    async def _binary_storage_stats(self, context: TenantContext) -> dict[str, Any]:
+        workspace_uuid = require_workspace_uuid(context)
        count_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(sqlalchemy.func.count(persistence_bstorage.BinaryStorage.unique_key))
+            sqlalchemy.select(sqlalchemy.func.count(persistence_bstorage.BinaryStorage.unique_key)).where(
+                persistence_bstorage.BinaryStorage.workspace_uuid == workspace_uuid
+            )
        )
        size_bytes = None
        try:
            size_result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.select(sqlalchemy.func.sum(sqlalchemy.func.length(persistence_bstorage.BinaryStorage.value)))
+                sqlalchemy.select(
+                    sqlalchemy.func.sum(sqlalchemy.func.length(persistence_bstorage.BinaryStorage.value))
+                ).where(persistence_bstorage.BinaryStorage.workspace_uuid == workspace_uuid)
            )
            size_bytes = size_result.scalar() or 0
        except Exception as e:
@@ -1,198 +1,451 @@
 from __future__ import annotations

-import sqlalchemy
+import copy
+import re
 import uuid
-import asyncio

-from ....core import app
+import sqlalchemy
+
+from ....core import app, taskmgr
+from ....core.task_boundary import create_detached_task
 from ....entity.persistence import mcp as persistence_mcp
-from ....core import taskmgr
-from ....provider.tools.loaders.mcp import RuntimeMCPSession, MCPSessionStatus
+from ....entity.persistence import plugin as persistence_plugin
+from ....provider.tools.loaders.mcp import MCPSessionStatus, RuntimeMCPSession
+from ....provider.tools.loaders.mcp_policy import require_stdio_mcp_enabled
+from ....workspace.errors import WorkspaceNotFoundError
+from ..context import ExecutionContext
+from .secrets import is_url_key, redact_url_secrets, restore_url_secret_placeholders
+from .tenant import TenantContext, require_workspace_uuid, scope_statement
+
+
+_SECRET_MASK = '***'
+_MISSING_SECRET = object()
+_SENSITIVE_CONFIG_NAMES = frozenset(
+    {
+        'api_key',
+        'apikey',
+        'auth',
+        'authorization',
+        'cookie',
+        'credentials',
+        'database_url',
+        'dsn',
+        'key',
+        'proxy_authorization',
+        'set_cookie',
+    }
+)
+_SENSITIVE_CONFIG_TOKENS = frozenset(
+    {
+        'credential',
+        'credentials',
+        'passwd',
+        'password',
+        'secret',
+        'token',
+    }
+)
+_SENSITIVE_KEY_QUALIFIERS = frozenset(
+    {
+        'access',
+        'api',
+        'auth',
+        'bearer',
+        'client',
+        'debug',
+        'encryption',
+        'private',
+        'signing',
+    }
+)
+
+
+def _normalize_config_key(key: object) -> str:
+    value = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', str(key or ''))
+    return re.sub(r'[^a-zA-Z0-9]+', '_', value).strip('_').lower()
+
+
+def _is_sensitive_config_key(key: object) -> bool:
+    normalized = _normalize_config_key(key)
+    if normalized in _SENSITIVE_CONFIG_NAMES:
+        return True
+    tokens = frozenset(token for token in normalized.split('_') if token)
+    if tokens & _SENSITIVE_CONFIG_TOKENS:
+        return True
+    return 'key' in tokens and bool(tokens & _SENSITIVE_KEY_QUALIFIERS)
+
+
+def _mask_secret_structure(value):
+    """Mask every non-empty leaf while preserving dict/list/tuple shape."""
+    if isinstance(value, dict):
+        return {key: _mask_secret_structure(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [_mask_secret_structure(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(_mask_secret_structure(item) for item in value)
+    if value is None or value == '':
+        return value
+    return _SECRET_MASK
+
+
+def redact_mcp_secrets(value):
+    """Return a recursively redacted copy of MCP configuration data."""
+
+    if isinstance(value, dict):
+        return {
+            key: (
+                _mask_secret_structure(item)
+                if _is_sensitive_config_key(key)
+                else redact_url_secrets(item)
+                if is_url_key(key)
+                else redact_mcp_secrets(item)
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [redact_mcp_secrets(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(redact_mcp_secrets(item) for item in value)
+    return value
+
+
+def restore_mcp_secret_placeholders(value, current_value=_MISSING_SECRET, *, sensitive: bool = False):
+    """Restore masked leaves from the current MCP config before a write."""
+
+    if sensitive and value == _SECRET_MASK:
+        if current_value is _MISSING_SECRET:
+            raise ValueError('Masked MCP secret has no existing value')
+        return copy.deepcopy(current_value)
+    if isinstance(value, dict):
+        current_mapping = current_value if isinstance(current_value, dict) else {}
+        return {
+            key: (
+                restore_url_secret_placeholders(
+                    item,
+                    current_mapping.get(key, _MISSING_SECRET),
+                )
+                if not sensitive and not _is_sensitive_config_key(key) and is_url_key(key)
+                else restore_mcp_secret_placeholders(
+                    item,
+                    current_mapping.get(key, _MISSING_SECRET),
+                    sensitive=sensitive or _is_sensitive_config_key(key),
+                )
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return [
+            restore_mcp_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+                sensitive=sensitive,
+            )
+            for index, item in enumerate(value)
+        ]
+    if isinstance(value, tuple):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return tuple(
+            restore_mcp_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+                sensitive=sensitive,
+            )
+            for index, item in enumerate(value)
+        )
+    return value
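`redact_mcp_secrets` and `restore_mcp_secret_placeholders` form a round trip. Reads return `***` in place of secrets, and a write that sends `***` back unchanged keeps the stored value. Below is a simplified sketch of that protocol. It assumes a reduced sensitivity rule and omits the URL-credential handling (`redact_url_secrets`) and the tuple support of the real helpers. `redact`, `restore`, and `is_sensitive` are hypothetical names.

```python
import copy
import re

MASK = '***'
_MISSING = object()
# Simplified for this sketch; the real helper also matches exact names and
# qualified keys such as api_key or private_key.
SENSITIVE_TOKENS = frozenset({'authorization', 'password', 'secret', 'token'})


def is_sensitive(key: object) -> bool:
    # Split camelCase, snake_case and kebab-case keys into lowercase tokens.
    snake = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', str(key))
    tokens = set(re.sub(r'[^a-zA-Z0-9]+', '_', snake).lower().split('_'))
    return bool(tokens & SENSITIVE_TOKENS)


def redact(value, sensitive: bool = False):
    """Copy value, masking every non-empty leaf under a sensitive key."""
    if isinstance(value, dict):
        return {k: redact(v, sensitive or is_sensitive(k)) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, sensitive) for v in value]
    if sensitive and value not in (None, ''):
        return MASK
    return value


def restore(value, current=_MISSING, sensitive: bool = False):
    """Swap masked leaves in an incoming payload for the stored values."""
    if sensitive and value == MASK:
        if current is _MISSING:
            raise ValueError('masked secret has no stored value')
        return copy.deepcopy(current)
    if isinstance(value, dict):
        stored = current if isinstance(current, dict) else {}
        return {k: restore(v, stored.get(k, _MISSING), sensitive or is_sensitive(k)) for k, v in value.items()}
    if isinstance(value, list):
        stored = current if isinstance(current, list) else []
        return [restore(v, stored[i] if i < len(stored) else _MISSING, sensitive) for i, v in enumerate(value)]
    return value
```

Because `restore` raises when a mask has no stored counterpart, a client cannot create a new secret whose value is literally `***`.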


 class MCPService:
+    """Workspace-scoped MCP configuration and runtime facade."""
+
    ap: app.Application

    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_runtime_info(self, server_name: str) -> dict | None:
-        session = self.ap.tool_mgr.mcp_tool_loader.get_session(server_name)
-        if session:
-            return session.get_runtime_info_dict()
-        return None
+    async def _execution_context(self, context: TenantContext) -> ExecutionContext:
+        workspace_uuid = require_workspace_uuid(context)
+        instance_uuid = str(getattr(context, 'instance_uuid', '') or '').strip()
+        generation = getattr(context, 'placement_generation', None)
+        if not instance_uuid or not isinstance(generation, int) or isinstance(generation, bool) or generation <= 0:
+            raise ValueError('MCP operations require an explicit fenced execution context')
+        binding = await self.ap.workspace_service.get_execution_binding(
+            workspace_uuid,
+            expected_generation=generation,
+        )
+        if binding.instance_uuid != instance_uuid:
+            raise ValueError('MCP execution context belongs to another LangBot instance')
+        return ExecutionContext(
+            instance_uuid=instance_uuid,
+            workspace_uuid=workspace_uuid,
+            placement_generation=generation,
+            bot_uuid=getattr(context, 'bot_uuid', None),
+            pipeline_uuid=getattr(context, 'pipeline_uuid', None),
+            query_uuid=getattr(context, 'query_uuid', None),
+        )
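`_execution_context` accepts a request only when the request carries a positive integer placement generation and an instance UUID that match the workspace's current binding. Below is a sketch of those checks using a hypothetical `Binding` record. In the service, the generation is passed to `get_execution_binding(expected_generation=...)`, which we assume rejects a mismatch. The sketch performs that check inline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """Hypothetical record of the instance that currently hosts a workspace."""

    instance_uuid: str
    placement_generation: int


def check_fence(instance_uuid: object, generation: object, binding: Binding) -> int:
    """Return the generation if the caller may act on the workspace, else raise."""
    instance = str(instance_uuid or '').strip()
    # bool is a subclass of int, so True would otherwise pass as generation 1.
    if not instance or not isinstance(generation, int) or isinstance(generation, bool) or generation <= 0:
        raise ValueError('explicit fenced execution context required')
    if binding.placement_generation != generation:
        raise ValueError('stale placement generation')
    if binding.instance_uuid != instance:
        raise ValueError('workspace is bound to another instance')
    return generation
```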

-    async def get_mcp_servers(self, contain_runtime_info: bool = False) -> list[dict]:
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(persistence_mcp.MCPServer))
+    async def get_runtime_info(self, context: TenantContext, server_name: str) -> dict | None:
+        execution_context = await self._execution_context(context)
+        session = self.ap.tool_mgr.mcp_tool_loader.get_session(execution_context, server_name)
+        return session.get_runtime_info_dict() if session else None

-        servers = result.all()
+    async def get_mcp_servers(self, context: TenantContext, contain_runtime_info: bool = False) -> list[dict]:
+        execution_context = await self._execution_context(context)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(sqlalchemy.select(persistence_mcp.MCPServer), persistence_mcp.MCPServer, context)
+        )
        serialized_servers = [
-            self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, server) for server in servers
+            redact_mcp_secrets(self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, server))
+            for server in result.all()
        ]
        if contain_runtime_info:
            for server in serialized_servers:
-                runtime_info = await self.get_runtime_info(server['name'])
-
-                server['runtime_info'] = runtime_info if runtime_info else None
-
+                session = self.ap.tool_mgr.mcp_tool_loader.get_session(execution_context, server['name'])
+                server['runtime_info'] = session.get_runtime_info_dict() if session else None
        return serialized_servers

-    async def create_mcp_server(self, server_data: dict) -> str:
-        # Check limitation (extensions = MCP servers + plugins)
+    async def create_mcp_server(self, context: TenantContext, server_data: dict) -> str:
+        execution_context = await self._execution_context(context)
+        workspace_uuid = execution_context.workspace_uuid
+
+        # This gate is independent of Box availability.  Cloud v2 disables
+        # stdio MCP even though Box Runtime itself remains available.
+        require_stdio_mcp_enabled(self.ap, server_data)
+
+        # Extension quota: MCP servers plus installed plugins in this workspace.
        limitation = self.ap.instance_config.data.get('system', {}).get('limitation', {})
        max_extensions = limitation.get('max_extensions', -1)
        if max_extensions >= 0:
-            existing_mcp_servers = await self.get_mcp_servers()
-            plugins = await self.ap.plugin_connector.list_plugins()
-            total_extensions = len(existing_mcp_servers) + len(plugins)
-            if total_extensions >= max_extensions:
+            mcp_count_result = await self.ap.persistence_mgr.execute_async(
+                sqlalchemy.select(sqlalchemy.func.count(persistence_mcp.MCPServer.uuid)).where(
+                    persistence_mcp.MCPServer.workspace_uuid == workspace_uuid
+                )
+            )
+            plugin_count_result = await self.ap.persistence_mgr.execute_async(
+                sqlalchemy.select(sqlalchemy.func.count())
+                .select_from(persistence_plugin.PluginSetting)
+                .where(persistence_plugin.PluginSetting.workspace_uuid == workspace_uuid)
+            )
+            if (mcp_count_result.scalar() or 0) + (plugin_count_result.scalar() or 0) >= max_extensions:
                raise ValueError(f'Maximum number of extensions ({max_extensions}) reached')

-        server_name = str(server_data.get('name') or '').strip()
+        payload = dict(server_data)
+        payload.pop('workspace_uuid', None)
+        server_name = str(payload.get('name') or '').strip()
        if not server_name:
            raise ValueError('MCP server name is required')
-        server_data['name'] = server_name
+        payload['name'] = server_name
+        payload['workspace_uuid'] = workspace_uuid
+        payload['uuid'] = str(uuid.uuid4())

        existing_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.name == server_name)
+            sqlalchemy.select(persistence_mcp.MCPServer).where(
+                persistence_mcp.MCPServer.workspace_uuid == workspace_uuid,
+                persistence_mcp.MCPServer.name == server_name,
+            )
        )
        if existing_result.first() is not None:
            raise ValueError(f'MCP server already exists: {server_name}')

-        server_data['uuid'] = str(uuid.uuid4())
-        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(persistence_mcp.MCPServer).values(server_data))
-
-        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_data['uuid'])
-        )
-        server_entity = result.first()
-        if server_entity:
-            server_config = self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, server_entity)
-            if self.ap.tool_mgr.mcp_tool_loader:
-                task = asyncio.create_task(self.ap.tool_mgr.mcp_tool_loader.host_mcp_server(server_config))
+        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(persistence_mcp.MCPServer).values(payload))
+        created = await self._get_mcp_server_by_uuid_raw(execution_context, payload['uuid'])
+        if created and self.ap.tool_mgr.mcp_tool_loader:
+            task = create_detached_task(
+                self.ap.tool_mgr.mcp_tool_loader.host_mcp_server(execution_context, created),
+                after_commit_manager=self.ap.persistence_mgr,
+                workspace_uuid=execution_context.workspace_uuid,
+            )
+            tracker = getattr(
+                self.ap.tool_mgr.mcp_tool_loader,
+                'track_hosted_task',
+                None,
+            )
+            if callable(tracker):
+                tracker(task, execution_context)
+            else:
                self.ap.tool_mgr.mcp_tool_loader._hosted_mcp_tasks.append(task)
+        return payload['uuid']

-        return server_data['uuid']
+    async def get_mcp_server_by_uuid(self, context: TenantContext, server_uuid: str) -> dict | None:
+        execution_context = await self._execution_context(context)
+        server_data = await self._get_mcp_server_by_uuid_raw(execution_context, server_uuid)
+        return redact_mcp_secrets(server_data) if server_data is not None else None

-    async def get_mcp_server_by_name(self, server_name: str) -> dict | None:
+    async def _get_mcp_server_by_uuid_raw(
+        self,
+        execution_context: ExecutionContext,
+        server_uuid: str,
+    ) -> dict | None:
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.name == server_name)
+            scope_statement(
+                sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid),
+                persistence_mcp.MCPServer,
+                execution_context,
+            )
+        )
+        server = result.first()
+        return self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, server) if server else None
+
+    async def get_mcp_server_by_name(self, context: TenantContext, server_name: str) -> dict | None:
+        execution_context = await self._execution_context(context)
+        server_data = await self._get_mcp_server_by_name_raw(execution_context, server_name)
+        if server_data is None:
+            return None
+        session = self.ap.tool_mgr.mcp_tool_loader.get_session(execution_context, server_name)
+        response_data = {
+            **server_data,
+            'runtime_info': session.get_runtime_info_dict() if session else None,
+        }
+        return redact_mcp_secrets(response_data)
+
+    async def _get_mcp_server_by_name_raw(
+        self,
+        execution_context: ExecutionContext,
+        server_name: str,
+    ) -> dict | None:
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.name == server_name),
+                persistence_mcp.MCPServer,
+                execution_context,
+            )
        )
        server = result.first()
        if server is None:
            return None
+        return self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, server)

-        runtime_info = await self.get_runtime_info(server.name)
-        server_data = self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, server)
-        server_data['runtime_info'] = runtime_info if runtime_info else None
-        return server_data
+    async def update_mcp_server(self, context: TenantContext, server_uuid: str, server_data: dict) -> None:
+        execution_context = await self._execution_context(context)
+        old_server = await self._get_mcp_server_by_uuid_raw(execution_context, server_uuid)
+        if old_server is None:
+            raise WorkspaceNotFoundError('MCP server not found')
+
+        payload = dict(server_data)
+        payload.pop('uuid', None)
+        payload.pop('workspace_uuid', None)
+        payload = restore_mcp_secret_placeholders(payload, old_server)
+        if 'name' in payload:
+            payload['name'] = str(payload['name'] or '').strip()
+            if not payload['name']:
+                raise ValueError('MCP server name is required')
+            duplicate = await self._get_mcp_server_by_name_raw(execution_context, payload['name'])
+            if duplicate is not None and duplicate['uuid'] != server_uuid:
+                raise ValueError(f'MCP server already exists: {payload["name"]}')
+
+        effective_server = {**old_server, **payload}
+        # Existing stdio rows stay readable and deletable even when stdio MCP
+        # is disabled by policy.  An update may switch such a row away from
+        # stdio or disable it, but it must never leave a stdio server enabled
+        # while stdio MCP is disabled.
+        if bool(effective_server.get('enable', True)):
+            require_stdio_mcp_enabled(self.ap, effective_server)

-    async def update_mcp_server(self, server_uuid: str, server_data: dict) -> None:
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid)
+            scope_statement(
+                sqlalchemy.update(persistence_mcp.MCPServer)
+                .where(persistence_mcp.MCPServer.uuid == server_uuid)
+                .values(payload),
+                persistence_mcp.MCPServer,
+                execution_context,
+            )
        )
-        old_server = result.first()
-        old_server_name = old_server.name if old_server else None
-        old_enable = old_server.enable if old_server else False
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('MCP server not found')

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_mcp.MCPServer)
-            .where(persistence_mcp.MCPServer.uuid == server_uuid)
-            .values(server_data)
-        )
+        loader = self.ap.tool_mgr.mcp_tool_loader
+        if loader is None:
+            return
+        old_name = old_server['name']
+        old_enable = bool(old_server['enable'])
+        updated = await self._get_mcp_server_by_uuid_raw(execution_context, server_uuid)
+        if updated is None:
+            raise WorkspaceNotFoundError('MCP server not found')
+        new_enable = bool(updated['enable'])
+        if old_enable and loader.has_session(execution_context, old_name):
+            await loader.remove_mcp_server(execution_context, old_name)
+        if new_enable:
+            task = create_detached_task(
+                loader.host_mcp_server(execution_context, updated),
+                after_commit_manager=self.ap.persistence_mgr,
+                workspace_uuid=execution_context.workspace_uuid,
+            )
+            tracker = getattr(loader, 'track_hosted_task', None)
+            if callable(tracker):
+                tracker(task, execution_context)
+            else:
+                loader._hosted_mcp_tasks.append(task)
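The rewritten update path replaces the old three-branch transition logic with two independent steps. Below is a sketch of that decision as a pure function. The action labels are hypothetical and only name the `remove_mcp_server` and `host_mcp_server` calls made above.

```python
def plan_session_actions(
    old_enabled: bool,
    old_session_running: bool,
    new_enabled: bool,
) -> list[str]:
    """Return the ordered runtime steps for one MCP server update."""
    actions: list[str] = []
    # A running session for the old row is torn down first, so a rename or a
    # config change never leaves a stale session under the old name.
    if old_enabled and old_session_running:
        actions.append('remove_old')
    # An enabled row is always (re)hosted from the freshly persisted config.
    if new_enabled:
        actions.append('host_new')
    return actions
```

An enabled-to-enabled update therefore always restarts the session, which picks up any changed transport settings without a separate diff of old and new config.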

-        if self.ap.tool_mgr.mcp_tool_loader:
-            new_enable = server_data.get('enable', False)
-
-            need_remove = old_server_name and old_server_name in self.ap.tool_mgr.mcp_tool_loader.sessions
-
-            if old_enable and not new_enable:
-                if need_remove:
-                    await self.ap.tool_mgr.mcp_tool_loader.remove_mcp_server(old_server_name)
-
-            elif not old_enable and new_enable:
-                result = await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid)
-                )
-                updated_server = result.first()
-                if updated_server:
-                    server_config = self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, updated_server)
-                    task = asyncio.create_task(self.ap.tool_mgr.mcp_tool_loader.host_mcp_server(server_config))
-                    self.ap.tool_mgr.mcp_tool_loader._hosted_mcp_tasks.append(task)
-
-            elif old_enable and new_enable:
-                if need_remove:
-                    await self.ap.tool_mgr.mcp_tool_loader.remove_mcp_server(old_server_name)
-                result = await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid)
-                )
-                updated_server = result.first()
-                if updated_server:
-                    server_config = self.ap.persistence_mgr.serialize_model(persistence_mcp.MCPServer, updated_server)
-                    task = asyncio.create_task(self.ap.tool_mgr.mcp_tool_loader.host_mcp_server(server_config))
-                    self.ap.tool_mgr.mcp_tool_loader._hosted_mcp_tasks.append(task)
-
-    async def delete_mcp_server(self, server_uuid: str) -> None:
+    async def delete_mcp_server(self, context: TenantContext, server_uuid: str) -> None:
+        execution_context = await self._execution_context(context)
+        server = await self._get_mcp_server_by_uuid_raw(execution_context, server_uuid)
+        if server is None:
+            raise WorkspaceNotFoundError('MCP server not found')
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid)
+            scope_statement(
+                sqlalchemy.delete(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid),
+                persistence_mcp.MCPServer,
+                execution_context,
+            )
        )
-        server = result.first()
-        server_name = server.name if server else None
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('MCP server not found')
+        loader = self.ap.tool_mgr.mcp_tool_loader
+        if loader and loader.has_session(execution_context, server['name']):
+            await loader.remove_mcp_server(execution_context, server['name'])

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_mcp.MCPServer).where(persistence_mcp.MCPServer.uuid == server_uuid)
-        )
+    async def _require_server(self, context: TenantContext, server_name: str) -> tuple[ExecutionContext, dict]:
+        execution_context = await self._execution_context(context)
+        server = await self._get_mcp_server_by_name_raw(execution_context, server_name)
+        if server is None:
+            raise WorkspaceNotFoundError('MCP server not found')
+        return execution_context, server

-        if server_name and self.ap.tool_mgr.mcp_tool_loader:
-            if server_name in self.ap.tool_mgr.mcp_tool_loader.sessions:
-                await self.ap.tool_mgr.mcp_tool_loader.remove_mcp_server(server_name)
+    async def get_mcp_server_resources(self, context: TenantContext, server_name: str) -> list[dict]:
+        execution_context, _ = await self._require_server(context, server_name)
+        return await self.ap.tool_mgr.mcp_tool_loader.get_resources(execution_context, server_name)

-    async def get_mcp_server_resources(self, server_name: str) -> list[dict]:
-        """Get resources from a specific MCP server."""
-        return await self.ap.tool_mgr.mcp_tool_loader.get_resources(server_name)
-
-    async def get_mcp_server_resource_templates(self, server_name: str) -> list[dict]:
-        """Get resource templates from a specific MCP server."""
-        return await self.ap.tool_mgr.mcp_tool_loader.get_resource_templates(server_name)
+    async def get_mcp_server_resource_templates(self, context: TenantContext, server_name: str) -> list[dict]:
+        execution_context, _ = await self._require_server(context, server_name)
+        return await self.ap.tool_mgr.mcp_tool_loader.get_resource_templates(execution_context, server_name)

    async def read_mcp_server_resource_envelope(
        self,
+        context: TenantContext,
        server_name: str,
        uri: str,
        *,
        max_bytes: int | None = None,
        include_blob: bool = False,
    ) -> dict:
-        """Read a resource from a specific MCP server with metadata."""
+        execution_context, _ = await self._require_server(context, server_name)
        kwargs = {'include_blob': include_blob, 'source': 'ui_preview'}
        if max_bytes is not None:
            kwargs['max_bytes'] = max_bytes
-        return await self.ap.tool_mgr.mcp_tool_loader.read_resource_envelope(server_name, uri, **kwargs)
+        return await self.ap.tool_mgr.mcp_tool_loader.read_resource_envelope(
+            execution_context,
+            server_name,
+            uri,
+            **kwargs,
+        )

-    async def read_mcp_server_resource(self, server_name: str, uri: str) -> list[dict]:
-        """Read a resource from a specific MCP server."""
-        return await self.ap.tool_mgr.mcp_tool_loader.read_resource(server_name, uri)
-
-    async def test_mcp_server(self, server_name: str, server_data: dict) -> int:
-        """Test the MCP server connection and return the task ID."""
+    async def read_mcp_server_resource(self, context: TenantContext, server_name: str, uri: str) -> list[dict]:
+        execution_context, _ = await self._require_server(context, server_name)
+        return await self.ap.tool_mgr.mcp_tool_loader.read_resource(execution_context, server_name, uri)

+    async def test_mcp_server(self, context: TenantContext, server_name: str, server_data: dict) -> int:
+        execution_context = await self._execution_context(context)
        runtime_mcp_session: RuntimeMCPSession | None = None
-
+        test_session: RuntimeMCPSession | None = None
        ctx = taskmgr.TaskContext.new()

        if server_name != '_':
-            runtime_mcp_session = self.ap.tool_mgr.mcp_tool_loader.get_session(server_name)
+            _, persisted_server = await self._require_server(execution_context, server_name)
+            require_stdio_mcp_enabled(self.ap, persisted_server)
+            runtime_mcp_session = self.ap.tool_mgr.mcp_tool_loader.get_session(execution_context, server_name)
            if runtime_mcp_session is None:
-                raise ValueError(f'Server not found: {server_name}')
-
+                raise WorkspaceNotFoundError('MCP server not found')
            persisted_session = runtime_mcp_session

            async def _refresh_and_report() -> None:
-                # Testing a persisted server should REUSE its live shared-session
-                # process, not rebuild it. Try a lightweight refresh (a real
-                # list_tools probe over the existing connection) first; only fall
-                # back to a full start() when the session has no live connection
-                # to probe (never connected, or the process is actually gone).
                needs_start = persisted_session.status == MCPSessionStatus.ERROR or persisted_session.session is None
                if needs_start:
                    await persisted_session.start()
@@ -200,30 +453,24 @@ class MCPService:
                    try:
                        await persisted_session.refresh()
                    except Exception:
-                        # The live connection was stale/dropped: reconnect once
-                        # (reusing the live managed process where possible) and
-                        # re-probe, instead of reporting a false failure.
                        await persisted_session.start()
-                # Surface the discovered tools so the config page can render them
-                # even for an already-hosted server.
                ctx.metadata['runtime_info'] = persisted_session.get_runtime_info_dict()

            coroutine = _refresh_and_report()
        else:
-            runtime_mcp_session = await self.ap.tool_mgr.mcp_tool_loader.load_mcp_server(server_config=server_data)
-
-            # A transient test owns an isolated Box session. Always tear it down
-            # after the test completes (success or failure) so it does not leak.
+            payload = dict(server_data)
+            payload.pop('workspace_uuid', None)
+            payload['workspace_uuid'] = execution_context.workspace_uuid
+            require_stdio_mcp_enabled(self.ap, payload)
+            runtime_mcp_session = await self.ap.tool_mgr.mcp_tool_loader.load_mcp_server(
+                execution_context,
+                payload,
+            )
            test_session = runtime_mcp_session

            async def _run_and_cleanup() -> None:
                try:
                    await test_session.start()
-                    # Capture the runtime info (status + discovered tools) BEFORE
-                    # shutting the transient session down. The create/edit config
-                    # page has no persisted server to reload from, so without this
-                    # a successful test could only show "no tools found". The
-                    # frontend reads ctx.metadata.runtime_info to render the tools.
                    ctx.metadata['runtime_info'] = test_session.get_runtime_info_dict()
                finally:
                    try:
@@ -236,27 +483,41 @@ class MCPService:

            coroutine = _run_and_cleanup()

-        wrapper = self.ap.task_mgr.create_user_task(
-            coroutine,
-            kind='mcp-operation',
-            name=f'mcp-test-{server_name}',
-            label=f'Testing MCP server {server_name}',
-            context=ctx,
-        )
+        try:
+            wrapper = self.ap.task_mgr.create_user_task(
+                coroutine,
+                kind='mcp-operation',
+                name=f'mcp-test-{execution_context.workspace_uuid}-{server_name}',
+                label=f'Testing MCP server {server_name}',
+                context=ctx,
+                instance_uuid=execution_context.instance_uuid,
+                workspace_uuid=execution_context.workspace_uuid,
+                placement_generation=execution_context.placement_generation,
+            )
+        except taskmgr.TaskCapacityError:
+            if test_session is not None:
+                try:
+                    await test_session.shutdown()
+                except Exception as exc:
+                    self.ap.logger.warning(
+                        f'Failed to tear down rejected transient MCP test session '
+                        f'{test_session.server_name}: {type(exc).__name__}: {exc}'
+                    )
+            raise
        return wrapper.id

-    async def get_mcp_server_logs(self, server_name: str, limit: int = 200, level: str | None = None) -> list[dict]:
-        """Get recent log lines captured from the MCP server's stderr."""
-        session = self.ap.tool_mgr.mcp_tool_loader.get_session(server_name)
+    async def get_mcp_server_logs(
+        self,
+        context: TenantContext,
+        server_name: str,
+        limit: int = 200,
+        level: str | None = None,
+    ) -> list[dict]:
+        execution_context, _ = await self._require_server(context, server_name)
+        session = self.ap.tool_mgr.mcp_tool_loader.get_session(execution_context, server_name)
        if not session:
            return []
-
-        # Get logs from the session's buffer
        logs = list(session._log_buffer)
-
-        # Filter by level if specified
        if level:
            logs = [log for log in logs if log.get('level') == level]
-
-        # Return the most recent 'limit' logs
        return logs[-limit:]
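The reworked `delete_mcp_server` (like the model `update_*`/`delete_*` methods further down) scopes the write to the active workspace and treats a zero `rowcount` as "not found". A UUID that belongs to another workspace is therefore indistinguishable from a missing one. The sketch below shows that pattern with only the standard library's `sqlite3`. The table and column names and the `WorkspaceNotFoundError` stand-in are hypothetical. `scope_statement` is approximated here by an explicit `workspace_uuid = ?` predicate, which is an assumption, since the helper's implementation is not part of this diff.

```python
import sqlite3


class WorkspaceNotFoundError(Exception):
    """Hypothetical stand-in for the project's workspace-scoped not-found error."""


def delete_server_scoped(conn: sqlite3.Connection, workspace_uuid: str, server_uuid: str) -> None:
    # The workspace predicate plays the role scope_statement() is assumed to add:
    # a row owned by another workspace simply does not match the DELETE.
    cur = conn.execute(
        'DELETE FROM mcp_servers WHERE uuid = ? AND workspace_uuid = ?',
        (server_uuid, workspace_uuid),
    )
    # Zero affected rows means "missing or not ours". Both cases raise the same
    # error, so a caller cannot probe for UUIDs that exist in other workspaces.
    if cur.rowcount == 0:
        raise WorkspaceNotFoundError('MCP server not found')


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE mcp_servers (uuid TEXT PRIMARY KEY, workspace_uuid TEXT, name TEXT)')
    conn.executemany(
        'INSERT INTO mcp_servers VALUES (?, ?, ?)',
        [('s1', 'ws-a', 'search'), ('s2', 'ws-b', 'files')],
    )
    delete_server_scoped(conn, 'ws-a', 's1')  # own row: deleted
    try:
        delete_server_scoped(conn, 'ws-a', 's2')  # other workspace's row: left untouched
    except WorkspaceNotFoundError as exc:
        print(exc)  # MCP server not found
```

The diff reads the count with `getattr(result, 'rowcount', None) == 0` rather than `result.rowcount`. That version only raises when a count is actually reported, so result objects that do not expose `rowcount` fall through to the preceding existence check instead of failing.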
@@ -9,6 +9,9 @@ from ....core import app
 from ....entity.persistence import model as persistence_model
 from ....entity.persistence import pipeline as persistence_pipeline
 from ....provider.modelmgr import requester as model_requester
+from ....workspace.errors import WorkspaceNotFoundError
+from .secrets import mask_secret_value, redact_secrets, restore_secret_placeholders
+from .tenant import TenantContext, require_workspace_uuid, scope_statement


 def _parse_provider_api_keys(provider_dict: dict) -> dict:
@@ -34,7 +37,29 @@ def _runtime_model_data(model_uuid: str, model_data: dict) -> dict:
    return {**model_data, 'uuid': model_uuid}


-async def _validate_provider_supports(ap: app.Application, provider_uuid: str, model_type: str) -> None:
+def _redact_model_secrets(model_data: dict) -> dict:
+    """Return a copy with model args and embedded provider credentials masked."""
+
+    redacted = model_data.copy()
+    if 'extra_args' in redacted:
+        redacted['extra_args'] = redact_secrets(redacted['extra_args'])
+    if isinstance(redacted.get('provider'), dict):
+        provider = redacted['provider'].copy()
+        # ModelProvider never contains another provider. Dropping this key also
+        # makes the serializer robust to a reused/self-referential test double.
+        provider.pop('provider', None)
+        if 'api_keys' in provider:
+            provider['api_keys'] = mask_secret_value(provider['api_keys'])
+        redacted['provider'] = provider
+    return redacted
+
+
+async def _validate_provider_supports(
+    ap: app.Application,
+    context: TenantContext,
+    provider_uuid: str,
+    model_type: str,
+) -> None:
    """Validate that the provider's requester declares support for ``model_type``.

    ``model_type`` is one of the manifest ``support_type`` values:
@@ -47,11 +72,12 @@ async def _validate_provider_supports(ap: app.Application, provider_uuid: str, m
    if model_mgr is None:
        return

-    provider_dict = getattr(model_mgr, 'provider_dict', None)
-    if not provider_dict:
+    get_provider = getattr(model_mgr, 'get_provider_by_uuid', None)
+    if not callable(get_provider):
        return
-    runtime_provider = provider_dict.get(provider_uuid)
-    if runtime_provider is None:
+    try:
+        runtime_provider = await get_provider(context, provider_uuid)
+    except ValueError:
        return

    requester_name = getattr(getattr(runtime_provider, 'provider_entity', None), 'requester', None)
@@ -74,20 +100,48 @@ async def _validate_provider_supports(ap: app.Application, provider_uuid: str, m
        raise ValueError(f'Provider requester "{requester_name}" does not support {model_type} models')


+async def _require_workspace_provider(
+    ap: app.Application,
+    context: TenantContext,
+    provider_uuid: str,
+) -> dict:
+    """Require the referenced provider to belong to the active Workspace."""
+
+    provider = await ap.provider_service.get_provider(context, provider_uuid)
+    if provider is None:
+        raise WorkspaceNotFoundError('Provider not found')
+    return provider
+
+
+async def _require_runtime_provider(
+    ap: app.Application,
+    context: TenantContext,
+    provider_uuid: str,
+) -> model_requester.RuntimeProvider:
+    try:
+        return await ap.model_mgr.get_provider_by_uuid(context, provider_uuid)
+    except ValueError as exc:
+        raise Exception('provider not found') from exc
+
+
 class LLMModelsService:
    ap: app.Application

    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_llm_models(self, include_secret: bool = True) -> list[dict]:
+    async def get_llm_models(self, context: TenantContext, include_secret: bool = False) -> list[dict]:
        """Get all LLM models with provider info"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(persistence_model.LLMModel))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(sqlalchemy.select(persistence_model.LLMModel), persistence_model.LLMModel, context)
+        )
        models = result.all()

        # Get all providers for lookup
        providers_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider)
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider), persistence_model.ModelProvider, context
+            )
        )
        providers = {p.uuid: p for p in providers_result.all()}

@@ -98,29 +152,50 @@ class LLMModelsService:
            if provider:
                provider_dict = self.ap.persistence_mgr.serialize_model(persistence_model.ModelProvider, provider)
                provider_dict = _parse_provider_api_keys(provider_dict)
-                if not include_secret:
-                    provider_dict['api_keys'] = ['***'] * len(provider_dict.get('api_keys', []))
                model_dict['provider'] = provider_dict
+            if not include_secret:
+                model_dict = _redact_model_secrets(model_dict)
            models_list.append(model_dict)

        return models_list

-    async def get_llm_models_by_provider(self, provider_uuid: str) -> list[dict]:
+    async def get_llm_models_by_provider(
+        self,
+        context: TenantContext,
+        provider_uuid: str,
+        *,
+        include_secret: bool = False,
+    ) -> list[dict]:
        """Get LLM models by provider UUID"""
+        await _require_workspace_provider(self.ap, context, provider_uuid)
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.LLMModel).where(
-                persistence_model.LLMModel.provider_uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.LLMModel).where(
+                    persistence_model.LLMModel.provider_uuid == provider_uuid
+                ),
+                persistence_model.LLMModel,
+                context,
            )
        )
        models = result.all()
-        return [self.ap.persistence_mgr.serialize_model(persistence_model.LLMModel, m) for m in models]
+        serialized = [self.ap.persistence_mgr.serialize_model(persistence_model.LLMModel, m) for m in models]
+        return serialized if include_secret else [_redact_model_secrets(model) for model in serialized]

    async def create_llm_model(
-        self, model_data: dict, preserve_uuid: bool = False, auto_set_to_default_pipeline: bool = True
+        self,
+        context: TenantContext,
+        model_data: dict,
+        preserve_uuid: bool = False,
+        auto_set_to_default_pipeline: bool = True,
    ) -> str:
        """Create a new LLM model"""
+        workspace_uuid = require_workspace_uuid(context)
+        model_data = model_data.copy()
        if not preserve_uuid:
            model_data['uuid'] = str(uuid.uuid4())
+        model_data['workspace_uuid'] = workspace_uuid
+        if 'extra_args' in model_data:
+            model_data['extra_args'] = restore_secret_placeholders(model_data['extra_args'])

        # Handle provider creation if needed
        if 'provider' in model_data:
@@ -130,31 +205,35 @@ class LLMModelsService:
            else:
                # Create new provider
                provider_uuid = await self.ap.provider_service.find_or_create_provider(
+                    context,
                    requester=provider_data.get('requester', ''),
                    base_url=provider_data.get('base_url', ''),
                    api_keys=provider_data.get('api_keys', []),
                )
                model_data['provider_uuid'] = provider_uuid

-        await _validate_provider_supports(self.ap, model_data['provider_uuid'], 'llm')
+        await _require_workspace_provider(self.ap, context, model_data['provider_uuid'])
+        await _validate_provider_supports(self.ap, context, model_data['provider_uuid'], 'llm')

        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(persistence_model.LLMModel).values(**model_data))

-        runtime_provider = self.ap.model_mgr.provider_dict.get(model_data['provider_uuid'])
-        if runtime_provider is None:
-            raise Exception('provider not found')
-
+        runtime_provider = await _require_runtime_provider(self.ap, context, model_data['provider_uuid'])
        runtime_llm_model = await self.ap.model_mgr.load_llm_model_with_provider(
+            context,
            persistence_model.LLMModel(**model_data),
            runtime_provider,
        )
-        self.ap.model_mgr.llm_models.append(runtime_llm_model)
+        await self.ap.model_mgr.cache_llm_model(context, runtime_llm_model)

        if auto_set_to_default_pipeline:
            # set the default pipeline model to this model
            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
-                    persistence_pipeline.LegacyPipeline.is_default == True
+                scope_statement(
+                    sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
+                        persistence_pipeline.LegacyPipeline.is_default == True
+                    ),
+                    persistence_pipeline.LegacyPipeline,
+                    workspace_uuid,
                )
            )
            pipeline = result.first()
@@ -167,14 +246,23 @@ class LLMModelsService:
                        'fallbacks': [],
                    }
                    pipeline_data = {'config': pipeline_config}
-                    await self.ap.pipeline_service.update_pipeline(pipeline.uuid, pipeline_data)
+                    await self.ap.pipeline_service.update_pipeline(context, pipeline.uuid, pipeline_data)

        return model_data['uuid']

-    async def get_llm_model(self, model_uuid: str) -> dict | None:
+    async def get_llm_model(
+        self,
+        context: TenantContext,
+        model_uuid: str,
+        include_secret: bool = False,
+    ) -> dict | None:
        """Get a single LLM model with provider info"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.LLMModel).where(persistence_model.LLMModel.uuid == model_uuid)
+            scope_statement(
+                sqlalchemy.select(persistence_model.LLMModel).where(persistence_model.LLMModel.uuid == model_uuid),
+                persistence_model.LLMModel,
+                context,
+            )
        )
        model = result.first()
        if model is None:
@@ -184,21 +272,38 @@ class LLMModelsService:

        # Get provider
        provider_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider).where(
-                persistence_model.ModelProvider.uuid == model.provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider).where(
+                    persistence_model.ModelProvider.uuid == model.provider_uuid
+                ),
+                persistence_model.ModelProvider,
+                context,
            )
        )
        provider = provider_result.first()
        if provider:
            provider_dict = self.ap.persistence_mgr.serialize_model(persistence_model.ModelProvider, provider)
-            model_dict['provider'] = _parse_provider_api_keys(provider_dict)
+            provider_dict = _parse_provider_api_keys(provider_dict)
+            model_dict['provider'] = provider_dict
+
+        if not include_secret:
+            model_dict = _redact_model_secrets(model_dict)

        return model_dict

-    async def update_llm_model(self, model_uuid: str, model_data: dict) -> None:
+    async def update_llm_model(self, context: TenantContext, model_uuid: str, model_data: dict) -> None:
        """Update an existing LLM model"""
-        if 'uuid' in model_data:
-            del model_data['uuid']
+        existing_model = await self.get_llm_model(context, model_uuid, include_secret=True)
+        if existing_model is None:
+            raise WorkspaceNotFoundError('Model not found')
+        model_data = model_data.copy()
+        model_data.pop('uuid', None)
+        model_data.pop('workspace_uuid', None)
+        if 'extra_args' in model_data:
+            model_data['extra_args'] = restore_secret_placeholders(
+                model_data['extra_args'],
+                existing_model.get('extra_args', {}),
+            )

        # Handle provider update if needed
        if 'provider' in model_data:
@@ -207,50 +312,71 @@ class LLMModelsService:
                model_data['provider_uuid'] = provider_data['uuid']
            else:
                provider_uuid = await self.ap.provider_service.find_or_create_provider(
+                    context,
                    requester=provider_data.get('requester', ''),
                    base_url=provider_data.get('base_url', ''),
                    api_keys=provider_data.get('api_keys', []),
                )
                model_data['provider_uuid'] = provider_uuid

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_model.LLMModel)
-            .where(persistence_model.LLMModel.uuid == model_uuid)
-            .values(**model_data)
+        provider_uuid = model_data.get('provider_uuid', existing_model['provider_uuid'])
+        await _require_workspace_provider(self.ap, context, provider_uuid)
+        await _validate_provider_supports(self.ap, context, provider_uuid, 'llm')
+
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_model.LLMModel)
+                .where(persistence_model.LLMModel.uuid == model_uuid)
+                .values(**model_data),
+                persistence_model.LLMModel,
+                context,
+            )
        )
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Model not found')

-        await self.ap.model_mgr.remove_llm_model(model_uuid)
-
-        runtime_provider = self.ap.model_mgr.provider_dict.get(model_data['provider_uuid'])
-        if runtime_provider is None:
-            raise Exception('provider not found')
-
+        await self.ap.model_mgr.remove_llm_model(context, model_uuid)
+        runtime_provider = await _require_runtime_provider(self.ap, context, provider_uuid)
        runtime_llm_model = await self.ap.model_mgr.load_llm_model_with_provider(
-            persistence_model.LLMModel(**_runtime_model_data(model_uuid, model_data)),
+            context,
+            persistence_model.LLMModel(
+                **_runtime_model_data(
+                    model_uuid,
+                    {
+                        key: value
+                        for key, value in {**existing_model, **model_data, 'provider_uuid': provider_uuid}.items()
+                        if key not in {'provider', 'created_at', 'updated_at'}
+                    },
+                )
+            ),
            runtime_provider,
        )
-        self.ap.model_mgr.llm_models.append(runtime_llm_model)
+        await self.ap.model_mgr.cache_llm_model(context, runtime_llm_model)

-    async def delete_llm_model(self, model_uuid: str) -> None:
+    async def delete_llm_model(self, context: TenantContext, model_uuid: str) -> None:
        """Delete an LLM model"""
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_model.LLMModel).where(persistence_model.LLMModel.uuid == model_uuid)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(persistence_model.LLMModel).where(persistence_model.LLMModel.uuid == model_uuid),
+                persistence_model.LLMModel,
+                context,
+            )
        )
-        await self.ap.model_mgr.remove_llm_model(model_uuid)
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Model not found')
+        await self.ap.model_mgr.remove_llm_model(context, model_uuid)

-    async def test_llm_model(self, model_uuid: str, model_data: dict) -> None:
+    async def test_llm_model(self, context: TenantContext, model_uuid: str, model_data: dict) -> None:
        """Test an LLM model"""
+        require_workspace_uuid(context)
        runtime_llm_model: model_requester.RuntimeLLMModel | None = None

        if model_uuid != '_':
-            for model in self.ap.model_mgr.llm_models:
-                if model.model_entity.uuid == model_uuid:
-                    runtime_llm_model = model
-                    break
-            if runtime_llm_model is None:
-                raise Exception('model not found')
+            if await self.get_llm_model(context, model_uuid) is None:
+                raise WorkspaceNotFoundError('Model not found')
+            runtime_llm_model = await self.ap.model_mgr.get_model_by_uuid(context, model_uuid)
        else:
-            runtime_llm_model = await self.ap.model_mgr.init_temporary_runtime_llm_model(model_data)
+            runtime_llm_model = await self.ap.model_mgr.init_temporary_runtime_llm_model(context, model_data)

        extra_args = model_data.get('extra_args', {})
        await runtime_llm_model.provider.invoke_llm(
@@ -259,6 +385,7 @@ class LLMModelsService:
            messages=[provider_message.Message(role='user', content='Hello, world! Please just reply a "Hello".')],
            funcs=[],
            extra_args=extra_args,
+            execution_context=runtime_llm_model.execution_context,
        )


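`update_llm_model` no longer assumes the request body is a complete model when it reloads the runtime entry. It first strips `uuid` and `workspace_uuid` from a copy of the patch, so a client cannot re-home the row. It then overlays that patch on the stored row, pins the resolved `provider_uuid`, and drops the `provider`, `created_at`, and `updated_at` keys before building `persistence_model.LLMModel`. The following is a dependency-free sketch of that merge. `sanitize_update`, `merge_runtime_model_data`, and the key-set constants are illustrative names, not part of the codebase. The `restore_secret_placeholders` handling of `extra_args` is intentionally omitted because its behavior is not shown in this diff.

```python
from __future__ import annotations

from typing import Any

# Keys the diff excludes before constructing the ORM model.
NON_COLUMN_KEYS = frozenset({'provider', 'created_at', 'updated_at'})
# Keys a client must never overwrite through an update payload.
IMMUTABLE_KEYS = ('uuid', 'workspace_uuid')


def sanitize_update(patch: dict[str, Any]) -> dict[str, Any]:
    # Work on a copy so the caller's dict is never mutated.
    cleaned = dict(patch)
    for key in IMMUTABLE_KEYS:
        cleaned.pop(key, None)
    return cleaned


def merge_runtime_model_data(
    model_uuid: str,
    existing: dict[str, Any],
    patch: dict[str, Any],
    provider_uuid: str,
) -> dict[str, Any]:
    # A partial update only carries changed fields, so the runtime entity is
    # rebuilt from the stored row overlaid with the sanitized patch.
    merged = {**existing, **sanitize_update(patch), 'provider_uuid': provider_uuid}
    data = {key: value for key, value in merged.items() if key not in NON_COLUMN_KEYS}
    # Mirrors _runtime_model_data(): the uuid always comes from the path argument.
    return {**data, 'uuid': model_uuid}
```

For example, a patch of `{'name': 'renamed', 'workspace_uuid': 'other'}` changes `name` but keeps the stored row's `workspace_uuid`, because the immutable keys are removed before the overlay.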
@@ -268,13 +395,19 @@ class EmbeddingModelsService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_embedding_models(self) -> list[dict]:
+    async def get_embedding_models(self, context: TenantContext, include_secret: bool = False) -> list[dict]:
        """Get all embedding models with provider info"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(persistence_model.EmbeddingModel))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.select(persistence_model.EmbeddingModel), persistence_model.EmbeddingModel, context
+            )
+        )
        models = result.all()

        providers_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider)
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider), persistence_model.ModelProvider, context
+            )
        )
        providers = {p.uuid: p for p in providers_result.all()}

@@ -284,25 +417,46 @@ class EmbeddingModelsService:
            provider = providers.get(model.provider_uuid)
            if provider:
                provider_dict = self.ap.persistence_mgr.serialize_model(persistence_model.ModelProvider, provider)
-                model_dict['provider'] = _parse_provider_api_keys(provider_dict)
+                provider_dict = _parse_provider_api_keys(provider_dict)
+                model_dict['provider'] = provider_dict
+            if not include_secret:
+                model_dict = _redact_model_secrets(model_dict)
            models_list.append(model_dict)

        return models_list

-    async def get_embedding_models_by_provider(self, provider_uuid: str) -> list[dict]:
+    async def get_embedding_models_by_provider(
+        self,
+        context: TenantContext,
+        provider_uuid: str,
+        *,
+        include_secret: bool = False,
+    ) -> list[dict]:
        """Get embedding models by provider UUID"""
+        await _require_workspace_provider(self.ap, context, provider_uuid)
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.EmbeddingModel).where(
-                persistence_model.EmbeddingModel.provider_uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.EmbeddingModel).where(
+                    persistence_model.EmbeddingModel.provider_uuid == provider_uuid
+                ),
+                persistence_model.EmbeddingModel,
+                context,
            )
        )
        models = result.all()
-        return [self.ap.persistence_mgr.serialize_model(persistence_model.EmbeddingModel, m) for m in models]
+        serialized = [self.ap.persistence_mgr.serialize_model(persistence_model.EmbeddingModel, m) for m in models]
+        return serialized if include_secret else [_redact_model_secrets(model) for model in serialized]

-    async def create_embedding_model(self, model_data: dict, preserve_uuid: bool = False) -> str:
+    async def create_embedding_model(
+        self, context: TenantContext, model_data: dict, preserve_uuid: bool = False
+    ) -> str:
        """Create a new embedding model"""
+        model_data = model_data.copy()
        if not preserve_uuid:
            model_data['uuid'] = str(uuid.uuid4())
+        model_data['workspace_uuid'] = require_workspace_uuid(context)
+        if 'extra_args' in model_data:
+            model_data['extra_args'] = restore_secret_placeholders(model_data['extra_args'])

        if 'provider' in model_data:
            provider_data = model_data.pop('provider')
@@ -310,35 +464,44 @@ class EmbeddingModelsService:
                model_data['provider_uuid'] = provider_data['uuid']
            else:
                provider_uuid = await self.ap.provider_service.find_or_create_provider(
+                    context,
                    requester=provider_data.get('requester', ''),
                    base_url=provider_data.get('base_url', ''),
                    api_keys=provider_data.get('api_keys', []),
                )
                model_data['provider_uuid'] = provider_uuid

-        await _validate_provider_supports(self.ap, model_data['provider_uuid'], 'text-embedding')
+        await _require_workspace_provider(self.ap, context, model_data['provider_uuid'])
+        await _validate_provider_supports(self.ap, context, model_data['provider_uuid'], 'text-embedding')

        await self.ap.persistence_mgr.execute_async(
            sqlalchemy.insert(persistence_model.EmbeddingModel).values(**model_data)
        )

-        runtime_provider = self.ap.model_mgr.provider_dict.get(model_data['provider_uuid'])
-        if runtime_provider is None:
-            raise Exception('provider not found')
-
+        runtime_provider = await _require_runtime_provider(self.ap, context, model_data['provider_uuid'])
        runtime_embedding_model = await self.ap.model_mgr.load_embedding_model_with_provider(
+            context,
            persistence_model.EmbeddingModel(**model_data),
            runtime_provider,
        )
-        self.ap.model_mgr.embedding_models.append(runtime_embedding_model)
+        await self.ap.model_mgr.cache_embedding_model(context, runtime_embedding_model)

        return model_data['uuid']

-    async def get_embedding_model(self, model_uuid: str) -> dict | None:
+    async def get_embedding_model(
+        self,
+        context: TenantContext,
+        model_uuid: str,
+        include_secret: bool = False,
+    ) -> dict | None:
        """Get a single embedding model with provider info"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.EmbeddingModel).where(
-                persistence_model.EmbeddingModel.uuid == model_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.EmbeddingModel).where(
+                    persistence_model.EmbeddingModel.uuid == model_uuid
+                ),
+                persistence_model.EmbeddingModel,
+                context,
            )
        )
        model = result.first()
@@ -348,21 +511,38 @@ class EmbeddingModelsService:
        model_dict = self.ap.persistence_mgr.serialize_model(persistence_model.EmbeddingModel, model)

        provider_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider).where(
-                persistence_model.ModelProvider.uuid == model.provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider).where(
+                    persistence_model.ModelProvider.uuid == model.provider_uuid
+                ),
+                persistence_model.ModelProvider,
+                context,
            )
        )
        provider = provider_result.first()
        if provider:
            provider_dict = self.ap.persistence_mgr.serialize_model(persistence_model.ModelProvider, provider)
-            model_dict['provider'] = _parse_provider_api_keys(provider_dict)
+            provider_dict = _parse_provider_api_keys(provider_dict)
+            model_dict['provider'] = provider_dict
+
+        if not include_secret:
+            model_dict = _redact_model_secrets(model_dict)

        return model_dict

-    async def update_embedding_model(self, model_uuid: str, model_data: dict) -> None:
+    async def update_embedding_model(self, context: TenantContext, model_uuid: str, model_data: dict) -> None:
        """Update an existing embedding model"""
-        if 'uuid' in model_data:
-            del model_data['uuid']
+        existing_model = await self.get_embedding_model(context, model_uuid, include_secret=True)
+        if existing_model is None:
+            raise WorkspaceNotFoundError('Model not found')
+        model_data = model_data.copy()
+        model_data.pop('uuid', None)
+        model_data.pop('workspace_uuid', None)
+        if 'extra_args' in model_data:
+            model_data['extra_args'] = restore_secret_placeholders(
+                model_data['extra_args'],
+                existing_model.get('extra_args', {}),
+            )

        if 'provider' in model_data:
            provider_data = model_data.pop('provider')
@@ -370,57 +550,82 @@ class EmbeddingModelsService:
                model_data['provider_uuid'] = provider_data['uuid']
            else:
                provider_uuid = await self.ap.provider_service.find_or_create_provider(
+                    context,
                    requester=provider_data.get('requester', ''),
                    base_url=provider_data.get('base_url', ''),
                    api_keys=provider_data.get('api_keys', []),
                )
                model_data['provider_uuid'] = provider_uuid

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_model.EmbeddingModel)
-            .where(persistence_model.EmbeddingModel.uuid == model_uuid)
-            .values(**model_data)
-        )
+        provider_uuid = model_data.get('provider_uuid', existing_model['provider_uuid'])
+        await _require_workspace_provider(self.ap, context, provider_uuid)
+        await _validate_provider_supports(self.ap, context, provider_uuid, 'text-embedding')

-        await self.ap.model_mgr.remove_embedding_model(model_uuid)
-
-        runtime_provider = self.ap.model_mgr.provider_dict.get(model_data['provider_uuid'])
-        if runtime_provider is None:
-            raise Exception('provider not found')
-
-        runtime_embedding_model = await self.ap.model_mgr.load_embedding_model_with_provider(
-            persistence_model.EmbeddingModel(**_runtime_model_data(model_uuid, model_data)),
-            runtime_provider,
-        )
-        self.ap.model_mgr.embedding_models.append(runtime_embedding_model)
-
-    async def delete_embedding_model(self, model_uuid: str) -> None:
-        """Delete an embedding model"""
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_model.EmbeddingModel).where(
-                persistence_model.EmbeddingModel.uuid == model_uuid
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_model.EmbeddingModel)
+                .where(persistence_model.EmbeddingModel.uuid == model_uuid)
+                .values(**model_data),
+                persistence_model.EmbeddingModel,
+                context,
            )
        )
-        await self.ap.model_mgr.remove_embedding_model(model_uuid)
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Model not found')

-    async def test_embedding_model(self, model_uuid: str, model_data: dict) -> None:
+        await self.ap.model_mgr.remove_embedding_model(context, model_uuid)
+        runtime_provider = await _require_runtime_provider(self.ap, context, provider_uuid)
+        runtime_embedding_model = await self.ap.model_mgr.load_embedding_model_with_provider(
+            context,
+            persistence_model.EmbeddingModel(
+                **_runtime_model_data(
+                    model_uuid,
+                    {
+                        key: value
+                        for key, value in {**existing_model, **model_data, 'provider_uuid': provider_uuid}.items()
+                        if key not in {'provider', 'created_at', 'updated_at'}
+                    },
+                )
+            ),
+            runtime_provider,
+        )
+        await self.ap.model_mgr.cache_embedding_model(context, runtime_embedding_model)
+
+    async def delete_embedding_model(self, context: TenantContext, model_uuid: str) -> None:
+        """Delete an embedding model"""
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(persistence_model.EmbeddingModel).where(
+                    persistence_model.EmbeddingModel.uuid == model_uuid
+                ),
+                persistence_model.EmbeddingModel,
+                context,
+            )
+        )
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Model not found')
+        await self.ap.model_mgr.remove_embedding_model(context, model_uuid)
+
+    async def test_embedding_model(self, context: TenantContext, model_uuid: str, model_data: dict) -> None:
        """Test an embedding model"""
+        require_workspace_uuid(context)
        runtime_embedding_model: model_requester.RuntimeEmbeddingModel | None = None

        if model_uuid != '_':
-            for model in self.ap.model_mgr.embedding_models:
-                if model.model_entity.uuid == model_uuid:
-                    runtime_embedding_model = model
-                    break
-            if runtime_embedding_model is None:
-                raise Exception('model not found')
+            if await self.get_embedding_model(context, model_uuid) is None:
+                raise WorkspaceNotFoundError('Model not found')
+            runtime_embedding_model = await self.ap.model_mgr.get_embedding_model_by_uuid(context, model_uuid)
        else:
-            runtime_embedding_model = await self.ap.model_mgr.init_temporary_runtime_embedding_model(model_data)
+            runtime_embedding_model = await self.ap.model_mgr.init_temporary_runtime_embedding_model(
+                context,
+                model_data,
+            )

        await runtime_embedding_model.provider.invoke_embedding(
            model=runtime_embedding_model,
            input_text=['Hello, world!'],
            extra_args={},
+            execution_context=runtime_embedding_model.execution_context,
        )


@@ -430,13 +635,17 @@ class RerankModelsService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_rerank_models(self) -> list[dict]:
+    async def get_rerank_models(self, context: TenantContext, include_secret: bool = False) -> list[dict]:
        """Get all rerank models with provider info"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(persistence_model.RerankModel))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(sqlalchemy.select(persistence_model.RerankModel), persistence_model.RerankModel, context)
+        )
        models = result.all()

        providers_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider)
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider), persistence_model.ModelProvider, context
+            )
        )
        providers = {p.uuid: p for p in providers_result.all()}

@@ -446,25 +655,44 @@ class RerankModelsService:
            provider = providers.get(model.provider_uuid)
            if provider:
                provider_dict = self.ap.persistence_mgr.serialize_model(persistence_model.ModelProvider, provider)
-                model_dict['provider'] = _parse_provider_api_keys(provider_dict)
+                provider_dict = _parse_provider_api_keys(provider_dict)
+                model_dict['provider'] = provider_dict
+            if not include_secret:
+                model_dict = _redact_model_secrets(model_dict)
            models_list.append(model_dict)

        return models_list

-    async def get_rerank_models_by_provider(self, provider_uuid: str) -> list[dict]:
+    async def get_rerank_models_by_provider(
+        self,
+        context: TenantContext,
+        provider_uuid: str,
+        *,
+        include_secret: bool = False,
+    ) -> list[dict]:
        """Get rerank models by provider UUID"""
+        await _require_workspace_provider(self.ap, context, provider_uuid)
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.RerankModel).where(
-                persistence_model.RerankModel.provider_uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.RerankModel).where(
+                    persistence_model.RerankModel.provider_uuid == provider_uuid
+                ),
+                persistence_model.RerankModel,
+                context,
            )
        )
        models = result.all()
-        return [self.ap.persistence_mgr.serialize_model(persistence_model.RerankModel, m) for m in models]
+        serialized = [self.ap.persistence_mgr.serialize_model(persistence_model.RerankModel, m) for m in models]
+        return serialized if include_secret else [_redact_model_secrets(model) for model in serialized]

-    async def create_rerank_model(self, model_data: dict, preserve_uuid: bool = False) -> str:
+    async def create_rerank_model(self, context: TenantContext, model_data: dict, preserve_uuid: bool = False) -> str:
        """Create a new rerank model"""
+        model_data = model_data.copy()
        if not preserve_uuid:
            model_data['uuid'] = str(uuid.uuid4())
+        model_data['workspace_uuid'] = require_workspace_uuid(context)
+        if 'extra_args' in model_data:
+            model_data['extra_args'] = restore_secret_placeholders(model_data['extra_args'])

        if 'provider' in model_data:
            provider_data = model_data.pop('provider')
@@ -472,34 +700,45 @@ class RerankModelsService:
                model_data['provider_uuid'] = provider_data['uuid']
            else:
                provider_uuid = await self.ap.provider_service.find_or_create_provider(
+                    context,
                    requester=provider_data.get('requester', ''),
                    base_url=provider_data.get('base_url', ''),
                    api_keys=provider_data.get('api_keys', []),
                )
                model_data['provider_uuid'] = provider_uuid

-        await _validate_provider_supports(self.ap, model_data['provider_uuid'], 'rerank')
+        await _require_workspace_provider(self.ap, context, model_data['provider_uuid'])
+        await _validate_provider_supports(self.ap, context, model_data['provider_uuid'], 'rerank')

        await self.ap.persistence_mgr.execute_async(
            sqlalchemy.insert(persistence_model.RerankModel).values(**model_data)
        )

-        runtime_provider = self.ap.model_mgr.provider_dict.get(model_data['provider_uuid'])
-        if runtime_provider is None:
-            raise Exception('provider not found')
-
+        runtime_provider = await _require_runtime_provider(self.ap, context, model_data['provider_uuid'])
        runtime_rerank_model = await self.ap.model_mgr.load_rerank_model_with_provider(
+            context,
            persistence_model.RerankModel(**model_data),
            runtime_provider,
        )
-        self.ap.model_mgr.rerank_models.append(runtime_rerank_model)
+        await self.ap.model_mgr.cache_rerank_model(context, runtime_rerank_model)

        return model_data['uuid']

-    async def get_rerank_model(self, model_uuid: str) -> dict | None:
+    async def get_rerank_model(
+        self,
+        context: TenantContext,
+        model_uuid: str,
+        include_secret: bool = False,
+    ) -> dict | None:
        """Get a single rerank model with provider info"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.RerankModel).where(persistence_model.RerankModel.uuid == model_uuid)
+            scope_statement(
+                sqlalchemy.select(persistence_model.RerankModel).where(
+                    persistence_model.RerankModel.uuid == model_uuid
+                ),
+                persistence_model.RerankModel,
+                context,
+            )
        )
        model = result.first()
        if model is None:
@@ -508,21 +747,38 @@ class RerankModelsService:
        model_dict = self.ap.persistence_mgr.serialize_model(persistence_model.RerankModel, model)

        provider_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider).where(
-                persistence_model.ModelProvider.uuid == model.provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider).where(
+                    persistence_model.ModelProvider.uuid == model.provider_uuid
+                ),
+                persistence_model.ModelProvider,
+                context,
            )
        )
        provider = provider_result.first()
        if provider:
            provider_dict = self.ap.persistence_mgr.serialize_model(persistence_model.ModelProvider, provider)
-            model_dict['provider'] = _parse_provider_api_keys(provider_dict)
+            provider_dict = _parse_provider_api_keys(provider_dict)
+            model_dict['provider'] = provider_dict
+
+        if not include_secret:
+            model_dict = _redact_model_secrets(model_dict)

        return model_dict

-    async def update_rerank_model(self, model_uuid: str, model_data: dict) -> None:
+    async def update_rerank_model(self, context: TenantContext, model_uuid: str, model_data: dict) -> None:
        """Update an existing rerank model"""
-        if 'uuid' in model_data:
-            del model_data['uuid']
+        existing_model = await self.get_rerank_model(context, model_uuid, include_secret=True)
+        if existing_model is None:
+            raise WorkspaceNotFoundError('Model not found')
+        model_data = model_data.copy()
+        model_data.pop('uuid', None)
+        model_data.pop('workspace_uuid', None)
+        if 'extra_args' in model_data:
+            model_data['extra_args'] = restore_secret_placeholders(
+                model_data['extra_args'],
+                existing_model.get('extra_args', {}),
+            )

        if 'provider' in model_data:
            provider_data = model_data.pop('provider')
@@ -530,50 +786,76 @@ class RerankModelsService:
                model_data['provider_uuid'] = provider_data['uuid']
            else:
                provider_uuid = await self.ap.provider_service.find_or_create_provider(
+                    context,
                    requester=provider_data.get('requester', ''),
                    base_url=provider_data.get('base_url', ''),
                    api_keys=provider_data.get('api_keys', []),
                )
                model_data['provider_uuid'] = provider_uuid

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_model.RerankModel)
-            .where(persistence_model.RerankModel.uuid == model_uuid)
-            .values(**model_data)
+        provider_uuid = model_data.get('provider_uuid', existing_model['provider_uuid'])
+        await _require_workspace_provider(self.ap, context, provider_uuid)
+        await _validate_provider_supports(self.ap, context, provider_uuid, 'rerank')
+
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_model.RerankModel)
+                .where(persistence_model.RerankModel.uuid == model_uuid)
+                .values(**model_data),
+                persistence_model.RerankModel,
+                context,
+            )
        )
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Model not found')

-        await self.ap.model_mgr.remove_rerank_model(model_uuid)
-
-        runtime_provider = self.ap.model_mgr.provider_dict.get(model_data['provider_uuid'])
-        if runtime_provider is None:
-            raise Exception('provider not found')
-
+        await self.ap.model_mgr.remove_rerank_model(context, model_uuid)
+        runtime_provider = await _require_runtime_provider(self.ap, context, provider_uuid)
        runtime_rerank_model = await self.ap.model_mgr.load_rerank_model_with_provider(
-            persistence_model.RerankModel(**_runtime_model_data(model_uuid, model_data)),
+            context,
+            persistence_model.RerankModel(
+                **_runtime_model_data(
+                    model_uuid,
+                    {
+                        key: value
+                        for key, value in {**existing_model, **model_data, 'provider_uuid': provider_uuid}.items()
+                        if key not in {'provider', 'created_at', 'updated_at'}
+                    },
+                )
+            ),
            runtime_provider,
        )
-        self.ap.model_mgr.rerank_models.append(runtime_rerank_model)
+        await self.ap.model_mgr.cache_rerank_model(context, runtime_rerank_model)

-    async def delete_rerank_model(self, model_uuid: str) -> None:
+    async def delete_rerank_model(self, context: TenantContext, model_uuid: str) -> None:
        """Delete a rerank model"""
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_model.RerankModel).where(persistence_model.RerankModel.uuid == model_uuid)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(persistence_model.RerankModel).where(
+                    persistence_model.RerankModel.uuid == model_uuid
+                ),
+                persistence_model.RerankModel,
+                context,
+            )
        )
-        await self.ap.model_mgr.remove_rerank_model(model_uuid)
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Model not found')
+        await self.ap.model_mgr.remove_rerank_model(context, model_uuid)

-    async def test_rerank_model(self, model_uuid: str, model_data: dict) -> None:
+    async def test_rerank_model(self, context: TenantContext, model_uuid: str, model_data: dict) -> None:
        """Test a rerank model"""
+        require_workspace_uuid(context)
        runtime_rerank_model: model_requester.RuntimeRerankModel | None = None

        if model_uuid != '_':
-            for model in self.ap.model_mgr.rerank_models:
-                if model.model_entity.uuid == model_uuid:
-                    runtime_rerank_model = model
-                    break
-            if runtime_rerank_model is None:
-                raise Exception('model not found')
+            if await self.get_rerank_model(context, model_uuid) is None:
+                raise WorkspaceNotFoundError('Model not found')
+            runtime_rerank_model = await self.ap.model_mgr.get_rerank_model_by_uuid(context, model_uuid)
        else:
-            runtime_rerank_model = await self.ap.model_mgr.init_temporary_runtime_rerank_model(model_data)
+            runtime_rerank_model = await self.ap.model_mgr.init_temporary_runtime_rerank_model(
+                context,
+                model_data,
+            )

        await runtime_rerank_model.provider.invoke_rerank(
            model=runtime_rerank_model,
@@ -582,4 +864,5 @@ class RerankModelsService:
                'Artificial intelligence is a branch of computer science.',
                'The weather is nice today.',
            ],
+            execution_context=runtime_rerank_model.execution_context,
        )
@@ -6,6 +6,9 @@ import sqlalchemy

 from ....core import app
 from ....entity.persistence import pipeline as persistence_pipeline
+from ....workspace.errors import WorkspaceNotFoundError
+from .secrets import contains_secret_placeholder, redact_secrets, restore_secret_placeholders
+from .tenant import TenantContext, require_workspace_uuid, scope_statement


 default_stage_order = [
@@ -30,7 +33,8 @@ class PipelineService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_pipeline_metadata(self) -> list[dict]:
+    async def get_pipeline_metadata(self, context: TenantContext) -> list[dict]:
+        require_workspace_uuid(context)
        return [
            self.ap.pipeline_config_meta_trigger,
            self.ap.pipeline_config_meta_safety,
@@ -38,8 +42,19 @@ class PipelineService:
            self.ap.pipeline_config_meta_output,
        ]

-    async def get_pipelines(self, sort_by: str = 'created_at', sort_order: str = 'DESC') -> list[dict]:
-        query = sqlalchemy.select(persistence_pipeline.LegacyPipeline)
+    async def get_pipelines(
+        self,
+        context: TenantContext,
+        sort_by: str = 'created_at',
+        sort_order: str = 'DESC',
+        *,
+        include_secret: bool = False,
+    ) -> list[dict]:
+        query = scope_statement(
+            sqlalchemy.select(persistence_pipeline.LegacyPipeline),
+            persistence_pipeline.LegacyPipeline,
+            context,
+        )

        if sort_by == 'created_at':
            if sort_order == 'DESC':
@@ -54,15 +69,26 @@ class PipelineService:

        result = await self.ap.persistence_mgr.execute_async(query)
        pipelines = result.all()
-        return [
+        serialized = [
            self.ap.persistence_mgr.serialize_model(persistence_pipeline.LegacyPipeline, pipeline)
            for pipeline in pipelines
        ]
+        return serialized if include_secret else [redact_secrets(pipeline) for pipeline in serialized]

-    async def get_pipeline(self, pipeline_uuid: str) -> dict | None:
+    async def get_pipeline(
+        self,
+        context: TenantContext,
+        pipeline_uuid: str,
+        *,
+        include_secret: bool = False,
+    ) -> dict | None:
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
-                persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
+                    persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+                ),
+                persistence_pipeline.LegacyPipeline,
+                context,
            )
        )

@@ -71,20 +97,24 @@ class PipelineService:
        if pipeline is None:
            return None

-        return self.ap.persistence_mgr.serialize_model(persistence_pipeline.LegacyPipeline, pipeline)
+        serialized = self.ap.persistence_mgr.serialize_model(persistence_pipeline.LegacyPipeline, pipeline)
+        return serialized if include_secret else redact_secrets(serialized)

-    async def create_pipeline(self, pipeline_data: dict, default: bool = False) -> str:
+    async def create_pipeline(self, context: TenantContext, pipeline_data: dict, default: bool = False) -> str:
        from ....utils import paths as path_utils

+        workspace_uuid = require_workspace_uuid(context)
        # Check limitation
        limitation = self.ap.instance_config.data.get('system', {}).get('limitation', {})
        max_pipelines = limitation.get('max_pipelines', -1)
        if max_pipelines >= 0:
-            existing_pipelines = await self.get_pipelines()
+            existing_pipelines = await self.get_pipelines(context)
            if len(existing_pipelines) >= max_pipelines:
                raise ValueError(f'Maximum number of pipelines ({max_pipelines}) reached')

+        pipeline_data = pipeline_data.copy()
        pipeline_data['uuid'] = str(uuid.uuid4())
+        pipeline_data['workspace_uuid'] = workspace_uuid
        pipeline_data['for_version'] = self.ap.ver_mgr.get_current_version()
        pipeline_data['stages'] = default_stage_order.copy()
        pipeline_data['is_default'] = default
@@ -108,79 +138,122 @@ class PipelineService:
            sqlalchemy.insert(persistence_pipeline.LegacyPipeline).values(**pipeline_data)
        )

-        pipeline = await self.get_pipeline(pipeline_data['uuid'])
+        pipeline = await self.get_pipeline(context, pipeline_data['uuid'], include_secret=True)

-        await self.ap.pipeline_mgr.load_pipeline(pipeline)
+        await self.ap.pipeline_mgr.load_pipeline(context, pipeline)

        return pipeline_data['uuid']

-    async def update_pipeline(self, pipeline_uuid: str, pipeline_data: dict) -> None:
+    async def update_pipeline(self, context: TenantContext, pipeline_uuid: str, pipeline_data: dict) -> None:
+        workspace_uuid = require_workspace_uuid(context)
        pipeline_data = pipeline_data.copy()
-        for protected_field in ('uuid', 'for_version', 'stages', 'is_default'):
+        for protected_field in ('uuid', 'workspace_uuid', 'for_version', 'stages', 'is_default'):
            pipeline_data.pop(protected_field, None)

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_pipeline.LegacyPipeline)
-            .where(persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid)
-            .values(**pipeline_data)
-        )
+        if 'config' in pipeline_data:
+            current_config = None
+            if contains_secret_placeholder(pipeline_data['config']):
+                current_pipeline = await self.get_pipeline(context, pipeline_uuid, include_secret=True)
+                if current_pipeline is None:
+                    raise WorkspaceNotFoundError('Pipeline not found')
+                current_config = current_pipeline.get('config', {})
+            pipeline_data['config'] = restore_secret_placeholders(
+                pipeline_data['config'],
+                current_config if current_config is not None else {},
+            )

-        pipeline = await self.get_pipeline(pipeline_uuid)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_pipeline.LegacyPipeline)
+                .where(persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid)
+                .values(**pipeline_data),
+                persistence_pipeline.LegacyPipeline,
+                workspace_uuid,
+            )
+        )
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Pipeline not found')
+
+        pipeline = await self.get_pipeline(context, pipeline_uuid, include_secret=True)
+        if pipeline is None:
+            raise WorkspaceNotFoundError('Pipeline not found')

        if 'name' in pipeline_data:
            from ....entity.persistence import bot as persistence_bot

            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.select(persistence_bot.Bot).where(persistence_bot.Bot.use_pipeline_uuid == pipeline_uuid)
+                scope_statement(
+                    sqlalchemy.select(persistence_bot.Bot).where(
+                        persistence_bot.Bot.use_pipeline_uuid == pipeline_uuid
+                    ),
+                    persistence_bot.Bot,
+                    workspace_uuid,
+                )
            )

            bots = result.all()

            for bot in bots:
                bot_data = {'use_pipeline_name': pipeline_data['name']}
-                await self.ap.bot_service.update_bot(bot.uuid, bot_data)
+                await self.ap.bot_service.update_bot(context, bot.uuid, bot_data)

-        await self.ap.pipeline_mgr.remove_pipeline(pipeline_uuid)
-        await self.ap.pipeline_mgr.load_pipeline(pipeline)
+        await self.ap.pipeline_mgr.remove_pipeline(context, pipeline_uuid)
+        await self.ap.pipeline_mgr.load_pipeline(context, pipeline)

        # update all conversation that use this pipeline
        for session in self.ap.sess_mgr.session_list:
-            if session.using_conversation is not None and session.using_conversation.pipeline_uuid == pipeline_uuid:
+            if (
+                session.using_conversation is not None
+                and session.using_conversation.pipeline_uuid == pipeline_uuid
+                and getattr(session, 'workspace_uuid', workspace_uuid) == workspace_uuid
+            ):
                session.using_conversation = None

-    async def delete_pipeline(self, pipeline_uuid: str) -> None:
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_pipeline.LegacyPipeline).where(
-                persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+    async def delete_pipeline(self, context: TenantContext, pipeline_uuid: str) -> None:
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(persistence_pipeline.LegacyPipeline).where(
+                    persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+                ),
+                persistence_pipeline.LegacyPipeline,
+                context,
            )
        )
-        await self.ap.pipeline_mgr.remove_pipeline(pipeline_uuid)
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Pipeline not found')
+        await self.ap.pipeline_mgr.remove_pipeline(context, pipeline_uuid)

-    async def copy_pipeline(self, pipeline_uuid: str) -> str:
+    async def copy_pipeline(self, context: TenantContext, pipeline_uuid: str) -> str:
        """Copy a pipeline with all its configurations"""
+        workspace_uuid = require_workspace_uuid(context)
        # Check limitation
        limitation = self.ap.instance_config.data.get('system', {}).get('limitation', {})
        max_pipelines = limitation.get('max_pipelines', -1)
        if max_pipelines >= 0:
-            existing_pipelines = await self.get_pipelines()
+            existing_pipelines = await self.get_pipelines(context)
            if len(existing_pipelines) >= max_pipelines:
                raise ValueError(f'Maximum number of pipelines ({max_pipelines}) reached')

        # Get the original pipeline
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
-                persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
+                    persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+                ),
+                persistence_pipeline.LegacyPipeline,
+                workspace_uuid,
            )
        )

        original_pipeline = result.first()
        if original_pipeline is None:
-            raise ValueError(f'Pipeline {pipeline_uuid} not found')
+            raise WorkspaceNotFoundError(f'Pipeline {pipeline_uuid} not found')

        # Create new pipeline data
        new_uuid = str(uuid.uuid4())
        new_pipeline_data = {
            'uuid': new_uuid,
+            'workspace_uuid': workspace_uuid,
            'name': f'{original_pipeline.name} (Copy)',
            'description': original_pipeline.description,
            'for_version': self.ap.ver_mgr.get_current_version(),
@@ -207,13 +280,14 @@ class PipelineService:
        )

        # Load the new pipeline
-        pipeline = await self.get_pipeline(new_uuid)
-        await self.ap.pipeline_mgr.load_pipeline(pipeline)
+        pipeline = await self.get_pipeline(context, new_uuid, include_secret=True)
+        await self.ap.pipeline_mgr.load_pipeline(context, pipeline)

        return new_uuid

    async def update_pipeline_extensions(
        self,
+        context: TenantContext,
        pipeline_uuid: str,
        bound_plugins: list[dict],
        bound_mcp_servers: list[str] = None,
@@ -225,16 +299,21 @@ class PipelineService:
        mcp_resource_agent_read_enabled: bool | None = None,
    ) -> None:
        """Update the bound plugins and MCP servers for a pipeline"""
+        workspace_uuid = require_workspace_uuid(context)
        # Get current pipeline
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
-                persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
+                    persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid
+                ),
+                persistence_pipeline.LegacyPipeline,
+                workspace_uuid,
            )
        )

        pipeline = result.first()
        if pipeline is None:
-            raise ValueError(f'Pipeline {pipeline_uuid} not found')
+            raise WorkspaceNotFoundError(f'Pipeline {pipeline_uuid} not found')

        # Update extensions_preferences
        extensions_preferences = pipeline.extensions_preferences or {}
@@ -252,12 +331,16 @@ class PipelineService:
            extensions_preferences['mcp_resources'] = bound_mcp_resources

        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_pipeline.LegacyPipeline)
-            .where(persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid)
-            .values(extensions_preferences=extensions_preferences)
+            scope_statement(
+                sqlalchemy.update(persistence_pipeline.LegacyPipeline)
+                .where(persistence_pipeline.LegacyPipeline.uuid == pipeline_uuid)
+                .values(extensions_preferences=extensions_preferences),
+                persistence_pipeline.LegacyPipeline,
+                workspace_uuid,
+            )
        )

        # Reload pipeline to apply changes
-        await self.ap.pipeline_mgr.remove_pipeline(pipeline_uuid)
-        pipeline = await self.get_pipeline(pipeline_uuid)
-        await self.ap.pipeline_mgr.load_pipeline(pipeline)
+        await self.ap.pipeline_mgr.remove_pipeline(context, pipeline_uuid)
+        pipeline = await self.get_pipeline(context, pipeline_uuid, include_secret=True)
+        await self.ap.pipeline_mgr.load_pipeline(context, pipeline)
@@ -7,6 +7,9 @@ import sqlalchemy

 from ....core import app
 from ....entity.persistence import model as persistence_model
+from ....workspace.errors import WorkspaceNotFoundError
+from .secrets import contains_secret_placeholder, redact_secrets, restore_secret_placeholders
+from .tenant import TenantContext, require_workspace_uuid, scope_statement


 class ModelProviderService:
@@ -35,9 +38,15 @@ class ModelProviderService:

        return normalized_keys

-    async def get_providers(self) -> list[dict]:
+    async def get_providers(self, context: TenantContext, include_secret: bool = False) -> list[dict]:
        """Get all providers"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(persistence_model.ModelProvider))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider),
+                persistence_model.ModelProvider,
+                context,
+            )
+        )
        providers = result.all()
        providers_list = []
        for p in providers:
@@ -50,14 +59,25 @@ class ModelProviderService:
                    provider_dict['api_keys'] = json.loads(provider_dict['api_keys'])
                except Exception:
                    provider_dict['api_keys'] = []
+            if not include_secret:
+                provider_dict = redact_secrets(provider_dict)
            providers_list.append(provider_dict)
        return providers_list

-    async def get_provider(self, provider_uuid: str) -> dict | None:
+    async def get_provider(
+        self,
+        context: TenantContext,
+        provider_uuid: str,
+        include_secret: bool = False,
+    ) -> dict | None:
        """Get a single provider by UUID"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider).where(
-                persistence_model.ModelProvider.uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider).where(
+                    persistence_model.ModelProvider.uuid == provider_uuid
+                ),
+                persistence_model.ModelProvider,
+                context,
            )
        )
        provider = result.first()
@@ -72,103 +92,171 @@ class ModelProviderService:
                provider_dict['api_keys'] = json.loads(provider_dict['api_keys'])
            except Exception:
                provider_dict['api_keys'] = []
+        if not include_secret:
+            provider_dict = redact_secrets(provider_dict)
        return provider_dict

-    async def create_provider(self, provider_data: dict) -> str:
+    async def create_provider(self, context: TenantContext, provider_data: dict) -> str:
        """Create a new provider"""
+        provider_data = provider_data.copy()
        provider_data['uuid'] = str(uuid.uuid4())
-        provider_data['api_keys'] = self._normalize_api_keys(provider_data.get('api_keys'))
+        provider_data['workspace_uuid'] = require_workspace_uuid(context)
+        provider_data['api_keys'] = self._normalize_api_keys(
+            restore_secret_placeholders(provider_data.get('api_keys'), sensitive=True)
+        )
        await self.ap.persistence_mgr.execute_async(
            sqlalchemy.insert(persistence_model.ModelProvider).values(**provider_data)
        )

        # load to runtime
-        runtime_provider = await self.ap.model_mgr.load_provider(provider_data)
-        self.ap.model_mgr.provider_dict[runtime_provider.provider_entity.uuid] = runtime_provider
+        runtime_provider = await self.ap.model_mgr.load_provider(context, provider_data)
+        await self.ap.model_mgr.cache_provider(context, runtime_provider)
        return provider_data['uuid']

-    async def update_provider(self, provider_uuid: str, provider_data: dict) -> None:
+    async def update_provider(self, context: TenantContext, provider_uuid: str, provider_data: dict) -> None:
        """Update an existing provider"""
-        if 'uuid' in provider_data:
-            del provider_data['uuid']
+        provider_data = provider_data.copy()
+        provider_data.pop('uuid', None)
+        provider_data.pop('workspace_uuid', None)
        if 'api_keys' in provider_data:
-            provider_data['api_keys'] = self._normalize_api_keys(provider_data.get('api_keys'))
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_model.ModelProvider)
-            .where(persistence_model.ModelProvider.uuid == provider_uuid)
-            .values(**provider_data)
+            submitted_keys = provider_data.get('api_keys')
+            if contains_secret_placeholder(submitted_keys, sensitive=True):
+                current_provider = await self.get_provider(context, provider_uuid, include_secret=True)
+                if current_provider is None:
+                    raise WorkspaceNotFoundError('Provider not found')
+                submitted_keys = restore_secret_placeholders(
+                    submitted_keys,
+                    current_provider.get('api_keys', []),
+                    sensitive=True,
+                )
+            provider_data['api_keys'] = self._normalize_api_keys(submitted_keys)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_model.ModelProvider)
+                .where(persistence_model.ModelProvider.uuid == provider_uuid)
+                .values(**provider_data),
+                persistence_model.ModelProvider,
+                context,
+            )
        )
-        await self.ap.model_mgr.reload_provider(provider_uuid)
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Provider not found')
+        await self.ap.model_mgr.reload_provider(context, provider_uuid)

-    async def delete_provider(self, provider_uuid: str) -> None:
+    async def delete_provider(self, context: TenantContext, provider_uuid: str) -> None:
        """Delete a provider (only if no models reference it)"""
+        workspace_uuid = require_workspace_uuid(context)
        # Check if any models use this provider
        llm_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.LLMModel).where(
-                persistence_model.LLMModel.provider_uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.LLMModel).where(
+                    persistence_model.LLMModel.provider_uuid == provider_uuid
+                ),
+                persistence_model.LLMModel,
+                workspace_uuid,
            )
        )
        if llm_result.first() is not None:
            raise ValueError('Cannot delete provider: LLM models still reference it')

        embedding_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.EmbeddingModel).where(
-                persistence_model.EmbeddingModel.provider_uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.EmbeddingModel).where(
+                    persistence_model.EmbeddingModel.provider_uuid == provider_uuid
+                ),
+                persistence_model.EmbeddingModel,
+                workspace_uuid,
            )
        )
        if embedding_result.first() is not None:
            raise ValueError('Cannot delete provider: Embedding models still reference it')

        rerank_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.RerankModel).where(
-                persistence_model.RerankModel.provider_uuid == provider_uuid
+            scope_statement(
+                sqlalchemy.select(persistence_model.RerankModel).where(
+                    persistence_model.RerankModel.provider_uuid == provider_uuid
+                ),
+                persistence_model.RerankModel,
+                workspace_uuid,
            )
        )
        if rerank_result.first() is not None:
            raise ValueError('Cannot delete provider: Rerank models still reference it')

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(persistence_model.ModelProvider).where(
-                persistence_model.ModelProvider.uuid == provider_uuid
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(persistence_model.ModelProvider).where(
+                    persistence_model.ModelProvider.uuid == provider_uuid
+                ),
+                persistence_model.ModelProvider,
+                workspace_uuid,
            )
        )
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Provider not found')

-        await self.ap.model_mgr.remove_provider(provider_uuid)
+        await self.ap.model_mgr.remove_provider(context, provider_uuid)

-    async def get_provider_model_counts(self, provider_uuid: str) -> dict:
+    async def get_provider_model_counts(self, context: TenantContext, provider_uuid: str) -> dict:
        """Get count of models using this provider"""
+        workspace_uuid = require_workspace_uuid(context)
+        if await self.get_provider(context, provider_uuid) is None:
+            raise WorkspaceNotFoundError('Provider not found')
        llm_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(sqlalchemy.func.count())
-            .select_from(persistence_model.LLMModel)
-            .where(persistence_model.LLMModel.provider_uuid == provider_uuid)
+            scope_statement(
+                sqlalchemy.select(sqlalchemy.func.count())
+                .select_from(persistence_model.LLMModel)
+                .where(persistence_model.LLMModel.provider_uuid == provider_uuid),
+                persistence_model.LLMModel,
+                workspace_uuid,
+            )
        )
        llm_count = llm_result.scalar() or 0

        embedding_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(sqlalchemy.func.count())
-            .select_from(persistence_model.EmbeddingModel)
-            .where(persistence_model.EmbeddingModel.provider_uuid == provider_uuid)
+            scope_statement(
+                sqlalchemy.select(sqlalchemy.func.count())
+                .select_from(persistence_model.EmbeddingModel)
+                .where(persistence_model.EmbeddingModel.provider_uuid == provider_uuid),
+                persistence_model.EmbeddingModel,
+                workspace_uuid,
+            )
        )
        embedding_count = embedding_result.scalar() or 0

        rerank_result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(sqlalchemy.func.count())
-            .select_from(persistence_model.RerankModel)
-            .where(persistence_model.RerankModel.provider_uuid == provider_uuid)
+            scope_statement(
+                sqlalchemy.select(sqlalchemy.func.count())
+                .select_from(persistence_model.RerankModel)
+                .where(persistence_model.RerankModel.provider_uuid == provider_uuid),
+                persistence_model.RerankModel,
+                workspace_uuid,
+            )
        )
        rerank_count = rerank_result.scalar() or 0

        return {'llm_count': llm_count, 'embedding_count': embedding_count, 'rerank_count': rerank_count}

-    async def find_or_create_provider(self, requester: str, base_url: str, api_keys: list) -> str:
+    async def find_or_create_provider(
+        self,
+        context: TenantContext,
+        requester: str,
+        base_url: str,
+        api_keys: list,
+    ) -> str:
        """Find existing provider or create new one"""
-        api_keys = self._normalize_api_keys(api_keys)
+        workspace_uuid = require_workspace_uuid(context)
+        api_keys = self._normalize_api_keys(restore_secret_placeholders(api_keys, sensitive=True))

        # Try to find existing provider with same config
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_model.ModelProvider).where(
-                persistence_model.ModelProvider.requester == requester,
-                persistence_model.ModelProvider.base_url == base_url,
+            scope_statement(
+                sqlalchemy.select(persistence_model.ModelProvider).where(
+                    persistence_model.ModelProvider.requester == requester,
+                    persistence_model.ModelProvider.base_url == base_url,
+                ),
+                persistence_model.ModelProvider,
+                workspace_uuid,
            )
        )
        for provider in result.all():
@@ -187,29 +275,38 @@ class ModelProviderService:
                pass

        return await self.create_provider(
+            context,
            {
                'name': provider_name,
                'requester': requester,
                'base_url': base_url,
                'api_keys': api_keys,
-            }
+            },
        )

-    async def update_space_model_provider_api_keys(self, api_key: str) -> None:
+    async def update_space_model_provider_api_keys(self, context: TenantContext, api_key: str) -> None:
        """Update Space model provider API keys"""
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(persistence_model.ModelProvider)
-            .where(persistence_model.ModelProvider.uuid == '00000000-0000-0000-0000-000000000000')
-            .values(api_keys=self._normalize_api_keys(api_key))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.update(persistence_model.ModelProvider)
+                .where(persistence_model.ModelProvider.uuid == '00000000-0000-0000-0000-000000000000')
+                .values(api_keys=self._normalize_api_keys(api_key)),
+                persistence_model.ModelProvider,
+                context,
+            )
        )
-        await self.ap.model_mgr.reload_provider('00000000-0000-0000-0000-000000000000')
+        if getattr(result, 'rowcount', None) == 0:
+            raise WorkspaceNotFoundError('Provider not found')
+        await self.ap.model_mgr.reload_provider(context, '00000000-0000-0000-0000-000000000000')

-    async def scan_provider_models(self, provider_uuid: str, model_type: str | None = None) -> dict:
-        provider = await self.get_provider(provider_uuid)
+    async def scan_provider_models(
+        self, context: TenantContext, provider_uuid: str, model_type: str | None = None
+    ) -> dict:
+        provider = await self.get_provider(context, provider_uuid, include_secret=True)
        if provider is None:
-            raise ValueError('provider not found')
+            raise WorkspaceNotFoundError('Provider not found')

-        runtime_provider = await self.ap.model_mgr.load_provider(provider)
+        runtime_provider = await self.ap.model_mgr.load_provider(context, provider)

        try:
            scan_result = await runtime_provider.requester.scan_models(
@@ -230,10 +327,19 @@ class ModelProviderService:
            scanned_models = scan_result
            debug_info = None

-        llm_models = await self.ap.llm_model_service.get_llm_models_by_provider(provider_uuid)
-        embedding_models = await self.ap.embedding_models_service.get_embedding_models_by_provider(provider_uuid)
+        llm_models = await self.ap.llm_model_service.get_llm_models_by_provider(context, provider_uuid)
+        embedding_models = await self.ap.embedding_models_service.get_embedding_models_by_provider(
+            context, provider_uuid
+        )
+        rerank_service = getattr(self.ap, 'rerank_models_service', None)
+        rerank_models = (
+            await rerank_service.get_rerank_models_by_provider(context, provider_uuid)
+            if rerank_service is not None
+            else []
+        )
        existing_llm_names = {model['name'] for model in llm_models}
        existing_embedding_names = {model['name'] for model in embedding_models}
+        existing_rerank_names = {model['name'] for model in rerank_models}

        filtered_models = []
        for model in scanned_models:
@@ -260,6 +366,8 @@ class ModelProviderService:
                    'already_added': (
                        model_name in existing_embedding_names
                        if scanned_type == 'embedding'
+                        else model_name in existing_rerank_names
+                        if scanned_type == 'rerank'
                        else model_name in existing_llm_names
                    ),
                }
@@ -0,0 +1,336 @@
+from __future__ import annotations
+
+import copy
+import re
+from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
+
+
+SECRET_MASK = '***'
+_MISSING_SECRET = object()
+
+_SENSITIVE_NAMES = frozenset(
+    {
+        'api_key',
+        'api_keys',
+        'apikey',
+        'apikeys',
+        'auth',
+        'authorization',
+        'cookie',
+        'credentials',
+        'database_url',
+        'dsn',
+        'header_value',
+        'key',
+        'proxy_authorization',
+        'set_cookie',
+        'webhook_url',
+    }
+)
+_SENSITIVE_TOKENS = frozenset(
+    {
+        'apikey',
+        'credential',
+        'credentials',
+        'passwd',
+        'password',
+        'secret',
+        'token',
+    }
+)
+_KEY_QUALIFIERS = frozenset(
+    {
+        'access',
+        'api',
+        'auth',
+        'bearer',
+        'client',
+        'debug',
+        'encryption',
+        'private',
+        'signing',
+    }
+)
+_SENSITIVE_URL_QUERY_NAMES = frozenset(
+    {
+        'code',
+        'credential',
+        'credentials',
+        'password',
+        'passwd',
+        'sig',
+        'signature',
+    }
+)
+
+
+def _normalize_key(key: object) -> str:
+    value = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', str(key or ''))
+    return re.sub(r'[^a-zA-Z0-9]+', '_', value).strip('_').lower()
+
+
+def is_sensitive_key(key: object) -> bool:
+    """Return whether a configuration key conventionally carries a secret."""
+
+    normalized = _normalize_key(key)
+    if normalized in _SENSITIVE_NAMES:
+        return True
+    tokens = frozenset(token for token in normalized.split('_') if token)
+    if tokens & _SENSITIVE_TOKENS:
+        return True
+    return bool(tokens & {'key', 'keys'}) and bool(tokens & _KEY_QUALIFIERS)
+
+
+def is_url_key(key: object) -> bool:
+    """Return whether a configuration field conventionally carries a URL."""
+
+    normalized = _normalize_key(key)
+    return normalized == 'url' or normalized.endswith('_url')
+
+
+def _is_sensitive_url_query_key(key: object) -> bool:
+    normalized = _normalize_key(key)
+    return (
+        is_sensitive_key(key) or normalized in _SENSITIVE_URL_QUERY_NAMES or normalized.endswith(('_sig', '_signature'))
+    )
+
+
+def _redact_url_string(value: str) -> str:
+    if not value:
+        return value
+    try:
+        parsed = urlsplit(value)
+        netloc = parsed.netloc
+        if '@' in netloc:
+            _, host = netloc.rsplit('@', 1)
+            netloc = f'{SECRET_MASK}@{host}'
+        query = urlencode(
+            [
+                (key, SECRET_MASK if _is_sensitive_url_query_key(key) and item else item)
+                for key, item in parse_qsl(parsed.query, keep_blank_values=True)
+            ],
+            doseq=True,
+            safe='*',
+        )
+        return urlunsplit((parsed.scheme, netloc, parsed.path, query, parsed.fragment))
+    except (TypeError, ValueError):
+        # A malformed URL cannot be safely decomposed, so fail closed.
+        return SECRET_MASK
+
+
+def redact_url_secrets(value):
+    """Redact URL userinfo and credential-like query values."""
+
+    if isinstance(value, str):
+        return _redact_url_string(value)
+    if isinstance(value, list):
+        return [redact_url_secrets(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(redact_url_secrets(item) for item in value)
+    return copy.deepcopy(value)
+
+
+def _contains_url_secret_placeholder(value) -> bool:
+    if isinstance(value, str):
+        if value == SECRET_MASK:
+            return True
+        try:
+            parsed = urlsplit(value)
+            if '@' in parsed.netloc and SECRET_MASK in parsed.netloc.rsplit('@', 1)[0]:
+                return True
+            return any(
+                item == SECRET_MASK and _is_sensitive_url_query_key(key)
+                for key, item in parse_qsl(parsed.query, keep_blank_values=True)
+            )
+        except (TypeError, ValueError):
+            return False
+    if isinstance(value, (list, tuple)):
+        return any(_contains_url_secret_placeholder(item) for item in value)
+    return False
+
+
+def _restore_url_string(value: str, current_value) -> str:
+    if value == SECRET_MASK:
+        if current_value is _MISSING_SECRET:
+            raise ValueError('Masked URL secret has no existing value')
+        return copy.deepcopy(current_value)
+
+    try:
+        submitted = urlsplit(value)
+    except (TypeError, ValueError):
+        return value
+
+    current = None
+    if isinstance(current_value, str):
+        try:
+            current = urlsplit(current_value)
+        except (TypeError, ValueError):
+            current = None
+
+    netloc = submitted.netloc
+    if '@' in netloc:
+        submitted_userinfo, host = netloc.rsplit('@', 1)
+        if SECRET_MASK in submitted_userinfo:
+            if current is None or '@' not in current.netloc:
+                raise ValueError('Masked URL userinfo has no existing value')
+            current_userinfo, _ = current.netloc.rsplit('@', 1)
+            netloc = f'{current_userinfo}@{host}'
+
+    current_query: dict[str, list[str]] = {}
+    if current is not None:
+        for key, item in parse_qsl(current.query, keep_blank_values=True):
+            current_query.setdefault(_normalize_key(key), []).append(item)
+    consumed: dict[str, int] = {}
+    restored_query: list[tuple[str, str]] = []
+    for key, item in parse_qsl(submitted.query, keep_blank_values=True):
+        normalized = _normalize_key(key)
+        if item == SECRET_MASK and _is_sensitive_url_query_key(key):
+            index = consumed.get(normalized, 0)
+            candidates = current_query.get(normalized, [])
+            if index >= len(candidates):
+                raise ValueError('Masked URL query secret has no existing value')
+            item = candidates[index]
+            consumed[normalized] = index + 1
+        restored_query.append((key, item))
+
+    return urlunsplit(
+        (
+            submitted.scheme,
+            netloc,
+            submitted.path,
+            urlencode(restored_query, doseq=True, safe='*'),
+            submitted.fragment,
+        )
+    )
+
+
+def restore_url_secret_placeholders(value, current_value=_MISSING_SECRET):
+    """Restore URL placeholders from the corresponding persisted URL."""
+
+    if isinstance(value, str):
+        return _restore_url_string(value, current_value)
+    if isinstance(value, list):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return [
+            restore_url_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+            )
+            for index, item in enumerate(value)
+        ]
+    if isinstance(value, tuple):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return tuple(
+            restore_url_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+            )
+            for index, item in enumerate(value)
+        )
+    return copy.deepcopy(value)
+
+
+def mask_secret_value(value):
+    """Return a shape-preserving copy whose non-empty leaves are masked."""
+
+    if isinstance(value, dict):
+        return {key: mask_secret_value(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [mask_secret_value(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(mask_secret_value(item) for item in value)
+    if value is None or value == '':
+        return value
+    return SECRET_MASK
+
+
+def redact_secrets(value):
+    """Return a recursively redacted copy without mutating the source value."""
+
+    if isinstance(value, dict):
+        return {
+            key: (
+                mask_secret_value(item)
+                if is_sensitive_key(key)
+                else redact_url_secrets(item)
+                if is_url_key(key)
+                else redact_secrets(item)
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        return [redact_secrets(item) for item in value]
+    if isinstance(value, tuple):
+        return tuple(redact_secrets(item) for item in value)
+    return copy.deepcopy(value)
+
+
+def restore_secret_placeholders(value, current_value=_MISSING_SECRET, *, sensitive: bool = False):
+    """Restore masked leaves from existing data before a management write.
+
+    ``***`` is a reserved placeholder only inside a sensitive field. A masked
+    leaf without an existing counterpart is rejected so it can never become a
+    persisted credential. Empty values and explicit replacements pass through.
+    """
+
+    if sensitive and value == SECRET_MASK:
+        if current_value is _MISSING_SECRET:
+            raise ValueError('Masked secret has no existing value')
+        return copy.deepcopy(current_value)
+    if isinstance(value, dict):
+        current_mapping = current_value if isinstance(current_value, dict) else {}
+        return {
+            key: (
+                restore_url_secret_placeholders(
+                    item,
+                    current_mapping.get(key, _MISSING_SECRET),
+                )
+                if not sensitive and not is_sensitive_key(key) and is_url_key(key)
+                else restore_secret_placeholders(
+                    item,
+                    current_mapping.get(key, _MISSING_SECRET),
+                    sensitive=sensitive or is_sensitive_key(key),
+                )
+            )
+            for key, item in value.items()
+        }
+    if isinstance(value, list):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return [
+            restore_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+                sensitive=sensitive,
+            )
+            for index, item in enumerate(value)
+        ]
+    if isinstance(value, tuple):
+        current_items = current_value if isinstance(current_value, (list, tuple)) else ()
+        return tuple(
+            restore_secret_placeholders(
+                item,
+                current_items[index] if index < len(current_items) else _MISSING_SECRET,
+                sensitive=sensitive,
+            )
+            for index, item in enumerate(value)
+        )
+    return copy.deepcopy(value)
+
+
+def contains_secret_placeholder(value, *, sensitive: bool = False) -> bool:
+    """Return whether ``value`` contains a meaningful masked secret leaf."""
+
+    if sensitive and value == SECRET_MASK:
+        return True
+    if isinstance(value, dict):
+        return any(
+            (
+                _contains_url_secret_placeholder(item)
+                if not sensitive and not is_sensitive_key(key) and is_url_key(key)
+                else contains_secret_placeholder(item, sensitive=sensitive or is_sensitive_key(key))
+            )
+            for key, item in value.items()
+        )
+    if isinstance(value, (list, tuple)):
+        return any(contains_secret_placeholder(item, sensitive=sensitive) for item in value)
+    return False
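The mask/restore contract above can be exercised on its own. The sketch below is minimal and rests on a few assumptions. It takes `SECRET_MASK` to be `'***'`, as the `restore_secret_placeholders` docstring implies. `is_sensitive_key` is a hypothetical key-name heuristic, not the module's real classifier. The URL branch (`is_url_key` / `restore_url_secret_placeholders`) and tuple handling are omitted. The sketch shows the two properties the docstrings describe: a masked leaf inside a sensitive field is restored from stored data, and a mask with no stored counterpart is rejected.

```python
import copy

SECRET_MASK = '***'  # assumed placeholder value
_MISSING = object()  # sentinel: "no stored counterpart"


def is_sensitive_key(key) -> bool:
    # Hypothetical heuristic; the real module classifies keys itself.
    return str(key).lower() in {'api_key', 'token', 'password', 'secret'}


def mask(value):
    """Shape-preserving mask: non-empty leaves become SECRET_MASK."""
    if isinstance(value, dict):
        return {k: mask(v) for k, v in value.items()}
    if isinstance(value, list):
        return [mask(v) for v in value]
    if value is None or value == '':
        return value
    return SECRET_MASK


def redact(value):
    """Mask every leaf below a sensitive key; copy everything else."""
    if isinstance(value, dict):
        return {k: mask(v) if is_sensitive_key(k) else redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return copy.deepcopy(value)


def restore(value, current=_MISSING, *, sensitive=False):
    """Swap masked sensitive leaves back to their stored values."""
    if sensitive and value == SECRET_MASK:
        if current is _MISSING:
            raise ValueError('Masked secret has no existing value')
        return copy.deepcopy(current)
    if isinstance(value, dict):
        cur = current if isinstance(current, dict) else {}
        return {
            k: restore(v, cur.get(k, _MISSING), sensitive=sensitive or is_sensitive_key(k))
            for k, v in value.items()
        }
    if isinstance(value, list):
        cur = current if isinstance(current, (list, tuple)) else ()
        return [
            restore(v, cur[i] if i < len(cur) else _MISSING, sensitive=sensitive)
            for i, v in enumerate(value)
        ]
    return copy.deepcopy(value)
```

Outside a sensitive field, `'***'` is ordinary data and passes through unchanged. This matches the "reserved placeholder only inside a sensitive field" rule.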
@@ -1,9 +1,11 @@
 from __future__ import annotations

+import asyncio
 import io
 import inspect
 import os
 import posixpath
+import stat
 import zipfile
 from typing import Optional
 from urllib.parse import quote, unquote, urlparse
@@ -12,6 +14,9 @@ import httpx

 from ....core import app
 from ....skill.utils import parse_frontmatter
+from ....utils import httpclient
+from ..context import ExecutionContext
+from .tenant import TenantContext, require_workspace_uuid


 _PUBLIC_SKILL_FIELDS = (
@@ -32,6 +37,12 @@ _GITHUB_ASSET_HOSTS = {
    'raw.githubusercontent.com',
    'codeload.github.com',
 }
+_MAX_GITHUB_ARCHIVE_BYTES = 10 * 1024 * 1024
+_MAX_GITHUB_ARCHIVE_ENTRIES = 4096
+_MAX_SKILL_ARCHIVE_FILES = 1024
+_MAX_SKILL_FILE_BYTES = 10 * 1024 * 1024
+_MAX_SKILL_UNCOMPRESSED_BYTES = 50 * 1024 * 1024
+_MAX_SKILL_COMPRESSION_RATIO = 200


 class SkillService:
@@ -75,75 +86,112 @@ class SkillService:
        """Backwards-compatible alias preserved for clarity at call sites."""
        self._require_box(action)

+    async def _execution_context(self, context: TenantContext) -> ExecutionContext:
+        workspace_uuid = require_workspace_uuid(context)
+        instance_uuid = str(getattr(context, 'instance_uuid', '') or '').strip()
+        generation = getattr(context, 'placement_generation', None)
+        if not instance_uuid or isinstance(generation, bool) or not isinstance(generation, int) or generation <= 0:
+            raise ValueError('Skill operations require an explicit fenced execution context')
+        binding = await self.ap.workspace_service.get_execution_binding(
+            workspace_uuid,
+            expected_generation=generation,
+        )
+        if binding.instance_uuid != instance_uuid:
+            raise ValueError('Skill execution context belongs to another LangBot instance')
+        return ExecutionContext(
+            instance_uuid=instance_uuid,
+            workspace_uuid=workspace_uuid,
+            placement_generation=generation,
+            bot_uuid=getattr(context, 'bot_uuid', None),
+            pipeline_uuid=getattr(context, 'pipeline_uuid', None),
+            query_uuid=getattr(context, 'query_uuid', None),
+        )
+
    @staticmethod
    def _serialize_skill(skill: dict) -> dict:
        return {field: skill.get(field) for field in _PUBLIC_SKILL_FIELDS if field in skill}

-    async def list_skills(self) -> list[dict]:
+    async def list_skills(self, context: TenantContext) -> list[dict]:
+        execution_context = await self._execution_context(context)
        # When Box is unavailable, surface an empty list rather than raising —
        # the skills page should render cleanly, and the UI separately renders
        # a "Box disabled / unavailable" banner via useBoxStatus.
        box_service = self._box_service()
        if box_service is None:
            return []
-        return [self._serialize_skill(skill) for skill in await box_service.list_skills()]
+        return [self._serialize_skill(skill) for skill in await box_service.list_skills(execution_context)]

-    async def get_skill(self, skill_name: str) -> Optional[dict]:
+    async def get_skill(self, context: TenantContext, skill_name: str) -> Optional[dict]:
+        execution_context = await self._execution_context(context)
        box_service = self._box_service()
        if box_service is None:
            return None
-        skill = await box_service.get_skill(skill_name)
+        skill = await box_service.get_skill(execution_context, skill_name)
        return self._serialize_skill(skill) if skill else None

-    async def get_skill_by_name(self, name: str) -> Optional[dict]:
-        return await self.get_skill(name)
+    async def get_skill_by_name(self, context: TenantContext, name: str) -> Optional[dict]:
+        return await self.get_skill(context, name)

-    async def create_skill(self, data: dict) -> dict:
+    async def create_skill(self, context: TenantContext, data: dict) -> dict:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Creating a skill')
-        created = await box_service.create_skill(data)
-        await self._reload_skills()
+        created = await box_service.create_skill(execution_context, data)
+        await self._reload_skills(execution_context)
        return self._serialize_skill(created)

-    async def update_skill(self, skill_name: str, data: dict) -> dict:
+    async def update_skill(self, context: TenantContext, skill_name: str, data: dict) -> dict:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Editing a skill')
-        updated = await box_service.update_skill(skill_name, data)
-        await self._reload_skills()
+        updated = await box_service.update_skill(execution_context, skill_name, data)
+        await self._reload_skills(execution_context)
        return self._serialize_skill(updated)

-    async def delete_skill(self, skill_name: str) -> bool:
+    async def delete_skill(self, context: TenantContext, skill_name: str) -> bool:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Deleting a skill')
-        await box_service.delete_skill(skill_name)
-        await self._reload_skills()
+        await box_service.delete_skill(execution_context, skill_name)
+        await self._reload_skills(execution_context)
        return True

    async def list_skill_files(
        self,
+        context: TenantContext,
        skill_name: str,
        path: str = '.',
        include_hidden: bool = False,
        max_entries: int = 200,
    ) -> dict:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Browsing skill files')
-        return await box_service.list_skill_files(skill_name, path, include_hidden, max_entries)
+        return await box_service.list_skill_files(execution_context, skill_name, path, include_hidden, max_entries)

-    async def read_skill_file(self, skill_name: str, path: str) -> dict:
+    async def read_skill_file(self, context: TenantContext, skill_name: str, path: str) -> dict:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Reading a skill file')
-        return await box_service.read_skill_file(skill_name, path)
+        return await box_service.read_skill_file(execution_context, skill_name, path)

-    async def write_skill_file(self, skill_name: str, path: str, content: str) -> dict:
+    async def write_skill_file(self, context: TenantContext, skill_name: str, path: str, content: str) -> dict:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Editing skill files')
-        result = await box_service.write_skill_file(skill_name, path, content)
-        await self._reload_skills()
+        result = await box_service.write_skill_file(execution_context, skill_name, path, content)
+        await self._reload_skills(execution_context)
        return result

-    async def install_from_github(self, data: dict) -> list[dict]:
+    async def install_from_github(self, context: TenantContext, data: dict) -> list[dict]:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Installing a skill from GitHub')
        owner = str(data['owner']).strip()
        repo = str(data['repo']).strip()
        release_tag = str(data.get('release_tag', '')).strip()
        raw_asset_url = str(data['asset_url']).strip()
        if self._is_github_skill_md_url(raw_asset_url):
-            return await self._install_github_skill_md(raw_asset_url, owner=owner, repo=repo, data=data)
+            return await self._install_github_skill_md(
+                execution_context,
+                raw_asset_url,
+                owner=owner,
+                repo=repo,
+                data=data,
+            )

        asset_url = self._validate_github_asset_url(raw_asset_url, owner=owner, repo=repo, release_tag=release_tag)
        source_subdir = str(data.get('source_subdir', '') or '').strip()
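`_execution_context` fences every skill operation on two checks. The caller's placement generation must still be current, and the binding for that generation must point at this LangBot instance. The sketch below is synchronous and rests on an assumption. `BindingStore` is a hypothetical in-memory stand-in for `workspace_service.get_execution_binding`, and the sketch assumes that call rejects a stale `expected_generation`. `fence` and `FencedContext` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    instance_uuid: str
    generation: int


@dataclass(frozen=True)
class FencedContext:
    instance_uuid: str
    workspace_uuid: str
    placement_generation: int


class BindingStore:
    """Hypothetical stand-in for the workspace service's binding lookup."""

    def __init__(self) -> None:
        self._bindings: dict[str, Binding] = {}

    def place(self, workspace_uuid: str, instance_uuid: str, generation: int) -> None:
        self._bindings[workspace_uuid] = Binding(instance_uuid, generation)

    def get_execution_binding(self, workspace_uuid: str, *, expected_generation: int) -> Binding:
        binding = self._bindings.get(workspace_uuid)
        if binding is None or binding.generation != expected_generation:
            raise ValueError('Stale or missing placement generation')
        return binding


def fence(store: BindingStore, workspace_uuid: str, instance_uuid, generation) -> FencedContext:
    instance_uuid = str(instance_uuid or '').strip()
    # bool is a subclass of int, so it is rejected explicitly, as in the diff.
    if not instance_uuid or isinstance(generation, bool) or not isinstance(generation, int) or generation <= 0:
        raise ValueError('Skill operations require an explicit fenced execution context')
    binding = store.get_execution_binding(workspace_uuid, expected_generation=generation)
    if binding.instance_uuid != instance_uuid:
        raise ValueError('Execution context belongs to another LangBot instance')
    return FencedContext(instance_uuid, workspace_uuid, generation)
```

After the workspace is re-placed, a request still carrying the old generation fails. This holds even when it reaches the instance that used to own the workspace.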
@@ -151,29 +199,37 @@ class SkillService:
        zip_bytes = await self._download_github_asset(asset_url)
        filename = f'{repo}-{release_tag.lstrip("v").replace("/", "-") or "source"}.zip'
        installed = await box_service.install_skill_zip(
+            execution_context,
            zip_bytes,
            filename,
            source_paths=data.get('source_paths') or [],
            source_path=str(data.get('source_path', '') or ''),
            source_subdir=source_subdir,
        )
-        await self._reload_skills()
+        await self._reload_skills(execution_context)
        return [self._serialize_skill(skill) for skill in installed]

-    async def preview_install_from_github(self, data: dict) -> list[dict]:
+    async def preview_install_from_github(self, context: TenantContext, data: dict) -> list[dict]:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Previewing a skill from GitHub')
        owner = str(data['owner']).strip()
        repo = str(data['repo']).strip()
        release_tag = str(data.get('release_tag', '')).strip()
        raw_asset_url = str(data['asset_url']).strip()
        if self._is_github_skill_md_url(raw_asset_url):
-            return await self._preview_github_skill_md(raw_asset_url, owner=owner, repo=repo)
+            return await self._preview_github_skill_md(
+                execution_context,
+                raw_asset_url,
+                owner=owner,
+                repo=repo,
+            )

        asset_url = self._validate_github_asset_url(raw_asset_url, owner=owner, repo=repo, release_tag=release_tag)
        source_subdir = str(data.get('source_subdir', '') or '').strip()

        zip_bytes = await self._download_github_asset(asset_url)
        return await box_service.preview_skill_zip(
+            execution_context,
            zip_bytes,
            f'{repo}-{release_tag.lstrip("v").replace("/", "-") or "source"}.zip',
            source_subdir=source_subdir,
@@ -181,27 +237,45 @@ class SkillService:

    async def install_from_zip_upload(
        self,
+        context: TenantContext,
        *,
        file_bytes: bytes,
        filename: str,
        source_paths: list[str] | None = None,
        source_path: str = '',
    ) -> list[dict]:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Installing a skill from upload')
        installed = await box_service.install_skill_zip(
+            execution_context,
            file_bytes,
            filename,
            source_paths=source_paths or [],
            source_path=source_path,
        )
-        await self._reload_skills()
+        await self._reload_skills(execution_context)
        return [self._serialize_skill(skill) for skill in installed]

-    async def preview_install_from_zip_upload(self, *, file_bytes: bytes, filename: str) -> list[dict]:
+    async def preview_install_from_zip_upload(
+        self,
+        context: TenantContext,
+        *,
+        file_bytes: bytes,
+        filename: str,
+    ) -> list[dict]:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Previewing a skill upload')
-        return await box_service.preview_skill_zip(file_bytes, filename)
+        return await box_service.preview_skill_zip(execution_context, file_bytes, filename)

-    async def _install_github_skill_md(self, asset_url: str, *, owner: str, repo: str, data: dict) -> list[dict]:
+    async def _install_github_skill_md(
+        self,
+        context: TenantContext,
+        asset_url: str,
+        *,
+        owner: str,
+        repo: str,
+        data: dict,
+    ) -> list[dict]:
        box_service = self._require_box('Installing a skill from GitHub')
        zip_bytes, filename, _package_name = await self._download_github_skill_directory_as_zip(
            asset_url,
@@ -210,46 +284,73 @@ class SkillService:
        )

        installed = await box_service.install_skill_zip(
+            context,
            zip_bytes,
            filename,
            source_paths=data.get('source_paths') or [],
            source_path=str(data.get('source_path', '') or ''),
            target_suffix='',
        )
-        await self._reload_skills()
+        await self._reload_skills(context)
        return [self._serialize_skill(skill) for skill in installed]

-    async def _preview_github_skill_md(self, asset_url: str, *, owner: str, repo: str) -> list[dict]:
+    async def _preview_github_skill_md(
+        self,
+        context: TenantContext,
+        asset_url: str,
+        *,
+        owner: str,
+        repo: str,
+    ) -> list[dict]:
        box_service = self._require_box('Previewing a skill from GitHub')
        zip_bytes, _filename, package_name = await self._download_github_skill_directory_as_zip(
            asset_url,
            owner=owner,
            repo=repo,
        )
-        return await box_service.preview_skill_zip(zip_bytes, f'{package_name}.zip', target_suffix='')
+        return await box_service.preview_skill_zip(context, zip_bytes, f'{package_name}.zip', target_suffix='')

-    async def reload_skills(self) -> list[dict]:
-        await self._reload_skills()
-        return await self.list_skills()
+    async def reload_skills(self, context: TenantContext) -> list[dict]:
+        execution_context = await self._execution_context(context)
+        await self._reload_skills(execution_context)
+        return await self.list_skills(execution_context)

-    async def scan_directory_async(self, path: str) -> dict:
+    async def scan_directory_async(self, context: TenantContext, path: str) -> dict:
+        execution_context = await self._execution_context(context)
        box_service = self._require_box('Scanning a skill directory')
-        return await box_service.scan_skill_directory(path)
+        return await box_service.scan_skill_directory(execution_context, path)

-    async def _reload_skills(self) -> None:
+    async def _reload_skills(self, context: TenantContext) -> None:
        skill_mgr = getattr(self.ap, 'skill_mgr', None)
        reload_skills = getattr(skill_mgr, 'reload_skills', None)
        if not callable(reload_skills):
            return
-        result = reload_skills()
+        result = reload_skills(context)
        if inspect.isawaitable(result):
            await result

    async def _download_github_asset(self, asset_url: str) -> bytes:
-        async with httpx.AsyncClient(follow_redirects=True, timeout=120) as client:
-            resp = await client.get(asset_url)
-            resp.raise_for_status()
-            return resp.content
+        async with httpx.AsyncClient(
+            follow_redirects=True,
+            timeout=120,
+            event_hooks=httpclient.httpx_response_limit_hooks(_MAX_GITHUB_ARCHIVE_BYTES),
+        ) as client:
+            async with client.stream('GET', asset_url) as resp:
+                resp.raise_for_status()
+                content_length = resp.headers.get('content-length')
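+                # A malformed Content-Length is ignored; the streaming check below still applies.
+                try:
+                    declared_length = int(content_length) if content_length is not None else None
+                except ValueError:
+                    declared_length = None
+                if declared_length is not None and declared_length > _MAX_GITHUB_ARCHIVE_BYTES:
+                    raise ValueError('GitHub skill archive exceeds the compressed size limit')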
+                content = bytearray()
+                async for chunk in resp.aiter_bytes():
+                    content.extend(chunk)
+                    if len(content) > _MAX_GITHUB_ARCHIVE_BYTES:
+                        raise ValueError('GitHub skill archive exceeds the compressed size limit')
+                return bytes(content)

    async def _download_github_skill_directory_as_zip(
        self, asset_url: str, *, owner: str, repo: str
@@ -257,14 +358,25 @@ class SkillService:
        info = self._parse_github_skill_md_url(asset_url, owner=owner, repo=repo)
        archive_url = f'https://codeload.github.com/{owner}/{repo}/zip/{quote(info["ref"], safe="/")}'
        archive_bytes = await self._download_github_asset(archive_url)
+        return await asyncio.to_thread(self._build_github_skill_directory_zip, archive_bytes, info)

+    def _build_github_skill_directory_zip(
+        self,
+        archive_bytes: bytes,
+        info: dict[str, str],
+    ) -> tuple[bytes, str, str]:
+        """Validate and repack a GitHub skill archive outside the event loop."""
        try:
            source_archive = zipfile.ZipFile(io.BytesIO(archive_bytes), 'r')
        except zipfile.BadZipFile as exc:
            raise ValueError('GitHub repository archive must be a valid .zip archive') from exc

        with source_archive as source_zip:
+            if len(source_zip.infolist()) > _MAX_GITHUB_ARCHIVE_ENTRIES:
+                raise ValueError('GitHub repository archive contains too many entries')
            skill_entry = self._find_github_skill_archive_entry(source_zip, info['file_path'])
+            if skill_entry.file_size > _MAX_SKILL_FILE_BYTES:
+                raise ValueError('GitHub SKILL.md exceeds the file size limit')
            try:
                skill_md_content = source_zip.read(skill_entry).decode('utf-8')
            except UnicodeDecodeError as exc:
@@ -302,6 +414,7 @@ class SkillService:
        normalized_source_dir = posixpath.normpath(source_skill_dir)
        source_prefix = f'{normalized_source_dir}/'
        copied_files = 0
+        copied_bytes = 0

        for member in source_zip.infolist():
            normalized_member = posixpath.normpath(member.filename)
@@ -324,10 +437,33 @@ class SkillService:
            if member.is_dir():
                target_zip.writestr(target_info, b'')
                continue
-
-            target_zip.writestr(target_info, source_zip.read(member))
+            if member.flag_bits & 0x1:
+                raise ValueError('Encrypted GitHub skill archive entries are not supported')
+            unix_mode = member.external_attr >> 16
+            if stat.S_IFMT(unix_mode) == stat.S_IFLNK:
+                raise ValueError(f'GitHub archive contains a symbolic link: {member.filename}')
+            if member.file_size > _MAX_SKILL_FILE_BYTES:
+                raise ValueError(f'GitHub skill file exceeds the size limit: {member.filename}')
+            if member.file_size and member.file_size > max(member.compress_size, 1) * _MAX_SKILL_COMPRESSION_RATIO:
+                raise ValueError(f'GitHub skill file exceeds the compression-ratio limit: {member.filename}')
            copied_files += 1
+            copied_bytes += member.file_size
+            if copied_files > _MAX_SKILL_ARCHIVE_FILES:
+                raise ValueError('GitHub skill directory contains too many files')
+            if copied_bytes > _MAX_SKILL_UNCOMPRESSED_BYTES:
+                raise ValueError('GitHub skill directory exceeds the uncompressed size limit')

+            # Copy in bounded chunks instead of materialising a potentially
+            # large member in Core memory. The Box Runtime independently
+            # revalidates the resulting archive before installation.
+            with source_zip.open(member, 'r') as source_file, target_zip.open(target_info, 'w') as target_file:
+                remaining = member.file_size
+                while remaining:
+                    chunk = source_file.read(min(64 * 1024, remaining))
+                    if not chunk:
+                        raise ValueError(f'GitHub skill file is truncated: {member.filename}')
+                    target_file.write(chunk)
+                    remaining -= len(chunk)
        if copied_files == 0:
            raise ValueError('GitHub skill directory is empty')
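The new per-member checks reject four kinds of entry: encrypted entries (general-purpose flag bit 0), symbolic links (Unix mode stored in the high 16 bits of `external_attr`), oversized files, and implausible compression ratios. They also enforce running file-count and byte budgets. The standalone sketch below applies the same checks to any `zipfile.ZipFile`, reusing the limits from the diff. `check_members` is a hypothetical helper, and it relies only on declared header sizes, as the diff does before copying.

```python
import io
import stat
import zipfile

MAX_FILE_BYTES = 10 * 1024 * 1024
MAX_TOTAL_BYTES = 50 * 1024 * 1024
MAX_FILES = 1024
MAX_RATIO = 200


def check_members(archive: zipfile.ZipFile) -> int:
    """Return total declared uncompressed bytes, raising on unsafe members."""
    files = 0
    total = 0
    for member in archive.infolist():
        if member.is_dir():
            continue
        if member.flag_bits & 0x1:  # general-purpose bit 0: entry is encrypted
            raise ValueError(f'encrypted entry: {member.filename}')
        if stat.S_ISLNK(member.external_attr >> 16):  # Unix mode lives in the high 16 bits
            raise ValueError(f'symbolic link: {member.filename}')
        if member.file_size > MAX_FILE_BYTES:
            raise ValueError(f'file too large: {member.filename}')
        if member.file_size and member.file_size > max(member.compress_size, 1) * MAX_RATIO:
            raise ValueError(f'suspicious compression ratio: {member.filename}')
        files += 1
        total += member.file_size
        if files > MAX_FILES or total > MAX_TOTAL_BYTES:
            raise ValueError('archive exceeds file-count or size budget')
    return total
```

The ratio check catches a 1 MiB run of zero bytes, even though that file is well under the 10 MiB per-file cap: deflate shrinks it to roughly 1 KiB.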

@@ -1,5 +1,7 @@
 from __future__ import annotations

+from collections import OrderedDict
+
 from langbot.pkg.utils import httpclient
 import typing
 import datetime
@@ -11,6 +13,10 @@ from ....entity.persistence import user
 from ....entity.dto.space_model import SpaceModel


+_CREDITS_CACHE_TTL_SECONDS = 60
+_CREDITS_CACHE_MAX_ENTRIES = 4096
+
+
 class SpaceService:
    """Service for interacting with LangBot Space API"""

@@ -19,7 +25,24 @@ class SpaceService:

    def __init__(self, ap: app.Application) -> None:
        self.ap = ap
-        self._credits_cache = {}
+        self._credits_cache = OrderedDict()
+
+    def _ordered_credits_cache(
+        self,
+    ) -> OrderedDict[str, tuple[int, float]]:
+        if not isinstance(self._credits_cache, OrderedDict):
+            # Preserve compatibility with tests and callers that seed the cache.
+            self._credits_cache = OrderedDict(self._credits_cache)
+        return self._credits_cache
+
+    def _prune_credits_cache(self, now: float) -> None:
+        cache = self._ordered_credits_cache()
+        while cache:
+            email = next(iter(cache))
+            _, cached_at = cache[email]
+            if now - cached_at < _CREDITS_CACHE_TTL_SECONDS:
+                break
+            cache.pop(email, None)

    def _get_space_config(self) -> typing.Dict[str, str]:
        """Get Space configuration from config file"""
@@ -85,12 +108,14 @@ class SpaceService:

    def get_oauth_authorize_url(self, redirect_uri: str, state: str = '') -> str:
        """Get the Space OAuth authorization URL for redirect"""
+        from urllib.parse import urlencode
+
        space_config = self._get_space_config()
        authorize_url = space_config['oauth_authorize_url']
-        params = f'redirect_uri={redirect_uri}'
+        params = {'redirect_uri': redirect_uri}
        if state:
-            params += f'&state={state}'
-        return f'{authorize_url}?{params}'
+            params['state'] = state
+        return f'{authorize_url}?{urlencode(params)}'

    async def exchange_oauth_code(self, code: str) -> typing.Dict:
        """Exchange OAuth authorization code for tokens"""
@@ -105,8 +130,9 @@ class SpaceService:
            json={'code': code, 'instance_id': constants.instance_id},
        ) as response:
            if response.status != 200:
-                raise ValueError(f'Failed to exchange OAuth code: {await response.text()}')
-            data = await response.json()
+                error = await httpclient.read_text_limited(response)
+                raise ValueError(f'Failed to exchange OAuth code: {error}')
+            data = await httpclient.read_json_limited(response)
            if data.get('code') != 0:
                raise ValueError(f'Failed to exchange OAuth code: {data.get("msg")}')
            return data.get('data', {})
@@ -121,8 +147,9 @@ class SpaceService:
            f'{space_url}/api/v1/accounts/token/refresh', json={'refresh_token': refresh_token}
        ) as response:
            if response.status != 200:
-                raise ValueError(f'Failed to refresh token: {await response.text()}')
-            data = await response.json()
+                error = await httpclient.read_text_limited(response)
+                raise ValueError(f'Failed to refresh token: {error}')
+            data = await httpclient.read_json_limited(response)
            if data.get('code') != 0:
                raise ValueError(f'Failed to refresh token: {data.get("msg")}')
            return data.get('data', {})
@@ -137,8 +164,9 @@ class SpaceService:
            f'{space_url}/api/v1/accounts/me', headers={'Authorization': f'Bearer {access_token}'}
        ) as response:
            if response.status != 200:
-                raise ValueError(f'Failed to get user info: {await response.text()}')
-            data = await response.json()
+                error = await httpclient.read_text_limited(response)
+                raise ValueError(f'Failed to get user info: {error}')
+            data = await httpclient.read_json_limited(response)
            if data.get('code') != 0:
                raise ValueError(f'Failed to get user info: {data.get("msg")}')
            return data.get('data', {})
@@ -154,11 +182,13 @@ class SpaceService:

    async def get_credits(self, user_email: str, force_refresh: bool = False) -> int | None:
        """Get Space credits for user with caching (60s TTL)"""
-        cache_ttl = 60
+        now = time.time()
+        cached_fallback = self._credits_cache.get(user_email)
+        self._prune_credits_cache(now)

        if not force_refresh and user_email in self._credits_cache:
            credits, ts = self._credits_cache[user_email]
-            if time.time() - ts < cache_ttl:
+            if now - ts < _CREDITS_CACHE_TTL_SECONDS:
                return credits

        try:
@@ -167,10 +197,14 @@ class SpaceService:
                return None
            credits = info.get('credits')
            if credits is not None:
-                self._credits_cache[user_email] = (credits, time.time())
+                cache = self._ordered_credits_cache()
+                cache.pop(user_email, None)
+                if len(cache) >= _CREDITS_CACHE_MAX_ENTRIES:
+                    cache.popitem(last=False)
+                cache[user_email] = (credits, time.time())
            return credits
        except Exception:
-            return self._credits_cache.get(user_email, (None, 0))[0]
+            return cached_fallback[0] if cached_fallback is not None else None

    async def get_models(self) -> typing.List[SpaceModel]:
        """Get models from Space"""
@@ -181,8 +215,9 @@ class SpaceService:
        session = httpclient.get_session()
        async with session.get(f'{space_url}/api/v1/models', params={'page_size': 100}) as response:
            if response.status != 200:
-                raise ValueError(f'Failed to get models: {await response.text()}')
-            data = await response.json()
+                error = await httpclient.read_text_limited(response)
+                raise ValueError(f'Failed to get models: {error}')
+            data = await httpclient.read_json_limited(response)
            if data.get('code') != 0:
                raise ValueError(f'Failed to get models: {data.get("msg")}')
            models_data = data.get('data', {}).get('models', [])
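`SpaceService` now keeps credits in an insertion-ordered `OrderedDict`. Every write re-inserts its key at the back, so the front always holds the oldest timestamp. That lets pruning stop at the first fresh entry, and `popitem(last=False)` caps memory. The sketch below is standalone and not the service's code. `TTLCache` is a hypothetical class, and it takes an injectable clock for testing. The ordering argument assumes the clock never goes backwards.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLCache:
    """Insertion-ordered cache with time-based expiry and a size cap."""

    def __init__(self, ttl: float, max_entries: int, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, cached_at), oldest first

    def _prune(self, now: float) -> None:
        while self._data:
            key = next(iter(self._data))
            if now - self._data[key][1] < self.ttl:
                break  # every later entry was written at least as recently
            self._data.pop(key)

    def get(self, key: str) -> Optional[int]:
        self._prune(self.clock())
        entry = self._data.get(key)
        return entry[0] if entry is not None else None

    def put(self, key: str, value: int) -> None:
        self._data.pop(key, None)  # re-insert at the back so order tracks write time
        if len(self._data) >= self.max_entries:
            self._data.popitem(last=False)  # evict the oldest write
        self._data[key] = (value, self.clock())
```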
@@ -0,0 +1,34 @@
+from __future__ import annotations
+
+import typing
+
+from ..authz import WorkspaceRequiredError
+from ..context import ExecutionContext, RequestContext, WorkspaceContext
+
+TenantContext: typing.TypeAlias = RequestContext | ExecutionContext | WorkspaceContext | str
+
+
+def require_workspace_uuid(context: TenantContext | None) -> str:
+    """Resolve an explicit Workspace UUID without allowing a global fallback."""
+
+    if isinstance(context, str):
+        workspace_uuid = context
+    elif isinstance(context, RequestContext):
+        workspace_uuid = context.workspace_uuid
+    elif isinstance(context, ExecutionContext):
+        workspace_uuid = context.workspace_uuid
+    elif isinstance(context, WorkspaceContext):
+        workspace_uuid = context.workspace_uuid
+    else:
+        raise WorkspaceRequiredError('Workspace context is required')
+
+    normalized = workspace_uuid.strip() if isinstance(workspace_uuid, str) else ''
+    if not normalized:
+        raise WorkspaceRequiredError('Workspace context is required')
+    return normalized
+
+
+def scope_statement(statement: typing.Any, model: typing.Any, context: TenantContext) -> typing.Any:
+    """Add the mandatory Workspace predicate to a SQLAlchemy statement."""
+
+    return statement.where(model.workspace_uuid == require_workspace_uuid(context))
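`require_workspace_uuid` resolves a workspace from several context shapes and never falls back to a global scope. `scope_statement` then adds the workspace predicate to every query. The stdlib-only sketch below illustrates the same contract. `RequestContext` and `WorkspaceRequiredError` are minimal hypothetical stand-ins for the project's types. `scope_rows` filters a list of dicts in place of a SQLAlchemy `where` clause. A missing or non-string UUID is treated like an empty one.

```python
from dataclasses import dataclass
from typing import Optional, Union


class WorkspaceRequiredError(ValueError):
    """Stand-in for the project's authz error."""


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical minimal context carrying an optional workspace UUID.
    workspace_uuid: Optional[str]


def require_workspace_uuid(context: Union[RequestContext, str, None]) -> str:
    if isinstance(context, str):
        workspace_uuid = context
    elif isinstance(context, RequestContext):
        workspace_uuid = context.workspace_uuid
    else:
        raise WorkspaceRequiredError('Workspace context is required')
    normalized = workspace_uuid.strip() if isinstance(workspace_uuid, str) else ''
    if not normalized:
        raise WorkspaceRequiredError('Workspace context is required')
    return normalized


def scope_rows(rows: list[dict], context: Union[RequestContext, str, None]) -> list[dict]:
    # List analogue of scope_statement: always apply the workspace predicate.
    workspace_uuid = require_workspace_uuid(context)
    return [row for row in rows if row['workspace_uuid'] == workspace_uuid]
```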
@@ -6,71 +6,391 @@ import jwt
 import datetime
 import typing
 import asyncio
+import dataclasses
+import heapq
+import hashlib
+import secrets
+import time
+import uuid
+
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

-from ....core import app
 from ....entity.persistence import user
+from ....entity.persistence.workspace import MembershipRole, MembershipStatus, WorkspaceMembership
 from ....utils import constants
 from ....entity.errors import account as account_errors
+from ....workspace.collaboration import normalize_email
+from ....utils import bounded_executor
+
+if typing.TYPE_CHECKING:
+    from ....core.app import Application
+
+
+_SPACE_OAUTH_STATE_MAX_ENTRIES = 4096
+_SPACE_OAUTH_STATE_HEAP_COMPACT_FLOOR = 64
+_SPACE_OAUTH_STATE_HEAP_MAX_MULTIPLIER = 4
+
+
+class AccountExistsLoginRequiredError(ValueError):
+    code = 'account_exists_login_required'
+
+
+class PublicRegistrationClosedError(ValueError):
+    code = 'registration_closed'
+
+
+class ControlPlaneDirectoryRequiredError(PublicRegistrationClosedError):
+    code = 'control_plane_required'
+
+
+class AccountDisabledError(ValueError):
+    code = 'account_disabled'
+
+
+@dataclasses.dataclass(frozen=True, slots=True)
+class SpaceOAuthStateConsumption:
+    purpose: typing.Literal['login', 'bind']
+    account: user.User | None
+    launch_workspace_uuid: str | None = None


 class UserService:
-    ap: app.Application
+    ap: Application
    _create_user_lock: asyncio.Lock

-    def __init__(self, ap: app.Application) -> None:
+    def __init__(self, ap: Application) -> None:
        self.ap = ap
        self._create_user_lock = asyncio.Lock()
-        self._password_hash_lock = asyncio.Semaphore(1)
+        self._password_hash_lock = asyncio.Lock()
+        self._space_oauth_state_lock = asyncio.Lock()
+        self._space_oauth_states: dict[str, tuple[str, str | None, float, str | None]] = {}
+        self._space_oauth_state_expiry_heap: list[tuple[float, str]] = []
+
+    @staticmethod
+    def _space_oauth_state_digest(state: str) -> str:
+        return hashlib.sha256(state.encode('utf-8')).hexdigest()
+
+    def _prune_space_oauth_states(self, now: float) -> None:
+        while self._space_oauth_state_expiry_heap:
+            expires_at, digest = self._space_oauth_state_expiry_heap[0]
+            entry = self._space_oauth_states.get(digest)
+            if entry is None or entry[2] != expires_at:
+                heapq.heappop(self._space_oauth_state_expiry_heap)
+                continue
+            if expires_at > now:
+                break
+            heapq.heappop(self._space_oauth_state_expiry_heap)
+            self._space_oauth_states.pop(digest, None)
+
+        max_heap_entries = max(
+            _SPACE_OAUTH_STATE_HEAP_COMPACT_FLOOR,
+            len(self._space_oauth_states) * _SPACE_OAUTH_STATE_HEAP_MAX_MULTIPLIER,
+        )
+        if len(self._space_oauth_state_expiry_heap) > max_heap_entries:
+            self._space_oauth_state_expiry_heap[:] = [
+                (entry[2], digest) for digest, entry in self._space_oauth_states.items()
+            ]
+            heapq.heapify(self._space_oauth_state_expiry_heap)
+
+    def _evict_earliest_space_oauth_state(self) -> None:
+        while self._space_oauth_state_expiry_heap:
+            expires_at, digest = heapq.heappop(self._space_oauth_state_expiry_heap)
+            entry = self._space_oauth_states.get(digest)
+            if entry is not None and entry[2] == expires_at:
+                self._space_oauth_states.pop(digest, None)
+                return
+
+    async def issue_space_oauth_state(
+        self,
+        purpose: typing.Literal['login', 'bind'],
+        *,
+        account_uuid: str | None = None,
+        launch_workspace_uuid: str | None = None,
+        ttl_seconds: int = 600,
+    ) -> str:
+        """Issue an opaque, single-use OAuth state without exposing a JWT."""
+        if purpose == 'bind' and not account_uuid:
+            raise ValueError('An Account is required for Space binding')
+        if purpose == 'login' and account_uuid is not None:
+            raise ValueError('Login state cannot be bound to an Account')
+        if purpose != 'login' and launch_workspace_uuid is not None:
+            raise ValueError('Launch Workspace state is only valid for Space login')
+        if ttl_seconds <= 0:
+            raise ValueError('OAuth state lifetime must be positive')
+
+        raw_state = secrets.token_urlsafe(32)
+        digest = self._space_oauth_state_digest(raw_state)
+        expires_at = time.monotonic() + min(ttl_seconds, 600)
+        async with self._space_oauth_state_lock:
+            now = time.monotonic()
+            self._prune_space_oauth_states(now)
+            if len(self._space_oauth_states) >= _SPACE_OAUTH_STATE_MAX_ENTRIES:
+                self._evict_earliest_space_oauth_state()
+            self._space_oauth_states[digest] = (purpose, account_uuid, expires_at, launch_workspace_uuid)
+            heapq.heappush(
+                self._space_oauth_state_expiry_heap,
+                (expires_at, digest),
+            )
+        return raw_state
+
+    async def consume_space_oauth_state_details(
+        self,
+        raw_state: str,
+        purpose: typing.Literal['login', 'bind'],
+    ) -> SpaceOAuthStateConsumption:
+        """Atomically consume OAuth state and return any bound launch intent."""
+        if not isinstance(raw_state, str) or not raw_state:
+            raise ValueError('Invalid or expired OAuth state')
+        digest = self._space_oauth_state_digest(raw_state)
+        async with self._space_oauth_state_lock:
+            entry = self._space_oauth_states.pop(digest, None)
+        if entry is None or entry[0] != purpose or entry[2] <= time.monotonic():
+            raise ValueError('Invalid or expired OAuth state')
+        if purpose == 'login':
+            return SpaceOAuthStateConsumption(
+                purpose='login',
+                account=None,
+                launch_workspace_uuid=entry[3],
+            )
+
+        account_uuid = entry[1]
+        account = await self.get_user_by_uuid(account_uuid or '')
+        if account is None:
+            raise ValueError('Invalid or expired OAuth state')
+        self._require_active_account(account)
+        return SpaceOAuthStateConsumption(purpose='bind', account=account)
+
+    async def consume_space_oauth_state(
+        self,
+        raw_state: str,
+        purpose: typing.Literal['login', 'bind'],
+    ) -> user.User | None:
+        """Atomically consume OAuth state and resolve its active bind Account."""
+        consumed = await self.consume_space_oauth_state_details(raw_state, purpose)
+        return consumed.account

    async def _hash_password(self, password: str) -> str:
+        if self._password_hash_lock.locked():
+            raise bounded_executor.BlockingWorkCapacityError(
+                'Password hashing capacity reached',
+                scope='system:authentication',
+            )
        async with self._password_hash_lock:
-            return await asyncio.to_thread(argon2.PasswordHasher().hash, password)
+            with bounded_executor.blocking_work_scope('system:authentication'):
+                return await asyncio.to_thread(argon2.PasswordHasher().hash, password)
+
+    def _require_local_directory(self) -> None:
+        if self._uses_control_plane_directory():
+            raise ControlPlaneDirectoryRequiredError(
+                'Cloud Accounts and directory changes are managed by the SaaS control plane'
+            )
+
+    def _uses_control_plane_directory(self) -> bool:
+        workspace_service = getattr(self.ap, 'workspace_service', None)
+        return bool(workspace_service is not None and workspace_service.policy.multi_workspace_enabled)

    async def _verify_password(self, hashed_password: str, password: str) -> None:
+        if self._password_hash_lock.locked():
+            raise bounded_executor.BlockingWorkCapacityError(
+                'Password hashing capacity reached',
+                scope='system:authentication',
+            )
        async with self._password_hash_lock:
-            await asyncio.to_thread(argon2.PasswordHasher().verify, hashed_password, password)
+            with bounded_executor.blocking_work_scope('system:authentication'):
+                await asyncio.to_thread(argon2.PasswordHasher().verify, hashed_password, password)
+
+    async def _update_space_provider_for_account(self, account: typing.Any, api_key: str) -> None:
+        """Refresh the OSS Workspace Space provider without guessing a SaaS Workspace.
+
+        Space OAuth credentials belong to an Account, while model-provider secrets
+        belong to a Workspace. The community edition has exactly one Workspace, so
+        the historical automatic refresh remains available, but only to its owner.
+        In multi-Workspace SaaS mode the OAuth callback has no trusted Workspace
+        selector, so the closed control plane or an explicit Workspace settings
+        action must perform that linkage instead.
+        """
+
+        workspace_service = getattr(self.ap, 'workspace_service', None)
+        collaboration_service = getattr(self.ap, 'workspace_collaboration_service', None)
+        account_uuid = getattr(account, 'uuid', None)
+        if workspace_service is None or collaboration_service is None or not isinstance(account_uuid, str):
+            # Never turn a missing tenant kernel into a global secret mutation.
+            return
+        if workspace_service.policy.multi_workspace_enabled:
+            return
+
+        accesses = await collaboration_service.list_account_workspaces(account_uuid)
+        if len(accesses) != 1:
+            return
+        access = accesses[0]
+        if access.membership.role != MembershipRole.OWNER.value:
+            return
+        await self.ap.provider_service.update_space_model_provider_api_keys(
+            access.workspace.uuid,
+            api_key,
+        )

    async def is_initialized(self) -> bool:
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(user.User).limit(1))
+        account = await self._identity_scalar(
+            sqlalchemy.select(user.User).limit(1),
+            f'instance:{self._jwt_identity()[1]}',
+        )
+        return account is not None

-        result_list = result.all()
-        return result_list is not None and len(result_list) > 0
+    async def get_login_capabilities(self) -> dict[str, bool]:
+        """Derive enabled public login methods in an explicit discovery scope."""
+        password_count = sqlalchemy.func.count().filter(user.User.password.is_not(None), user.User.password != '')
+        space_count = sqlalchemy.func.count().filter(user.User.space_account_uuid.is_not(None))
+        statement = sqlalchemy.select(password_count, space_count).where(
+            user.User.status == user.AccountStatus.ACTIVE.value
+        )
+        digest = hashlib.sha256(f'login-capabilities:{self._jwt_identity()[1]}'.encode('utf-8')).hexdigest()
+        current_session = getattr(self.ap.persistence_mgr, 'current_session', lambda: None)
+        identity_uow = getattr(self.ap.persistence_mgr, 'identity_discovery_uow', None)
+        if current_session() is None and callable(identity_uow):
+            async with identity_uow(digest) as discovery:
+                result = await discovery.session.execute(statement)
+        else:
+            result = await self.ap.persistence_mgr.execute_async(statement)
+        password_accounts, space_accounts = result.one()
+        return {
+            'password_login_enabled': bool(password_accounts),
+            'space_login_enabled': bool(space_accounts),
+        }
+
+    async def get_workspace_owner(self, workspace_uuid: str) -> user.User | None:
+        """Resolve the active owner Account for a Workspace."""
+        statement = (
+            sqlalchemy.select(user.User)
+            .join(WorkspaceMembership, WorkspaceMembership.account_uuid == user.User.uuid)
+            .where(
+                WorkspaceMembership.workspace_uuid == workspace_uuid,
+                WorkspaceMembership.role == MembershipRole.OWNER.value,
+                WorkspaceMembership.status == MembershipStatus.ACTIVE.value,
+                user.User.status == user.AccountStatus.ACTIVE.value,
+            )
+        )
+        current_session = self.ap.persistence_mgr.current_session()
+        if current_session is not None:
+            return await current_session.scalar(statement)
+        return await self._identity_scalar(statement, f'workspace-owner:{workspace_uuid}')
+
+    def _session_factory(self) -> async_sessionmaker[AsyncSession]:
+        return async_sessionmaker(self.ap.persistence_mgr.get_db_engine(), expire_on_commit=False)
+
+    def _jwt_identity(self) -> tuple[str, str]:
+        workspace_service = getattr(self.ap, 'workspace_service', None)
+        instance_uuid = str(getattr(workspace_service, 'instance_uuid', '') or constants.instance_id).strip()
+        # UserService is constructed only after config/bootstrap in production.
+        # The fallback keeps lightweight isolated unit tests deterministic.
+        if not instance_uuid:
+            instance_uuid = 'uninitialized-test-instance'
+        return 'langbot-core', f'langbot-instance:{instance_uuid}'
+
+    def _legacy_local_tokens_allowed(self) -> bool:
+        workspace_service = getattr(self.ap, 'workspace_service', None)
+        policy = getattr(workspace_service, 'policy', None)
+        return getattr(policy, 'multi_workspace_enabled', False) is not True

    async def create_user(self, user_email: str, password: str) -> None:
+        """Create the first local Account and Workspace owner atomically."""
+
+        await self.create_initial_account(user_email, password)
+
+    async def create_initial_account(self, user_email: str, password: str) -> user.User:
+        self._require_local_directory()
+        normalized_email = normalize_email(user_email)
        hashed_password = await self._hash_password(password)

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.insert(user.User).values(user=user_email, password=hashed_password, account_type='local')
+        async with self._create_user_lock:
+            async with self._session_factory()() as session:
+                async with session.begin():
+                    existing_count = int(
+                        (await session.scalar(sqlalchemy.select(sqlalchemy.func.count()).select_from(user.User))) or 0
+                    )
+                    if existing_count:
+                        raise PublicRegistrationClosedError('System already initialized')
+                    account = self._new_account(normalized_email, hashed_password)
+                    session.add(account)
+                    await session.flush()
+                    await self.ap.workspace_service.bootstrap_local_account(account.uuid, session=session)
+                    return account
+
+    async def register_invited_account(
+        self,
+        invitation_token: str,
+        user_email: str,
+        password: str,
+    ) -> tuple[user.User, typing.Any]:
+        """Create an invited Account and accept its Membership in one transaction."""
+
+        normalized_email = normalize_email(user_email)
+        if self._uses_control_plane_directory():
+            raise ControlPlaneDirectoryRequiredError(
+                'Cloud invitation registration must use a Space account to preserve control-plane identity'
+            )
+        invitation, _ = await self.ap.workspace_collaboration_service.inspect_invitation(invitation_token)
+        if invitation.normalized_email != normalized_email:
+            from ....workspace.collaboration import InvitationEmailMismatchError
+
+            raise InvitationEmailMismatchError('Invitation email does not match the Account')
+        hashed_password = await self._hash_password(password)
+
+        async with self._create_user_lock:
+            async with self._session_factory()() as session:
+                async with session.begin():
+                    existing = await session.scalar(
+                        sqlalchemy.select(user.User).where(user.User.normalized_email == normalized_email)
+                    )
+                    if existing is not None:
+                        raise AccountExistsLoginRequiredError('An Account already exists for this email')
+                    account = self._new_account(normalized_email, hashed_password)
+                    session.add(account)
+                    await session.flush()
+                    membership = await self.ap.workspace_collaboration_service.accept_invitation(
+                        invitation_token,
+                        account.uuid,
+                        session=session,
+                    )
+                return account, membership
+
+    def _new_account(self, normalized_email: str, hashed_password: str) -> user.User:
+        return user.User(
+            uuid=str(uuid.uuid4()),
+            user=normalized_email,
+            normalized_email=normalized_email,
+            password=hashed_password,
+            account_type='local',
+            status=user.AccountStatus.ACTIVE.value,
+            source=user.AccountSource.LOCAL.value,
+            projection_revision=0,
        )

    async def get_user_by_email(self, user_email: str) -> user.User | None:
-        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(user.User).where(user.User.user == user_email)
+        normalized_email = user_email.strip().casefold()
+        return await self._identity_scalar(
+            sqlalchemy.select(user.User).where(user.User.normalized_email == normalized_email),
+            f'email:{normalized_email}',
        )

-        result_list = result.all()
-        return result_list[0] if result_list is not None and len(result_list) > 0 else None
+    async def get_user_by_uuid(self, account_uuid: str) -> user.User | None:
+        return await self._identity_scalar(
+            sqlalchemy.select(user.User).where(user.User.uuid == account_uuid),
+            f'uuid:{account_uuid}',
+        )

    async def get_user_by_space_account_uuid(self, space_account_uuid: str) -> user.User | None:
        """Get user by Space account UUID"""
-        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(user.User).where(user.User.space_account_uuid == space_account_uuid)
+        return await self._identity_scalar(
+            sqlalchemy.select(user.User).where(user.User.space_account_uuid == space_account_uuid),
+            f'space:{space_account_uuid}',
        )

-        result_list = result.all()
-        return result_list[0] if result_list is not None and len(result_list) > 0 else None
-
    async def authenticate(self, user_email: str, password: str) -> str | None:
-        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(user.User).where(user.User.user == user_email)
-        )
-
-        result_list = result.all()
-
-        if result_list is None or len(result_list) == 0:
+        user_obj = await self.get_user_by_email(user_email)
+        if user_obj is None:
            raise ValueError('用户不存在')
-
-        user_obj = result_list[0]
+        self._require_active_account(user_obj)

        # Check if this user has a local password set
        if not user_obj.password:
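The OAuth state handling in `issue_space_oauth_state`, `_prune_space_oauth_states`, and `_evict_earliest_space_oauth_state` combines four ideas. Only a SHA-256 digest of the raw state is stored. Entries expire after a capped TTL. Expiry is tracked lazily in a min-heap whose stale rows are skipped. The table is bounded by evicting the earliest-expiring live entry. The synchronous `StateStore` below is a simplified sketch of that pattern, assuming an injectable clock for testing. It deliberately omits the asyncio lock, the login/bind Account validation, and the heap-compaction step, so it is not a drop-in replacement.

```python
import hashlib
import heapq
import secrets
import time
from typing import Callable


class StateStore:
    """Single-use, digest-keyed OAuth state with lazy heap-based expiry."""

    def __init__(self, max_entries: int = 4, clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # digest -> (purpose, expires_at)
        self._heap: list[tuple[float, str]] = []  # (expires_at, digest); may hold stale rows

    @staticmethod
    def _digest(raw_state: str) -> str:
        # Only the hash is kept, so the table never holds a usable raw state.
        return hashlib.sha256(raw_state.encode('utf-8')).hexdigest()

    def _prune(self, now: float) -> None:
        while self._heap:
            expires_at, digest = self._heap[0]
            entry = self._entries.get(digest)
            if entry is None or entry[1] != expires_at:
                heapq.heappop(self._heap)  # stale row: already consumed or evicted
                continue
            if expires_at > now:
                break  # the earliest live entry is still valid
            heapq.heappop(self._heap)
            del self._entries[digest]

    def _evict_earliest(self) -> None:
        while self._heap:
            expires_at, digest = heapq.heappop(self._heap)
            entry = self._entries.get(digest)
            if entry is not None and entry[1] == expires_at:
                del self._entries[digest]
                return

    def issue(self, purpose: str, ttl_seconds: float = 600) -> str:
        if ttl_seconds <= 0:
            raise ValueError('OAuth state lifetime must be positive')
        raw_state = secrets.token_urlsafe(32)
        now = self._clock()
        expires_at = now + min(ttl_seconds, 600)  # TTL is capped at 10 minutes
        self._prune(now)
        if len(self._entries) >= self._max_entries:
            self._evict_earliest()
        digest = self._digest(raw_state)
        self._entries[digest] = (purpose, expires_at)
        heapq.heappush(self._heap, (expires_at, digest))
        return raw_state

    def consume(self, raw_state: str, purpose: str) -> None:
        # pop() first: the state is burned even when the purpose check fails.
        entry = self._entries.pop(self._digest(raw_state), None)
        if entry is None or entry[0] != purpose or entry[1] <= self._clock():
            raise ValueError('Invalid or expired OAuth state')


clock_now = [0.0]
store = StateStore(max_entries=2, clock=lambda: clock_now[0])
login_state = store.issue('login', ttl_seconds=60)
store.consume(login_state, 'login')  # succeeds once; a replay raises ValueError
```

Comparing `entry[1] != expires_at` is what lets the heap tolerate stale rows without an O(n) removal on every consume. The compaction bound in the real service then keeps those stale rows from growing without limit.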
@@ -78,30 +398,121 @@ class UserService:

        await self._verify_password(user_obj.password, password)

-        return await self.generate_jwt_token(user_email)
+        return await self.generate_jwt_token(user_obj)

-    async def generate_jwt_token(self, user_email: str) -> str:
+    async def generate_jwt_token(self, account: user.User | str) -> str:
        jwt_secret = self.ap.instance_config.data['system']['jwt']['secret']
        jwt_expire = self.ap.instance_config.data['system']['jwt']['expire']

+        account_obj: user.User | None = account if not isinstance(account, str) and hasattr(account, 'user') else None
+        user_email = account_obj.user if account_obj is not None else account
+        if account_obj is None and hasattr(self.ap, 'persistence_mgr'):
+            try:
+                account_obj = await self.get_user_by_email(user_email)
+            except (AttributeError, TypeError):
+                # Lightweight unit-test and bootstrap callers may not have persistence wired.
+                account_obj = None
+
        payload = {
            'user': user_email,
-            'iss': 'LangBot-' + constants.edition,
+            'iss': self._jwt_identity()[0],
+            'aud': self._jwt_identity()[1],
            'exp': datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(seconds=jwt_expire),
        }
+        if account_obj is not None:
+            self._require_active_account(account_obj)
+            payload.update(
+                {
+                    'sub': account_obj.uuid,
+                    'account_revision': account_obj.projection_revision,
+                }
+            )

        return jwt.encode(payload, jwt_secret, algorithm='HS256')

    async def verify_jwt_token(self, token: str) -> str:
-        jwt_secret = self.ap.instance_config.data['system']['jwt']['secret']
+        account = await self.get_authenticated_account(token, allow_unresolved_legacy=True)
+        if isinstance(account, str):
+            return account
+        return account.user

-        return jwt.decode(token, jwt_secret, algorithms=['HS256'])['user']
+    async def get_authenticated_account(
+        self,
+        token: str,
+        *,
+        allow_unresolved_legacy: bool = False,
+    ) -> user.User | str:
+        """Resolve a JWT to an active Account, accepting bounded legacy email tokens."""
+
+        jwt_secret = self.ap.instance_config.data['system']['jwt']['secret']
+        issuer, audience = self._jwt_identity()
+        try:
+            payload = jwt.decode(
+                token,
+                jwt_secret,
+                algorithms=['HS256'],
+                issuer=issuer,
+                audience=audience,
+                options={'require': ['exp', 'iss', 'aud']},
+            )
+        except jwt.MissingRequiredClaimError:
+            # Preserve one bounded OSS upgrade path for previously issued
+            # community tokens. SaaS/Cloud policy never accepts these tokens,
+            # and a token carrying a new-style or foreign audience cannot fall
+            # back into the legacy decoder.
+            unverified = jwt.decode(token, options={'verify_signature': False})
+            if (
+                not self._legacy_local_tokens_allowed()
+                or 'aud' in unverified
+                or unverified.get('iss') != 'LangBot-community'
+            ):
+                raise
+            payload = jwt.decode(
+                token,
+                jwt_secret,
+                algorithms=['HS256'],
+                options={'require': ['exp'], 'verify_aud': False, 'verify_iss': False},
+            )
+        account_obj: user.User | None = None
+        account_uuid = payload.get('sub')
+        if isinstance(account_uuid, str) and account_uuid:
+            try:
+                account_obj = await self.get_user_by_uuid(account_uuid)
+            except AttributeError:
+                account_obj = None
+        if account_obj is None:
+            legacy_email = payload.get('user')
+            if not isinstance(legacy_email, str) or not legacy_email:
+                raise ValueError('JWT Account identity is missing')
+            try:
+                account_obj = await self.get_user_by_email(legacy_email)
+            except AttributeError:
+                account_obj = None
+            if account_obj is None and allow_unresolved_legacy:
+                return legacy_email
+        if account_obj is None:
+            raise ValueError('Account not found')
+        self._require_active_account(account_obj)
+        token_revision = payload.get('account_revision')
+        if token_revision is not None and int(token_revision) != account_obj.projection_revision:
+            raise ValueError('Account token revision is stale')
+        return account_obj
+
+    @staticmethod
+    def _require_active_account(account: user.User) -> None:
+        status = getattr(account, 'status', user.AccountStatus.ACTIVE.value)
+        if isinstance(status, str) and status != user.AccountStatus.ACTIVE.value:
+            raise AccountDisabledError('Account is disabled')

    async def reset_password(self, user_email: str, new_password: str) -> None:
        hashed_password = await self._hash_password(new_password)
+        normalized_email = normalize_email(user_email)

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(user.User).where(user.User.user == user_email).values(password=hashed_password)
+        await self._identity_execute(
+            sqlalchemy.update(user.User)
+            .where(user.User.normalized_email == normalized_email)
+            .values(password=hashed_password),
+            f'email:{normalized_email}',
        )

    async def change_password(self, user_email: str, current_password: str, new_password: str) -> None:
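The rewritten `get_authenticated_account` binds tokens to this instance (`iss='langbot-core'`, `aud='langbot-instance:<uuid>'`). It admits audience-less tokens only when they carry the old `LangBot-community` issuer and multi-Workspace mode is off. As a rough, stdlib-only illustration of those rules, the hypothetical `sign_hs256`/`verify_hs256` pair below signs and checks an HS256 token by hand. This is not PyJWT and should not be deployed in its place. The secret and instance IDs are made up.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + '=' * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = _b64url(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode())
    body = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f'{header}.{body}'.encode(), hashlib.sha256).digest()
    return f'{header}.{body}.{_b64url(sig)}'


def verify_hs256(token: str, secret: bytes, issuer: str, audience: str, *, legacy_allowed: bool) -> dict:
    header, body, sig = token.split('.')
    expected = hmac.new(secret, f'{header}.{body}'.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig)):
        raise ValueError('bad signature')
    claims = json.loads(_b64url_decode(body))
    if claims.get('exp', 0) <= time.time():
        raise ValueError('expired or missing exp')
    if 'aud' not in claims or 'iss' not in claims:
        # Legacy path: single-Workspace installs only, old community issuer only,
        # and never for a token that already carries an audience.
        if legacy_allowed and 'aud' not in claims and claims.get('iss') == 'LangBot-community':
            return claims
        raise ValueError('missing required claim')
    if claims['iss'] != issuer or claims['aud'] != audience:
        raise ValueError('wrong issuer or audience')
    return claims


token = sign_hs256(
    {'sub': 'acct-1', 'iss': 'langbot-core', 'aud': 'langbot-instance:demo', 'exp': time.time() + 60},
    b'demo-secret',
)
claims = verify_hs256(token, b'demo-secret', 'langbot-core', 'langbot-instance:demo', legacy_allowed=False)
```

Because the audience embeds the instance UUID, a token minted by one deployment fails on another even when both deployments happen to share a JWT secret.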
@@ -115,9 +526,13 @@ class UserService:
        await self._verify_password(user_obj.password, current_password)

        hashed_password = await self._hash_password(new_password)
+        normalized_email = normalize_email(user_email)

-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(user.User).where(user.User.user == user_email).values(password=hashed_password)
+        await self._identity_execute(
+            sqlalchemy.update(user.User)
+            .where(user.User.normalized_email == normalized_email)
+            .values(password=hashed_password),
+            f'email:{normalized_email}',
        )

    # Space user management
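`change_password` and `authenticate` both go through `_verify_password` and `_hash_password`. These now fail fast with `BlockingWorkCapacityError` instead of queueing when the single Argon2 slot is busy. The sketch below reproduces that pattern with plain `asyncio`. `CapacityError` and `HashSlot` are hypothetical stand-ins for the `bounded_executor` types, and `time.sleep` simulates the CPU-bound hash running in a worker thread. Within one event loop, the `locked()` check followed by `async with` does not race, because an uncontended `Lock.acquire()` completes without yielding to other tasks.

```python
import asyncio
import time


class CapacityError(RuntimeError):
    """Hypothetical stand-in for bounded_executor.BlockingWorkCapacityError."""


class HashSlot:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def run(self, fn, *args):
        # Reject immediately instead of letting login requests pile up behind
        # an expensive hash; nothing awaits between the check and the acquire.
        if self._lock.locked():
            raise CapacityError('Password hashing capacity reached')
        async with self._lock:
            return await asyncio.to_thread(fn, *args)


def slow_hash(password: str) -> str:
    time.sleep(0.2)  # simulates Argon2 work in a worker thread
    return f'hashed:{password}'


async def main() -> list:
    slot = HashSlot()
    return await asyncio.gather(
        slot.run(slow_hash, 'first'),
        slot.run(slow_hash, 'second'),
        return_exceptions=True,
    )


results = asyncio.run(main())
print(results)  # ['hashed:first', CapacityError('Password hashing capacity reached')]
```

Switching from `Semaphore(1)` to `Lock` does not change the concurrency limit. It makes `locked()` an exact "slot busy" signal for this single-slot case.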
@@ -132,6 +547,16 @@ class UserService:
        expires_in: int = 0,
    ) -> user.User:
        """Create or update a Space user account (only if system not initialized or user exists)"""
+        if self._uses_control_plane_directory():
+            return await self._update_projected_space_user(
+                space_account_uuid=space_account_uuid,
+                email=email,
+                access_token=access_token,
+                refresh_token=refresh_token,
+                api_key=api_key,
+                expires_in=expires_in,
+            )
+        self._require_local_directory()
        expires_at = datetime.datetime.now() + datetime.timedelta(seconds=expires_in) if expires_in > 0 else None

        async with self._create_user_lock:
@@ -140,7 +565,7 @@ class UserService:

            if existing_user:
                # Update existing user's tokens
-                await self.ap.persistence_mgr.execute_async(
+                await self._identity_execute(
                    sqlalchemy.update(user.User)
                    .where(user.User.space_account_uuid == space_account_uuid)
                    .values(
@@ -148,19 +573,56 @@ class UserService:
                        space_refresh_token=refresh_token,
                        space_api_key=api_key,
                        space_access_token_expires_at=expires_at,
-                    )
+                    ),
+                    f'space:{space_account_uuid}',
                )
-                await self.ap.provider_service.update_space_model_provider_api_keys(api_key)
+                await self._update_space_provider_for_account(existing_user, api_key)
                return await self.get_user_by_space_account_uuid(space_account_uuid)

            # Check if user with same email exists
            existing_email_user = await self.get_user_by_email(email)
            if existing_email_user:
-                # Update existing user to link with Space account
+                # Email is display/contact identity, not an OAuth subject. An
+                # unknown Space subject must never take over an existing local
+                # Account merely by presenting the same email. The Account
+                # owner must first authenticate locally and use the explicit,
+                # account-bound bind flow.
+                raise account_errors.SpaceAccountBindingRequiredError()
+
+            # Check if system is already initialized
+            is_initialized = await self.is_initialized()
+            if is_initialized:
+                raise account_errors.SpaceAccountNotRegisteredError()
+
+            # Create new Space user (first time initialization)
+            if hasattr(self.ap.persistence_mgr, 'get_db_engine') and hasattr(self.ap, 'workspace_service'):
+                async with self._session_factory()() as session:
+                    async with session.begin():
+                        account = user.User(
+                            uuid=str(uuid.uuid4()),
+                            user=normalize_email(email),
+                            normalized_email=normalize_email(email),
+                            password='',
+                            account_type='space',
+                            status=user.AccountStatus.ACTIVE.value,
+                            source=user.AccountSource.LOCAL.value,
+                            projection_revision=0,
+                            space_account_uuid=space_account_uuid,
+                            space_access_token=access_token,
+                            space_refresh_token=refresh_token,
+                            space_api_key=api_key,
+                            space_access_token_expires_at=expires_at,
+                        )
+                        session.add(account)
+                        await session.flush()
+                        await self.ap.workspace_service.bootstrap_local_account(account.uuid, session=session)
+            else:
+                # Compatibility path for lightweight service tests without a real engine.
                await self.ap.persistence_mgr.execute_async(
-                    sqlalchemy.update(user.User)
-                    .where(user.User.user == email)
-                    .values(
+                    sqlalchemy.insert(user.User).values(
+                        user=normalize_email(email),
+                        normalized_email=normalize_email(email),
+                        password='',
                        account_type='space',
                        space_account_uuid=space_account_uuid,
                        space_access_token=access_token,
@@ -169,30 +631,56 @@ class UserService:
                        space_access_token_expires_at=expires_at,
                    )
                )
-                await self.ap.provider_service.update_space_model_provider_api_keys(api_key)
-                return await self.get_user_by_email(email)
+            created_user = await self.get_user_by_space_account_uuid(space_account_uuid)
+            if created_user is not None:
+                await self._update_space_provider_for_account(created_user, api_key)
+            return created_user

-            # Check if system is already initialized
-            is_initialized = await self.is_initialized()
-            if is_initialized:
-                raise account_errors.AccountEmailMismatchError()
+    async def _update_projected_space_user(
+        self,
+        *,
+        space_account_uuid: str,
+        email: str,
+        access_token: str,
+        refresh_token: str,
+        api_key: str,
+        expires_in: int,
+    ) -> user.User:
+        """Attach OAuth credentials to an already projected Cloud Account."""

-            # Create new Space user (first time initialization)
-            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.insert(user.User).values(
-                    user=email,
-                    password='',  # Space users don't have local password
-                    account_type='space',
-                    space_account_uuid=space_account_uuid,
+        normalized_email = normalize_email(email)
+        expires_at = datetime.datetime.now() + datetime.timedelta(seconds=expires_in) if expires_in > 0 else None
+        async with self._create_user_lock:
+            projected = await self.get_user_by_space_account_uuid(space_account_uuid)
+            if (
+                projected is None
+                or projected.uuid != space_account_uuid
+                or projected.normalized_email != normalized_email
+                or projected.source != user.AccountSource.CLOUD_PROJECTION.value
+                or projected.account_type != 'space'
+            ):
+                raise ControlPlaneDirectoryRequiredError('Space Account is not present in the verified Cloud directory')
+            self._require_active_account(projected)
+            await self._identity_execute(
+                sqlalchemy.update(user.User)
+                .where(
+                    user.User.uuid == projected.uuid,
+                    user.User.space_account_uuid == space_account_uuid,
+                    user.User.source == user.AccountSource.CLOUD_PROJECTION.value,
+                )
+                .values(
                    space_access_token=access_token,
                    space_refresh_token=refresh_token,
                    space_api_key=api_key,
                    space_access_token_expires_at=expires_at,
-                )
+                ),
+                f'space:{space_account_uuid}',
            )
-            await self.ap.provider_service.update_space_model_provider_api_keys(api_key)
-
-            return await self.get_user_by_space_account_uuid(space_account_uuid)
+            refreshed = await self.get_user_by_space_account_uuid(space_account_uuid)
+            if refreshed is None:
+                raise ControlPlaneDirectoryRequiredError('Space Account disappeared from the verified Cloud directory')
+            self._require_active_account(refreshed)
+            return refreshed

    async def authenticate_space_user(
        self, access_token: str, refresh_token: str, expires_in: int = 0
@@ -221,15 +709,44 @@ class UserService:
        )

        # Generate JWT token
-        jwt_token = await self.generate_jwt_token(email)
+        jwt_token = await self.generate_jwt_token(user_obj)

        return jwt_token, user_obj

    async def get_first_user(self) -> user.User | None:
        """Get the first user (for single-user mode)"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(user.User).limit(1))
-        result_list = result.all()
-        return result_list[0] if result_list else None
+        return await self._identity_scalar(
+            sqlalchemy.select(user.User).limit(1),
+            f'instance:{self._jwt_identity()[1]}',
+        )
+
+    async def _identity_scalar(
+        self,
+        statement: typing.Any,
+        identity: str,
+    ) -> user.User | None:
+        """Execute one exact Account lookup in an explicit discovery transaction."""
+
+        digest = hashlib.sha256(identity.encode('utf-8')).hexdigest()
+        current_session = getattr(self.ap.persistence_mgr, 'current_session', lambda: None)
+        identity_uow = getattr(self.ap.persistence_mgr, 'identity_discovery_uow', None)
+        if current_session() is None and callable(identity_uow):
+            async with identity_uow(digest) as discovery:
+                return await discovery.session.scalar(statement)
+        result = await self.ap.persistence_mgr.execute_async(statement)
+        rows = result.all()
+        return rows[0] if rows else None
+
+    async def _identity_execute(self, statement: typing.Any, identity: str) -> typing.Any:
+        """Execute one exact Account mutation in an explicit transaction."""
+
+        digest = hashlib.sha256(identity.encode('utf-8')).hexdigest()
+        current_session = getattr(self.ap.persistence_mgr, 'current_session', lambda: None)
+        identity_uow = getattr(self.ap.persistence_mgr, 'identity_discovery_uow', None)
+        if current_session() is None and callable(identity_uow):
+            async with identity_uow(digest) as discovery:
+                return await discovery.session.execute(statement)
+        return await self.ap.persistence_mgr.execute_async(statement)
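`_identity_scalar` and `_identity_execute` share one dispatch rule. If no session is active and the persistence manager exposes a callable `identity_discovery_uow`, the statement runs in a dedicated transaction that receives the SHA-256 hex digest of an identity string such as `email:<normalized>` or `space:<uuid>`. Otherwise it goes through `execute_async`. Below is a minimal sketch of that rule, assuming a persistence manager shaped the way the `getattr` probes imply. `FakePersistence`, `identity_digest`, and `identity_scalar` are hypothetical stand-ins, not LangBot APIs. The diff does not show what the real unit of work does with the digest.

```python
from __future__ import annotations

import asyncio
import contextlib
import hashlib
import typing


def identity_digest(identity: str) -> str:
    """Hash an identity key such as 'email:alice@example.com' to 64 hex chars."""
    return hashlib.sha256(identity.encode('utf-8')).hexdigest()


class _FakeSession:
    async def scalar(self, statement: str) -> str:
        return f'discovery:{statement}'


class _FakeDiscovery:
    def __init__(self) -> None:
        self.session = _FakeSession()


class FakePersistence:
    """Hypothetical persistence manager exposing the attributes the diff probes for."""

    def __init__(self, in_transaction: bool) -> None:
        self.in_transaction = in_transaction
        self.opened: list[str] = []  # digests passed to identity_discovery_uow

    def current_session(self) -> object | None:
        return object() if self.in_transaction else None

    @contextlib.asynccontextmanager
    async def identity_discovery_uow(self, digest: str) -> typing.AsyncIterator[_FakeDiscovery]:
        self.opened.append(digest)
        yield _FakeDiscovery()

    async def execute_async(self, statement: str) -> str:
        return f'shared:{statement}'


async def identity_scalar(pm: typing.Any, statement: str, identity: str) -> str:
    digest = identity_digest(identity)
    current_session = getattr(pm, 'current_session', lambda: None)
    identity_uow = getattr(pm, 'identity_discovery_uow', None)
    # No ambient transaction: open a dedicated discovery unit of work.
    if current_session() is None and callable(identity_uow):
        async with identity_uow(digest) as discovery:
            return await discovery.session.scalar(statement)
    # Already inside a unit of work (or no UoW support): reuse the shared path.
    return await pm.execute_async(statement)


if __name__ == '__main__':
    print(asyncio.run(identity_scalar(FakePersistence(False), 'SELECT 1', 'email:a@example.com')))
    print(asyncio.run(identity_scalar(FakePersistence(True), 'SELECT 1', 'email:a@example.com')))
```

Outside a transaction the lookup goes through the discovery path (`discovery:SELECT 1`). Inside one it reuses the caller's session (`shared:SELECT 1`), so an Account lookup made from an enclosing unit of work never opens a nested transaction.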

    async def set_password(self, user_email: str, new_password: str, current_password: str | None = None) -> None:
        """Set or change password for a user"""
@@ -246,12 +763,19 @@ class UserService:
            await self._verify_password(user_obj.password, current_password)

        hashed_password = await self._hash_password(new_password)
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.update(user.User).where(user.User.user == user_email).values(password=hashed_password)
+        normalized_email = normalize_email(user_email)
+        await self._identity_execute(
+            sqlalchemy.update(user.User)
+            .where(user.User.normalized_email == normalized_email)
+            .values(password=hashed_password),
+            f'email:{normalized_email}',
        )

    async def bind_space_account(self, user_email: str, code: str) -> user.User:
        """Bind Space account to existing local account"""
+        local_account = await self.get_user_by_email(user_email)
+        if local_account is None:
+            raise ValueError('User not found')
        # Exchange code for tokens
        token_data = await self.ap.space_service.exchange_oauth_code(code)
        access_token = token_data.get('access_token')
@@ -273,28 +797,33 @@ class UserService:

        if not space_account_uuid or not space_email:
            raise ValueError('Invalid Space user info')
+        if normalize_email(space_email) != normalize_email(user_email):
+            raise account_errors.AccountEmailMismatchError()

        # Check if this Space account is already bound to another user
        existing_space_user = await self.get_user_by_space_account_uuid(space_account_uuid)
-        if existing_space_user and existing_space_user.user != user_email:
+        if existing_space_user and existing_space_user.normalized_email != normalize_email(user_email):
            raise ValueError('This Space account is already bound to another user')

        # Update local account to Space account
-        await self.ap.persistence_mgr.execute_async(
+        normalized_email = normalize_email(user_email)
+        await self._identity_execute(
            sqlalchemy.update(user.User)
-            .where(user.User.user == user_email)
+            .where(user.User.normalized_email == normalized_email)
            .values(
-                user=space_email,  # Update email to Space email
+                user=normalize_email(space_email),  # Update email to Space email
+                normalized_email=normalize_email(space_email),
                account_type='space',
                space_account_uuid=space_account_uuid,
                space_access_token=access_token,
                space_refresh_token=refresh_token,
                space_api_key=api_key,
                space_access_token_expires_at=expires_at,
-            )
+            ),
+            f'email:{normalized_email}',
        )

        # Update Space model provider API keys
-        await self.ap.provider_service.update_space_model_provider_api_keys(api_key)
+        await self._update_space_provider_for_account(local_account, api_key)

        return await self.get_user_by_email(space_email)
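`bind_space_account` now checks that the local account exists before it exchanges the OAuth code. It then applies two guards on normalized addresses. First, the Space email must match the local email. Second, the Space account UUID must not already belong to a different account. Because the first guard forces the two addresses to be equal after normalization, the update in effect rewrites the stored email into its normalized form. Below is a minimal sketch of the guard order. `normalize_email` is not part of this diff, so the trim-and-casefold version is an assumption. `check_space_binding` and the local `AccountEmailMismatchError` are hypothetical stand-ins.

```python
from __future__ import annotations


class AccountEmailMismatchError(Exception):
    """Hypothetical stand-in for account_errors.AccountEmailMismatchError."""


def normalize_email(email: str) -> str:
    # Assumption: normalization is trim + case-fold; the real helper is not shown.
    return email.strip().casefold()


def check_space_binding(local_email: str, space_email: str, existing_owner_email: str | None) -> str:
    """Return the normalized email the account carries after binding.

    existing_owner_email is the email of the account that already holds this
    Space account UUID, or None if the UUID is unbound.
    """
    normalized_local = normalize_email(local_email)
    # Guard 1: the Space identity must be the same mailbox as the local account.
    if normalize_email(space_email) != normalized_local:
        raise AccountEmailMismatchError()
    # Guard 2: one Space account UUID may be bound to only one local account.
    if existing_owner_email is not None and normalize_email(existing_owner_email) != normalized_local:
        raise ValueError('This Space account is already bound to another user')
    return normalize_email(space_email)
```

Re-binding the same Space account to its current owner passes both guards. This keeps the flow idempotent for a user who repeats the OAuth round trip.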
@@ -4,6 +4,12 @@ import sqlalchemy

 from ....core import app
 from ....entity.persistence import webhook
+from .secrets import SECRET_MASK, mask_secret_value, restore_secret_placeholders
+from .tenant import TenantContext, require_workspace_uuid, scope_statement
+
+
+_DEFAULT_MAX_WEBHOOKS_PER_WORKSPACE = 16
+_HARD_MAX_WEBHOOKS_PER_WORKSPACE = 64


 class WebhookService:
@@ -12,31 +18,99 @@ class WebhookService:
    def __init__(self, ap: app.Application) -> None:
        self.ap = ap

-    async def get_webhooks(self) -> list[dict]:
+    def max_per_workspace(self) -> int:
+        """Return the configured webhook cap within the process hard limit."""
+
+        config = getattr(getattr(self.ap, 'instance_config', None), 'data', {})
+        try:
+            value = int(
+                config.get('webhooks', {}).get(
+                    'max_per_workspace',
+                    _DEFAULT_MAX_WEBHOOKS_PER_WORKSPACE,
+                )
+            )
+        except (AttributeError, TypeError, ValueError):
+            value = _DEFAULT_MAX_WEBHOOKS_PER_WORKSPACE
+        return min(max(value, 1), _HARD_MAX_WEBHOOKS_PER_WORKSPACE)
+
+    def _serialize_webhook(self, entity, *, include_secret: bool) -> dict:
+        serialized = self.ap.persistence_mgr.serialize_model(webhook.Webhook, entity)
+        if not include_secret:
+            serialized = serialized.copy()
+            serialized['url'] = mask_secret_value(serialized.get('url'))
+        return serialized
+
+    async def get_webhooks(self, context: TenantContext, *, include_secret: bool = False) -> list[dict]:
        """Get all webhooks"""
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(webhook.Webhook))
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.select(webhook.Webhook).order_by(webhook.Webhook.id).limit(_HARD_MAX_WEBHOOKS_PER_WORKSPACE),
+                webhook.Webhook,
+                context,
+            )
+        )

        webhooks = result.all()
-        return [self.ap.persistence_mgr.serialize_model(webhook.Webhook, wh) for wh in webhooks]
+        return [self._serialize_webhook(wh, include_secret=include_secret) for wh in webhooks]

-    async def create_webhook(self, name: str, url: str, description: str = '', enabled: bool = True) -> dict:
+    async def create_webhook(
+        self,
+        context: TenantContext,
+        name: str,
+        url: str,
+        description: str = '',
+        enabled: bool = True,
+    ) -> dict:
        """Create a new webhook"""
-        webhook_data = {'name': name, 'url': url, 'description': description, 'enabled': enabled}
+        workspace_uuid = require_workspace_uuid(context)
+        max_webhooks = self.max_per_workspace()
+        count_result = await self.ap.persistence_mgr.execute_async(
+            sqlalchemy.select(sqlalchemy.func.count())
+            .select_from(webhook.Webhook)
+            .where(webhook.Webhook.workspace_uuid == workspace_uuid)
+        )
+        if (count_result.scalar() or 0) >= max_webhooks:
+            raise ValueError(f'Maximum number of webhooks ({max_webhooks}) reached')

-        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(webhook.Webhook).values(**webhook_data))
+        url = restore_secret_placeholders(url, sensitive=True)
+        webhook_data = {
+            'workspace_uuid': workspace_uuid,
+            'name': name,
+            'url': url,
+            'description': description,
+            'enabled': enabled,
+        }
+
+        insert_result = await self.ap.persistence_mgr.execute_async(
+            sqlalchemy.insert(webhook.Webhook).values(**webhook_data)
+        )

        # Retrieve the created webhook
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(webhook.Webhook).where(webhook.Webhook.url == url).order_by(webhook.Webhook.id.desc())
+            scope_statement(
+                sqlalchemy.select(webhook.Webhook).where(webhook.Webhook.id == insert_result.inserted_primary_key[0]),
+                webhook.Webhook,
+                workspace_uuid,
+            )
        )
        created_webhook = result.first()

        return self.ap.persistence_mgr.serialize_model(webhook.Webhook, created_webhook)

-    async def get_webhook(self, webhook_id: int) -> dict | None:
+    async def get_webhook(
+        self,
+        context: TenantContext,
+        webhook_id: int,
+        *,
+        include_secret: bool = False,
+    ) -> dict | None:
        """Get a specific webhook by ID"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(webhook.Webhook).where(webhook.Webhook.id == webhook_id)
+            scope_statement(
+                sqlalchemy.select(webhook.Webhook).where(webhook.Webhook.id == webhook_id),
+                webhook.Webhook,
+                context,
+            )
        )

        wh = result.first()
@@ -44,16 +118,27 @@ class WebhookService:
        if wh is None:
            return None

-        return self.ap.persistence_mgr.serialize_model(webhook.Webhook, wh)
+        return self._serialize_webhook(wh, include_secret=include_secret)

    async def update_webhook(
-        self, webhook_id: int, name: str = None, url: str = None, description: str = None, enabled: bool = None
-    ) -> None:
+        self,
+        context: TenantContext,
+        webhook_id: int,
+        name: str | None = None,
+        url: str | None = None,
+        description: str | None = None,
+        enabled: bool | None = None,
+    ) -> bool:
        """Update a webhook's metadata"""
        update_data = {}
        if name is not None:
            update_data['name'] = name
        if url is not None:
+            if url == SECRET_MASK:
+                current = await self.get_webhook(context, webhook_id, include_secret=True)
+                if current is None:
+                    return False
+                url = restore_secret_placeholders(url, current.get('url'), sensitive=True)
            update_data['url'] = url
        if description is not None:
            update_data['description'] = description
@@ -61,20 +146,37 @@ class WebhookService:
            update_data['enabled'] = enabled

        if update_data:
-            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.update(webhook.Webhook).where(webhook.Webhook.id == webhook_id).values(**update_data)
+            result = await self.ap.persistence_mgr.execute_async(
+                scope_statement(
+                    sqlalchemy.update(webhook.Webhook).where(webhook.Webhook.id == webhook_id).values(**update_data),
+                    webhook.Webhook,
+                    context,
+                )
            )
+            return (result.rowcount or 0) > 0
+        return await self.get_webhook(context, webhook_id) is not None

-    async def delete_webhook(self, webhook_id: int) -> None:
+    async def delete_webhook(self, context: TenantContext, webhook_id: int) -> bool:
        """Delete a webhook"""
-        await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.delete(webhook.Webhook).where(webhook.Webhook.id == webhook_id)
+        result = await self.ap.persistence_mgr.execute_async(
+            scope_statement(
+                sqlalchemy.delete(webhook.Webhook).where(webhook.Webhook.id == webhook_id),
+                webhook.Webhook,
+                context,
+            )
        )
+        return (result.rowcount or 0) > 0

-    async def get_enabled_webhooks(self) -> list[dict]:
+    async def get_enabled_webhooks(self, context: TenantContext) -> list[dict]:
        """Get all enabled webhooks"""
        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(webhook.Webhook).where(webhook.Webhook.enabled == True)
+            scope_statement(
+                sqlalchemy.select(webhook.Webhook).where(webhook.Webhook.enabled == True),
+                webhook.Webhook,
+                context,
+            )
+            .order_by(webhook.Webhook.id)
+            .limit(self.max_per_workspace())
        )

        webhooks = result.all()
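The cap and mask handling can be exercised without a database. This sketch reimplements the `max_per_workspace` clamp over a plain dict and models the mask round trip in `update_webhook` with a hypothetical `resolve_update_url`. The value of `SECRET_MASK` and the exact behaviour of `restore_secret_placeholders` are assumptions, because `.secrets` is not part of this diff.

```python
from __future__ import annotations

import typing

DEFAULT_MAX = 16  # mirrors _DEFAULT_MAX_WEBHOOKS_PER_WORKSPACE
HARD_MAX = 64     # mirrors _HARD_MAX_WEBHOOKS_PER_WORKSPACE
SECRET_MASK = '********'  # assumption: the real constant lives in .secrets


def max_per_workspace(config: typing.Any) -> int:
    """Clamp webhooks.max_per_workspace into [1, HARD_MAX], defaulting on bad input."""
    try:
        value = int(config.get('webhooks', {}).get('max_per_workspace', DEFAULT_MAX))
    except (AttributeError, TypeError, ValueError):
        # Missing section, None, or a non-numeric value all fall back to the default.
        value = DEFAULT_MAX
    return min(max(value, 1), HARD_MAX)


def resolve_update_url(submitted: str, stored: str) -> str:
    """Simplified model: keep the stored secret URL when the client echoes the mask."""
    return stored if submitted == SECRET_MASK else submitted
```

The two read paths use different limits. `get_webhooks` is bounded by the hard limit, while `get_enabled_webhooks` is bounded by the configured cap. Lowering the cap below the number of existing enabled webhooks therefore leaves the listing unchanged, but dispatch only sees the lowest-`id` rows up to the new cap.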
@@ -0,0 +1,30 @@
+from __future__ import annotations
+
+import contextvars
+
+from ..http.context import RequestContext
+
+
+_request_context: contextvars.ContextVar[RequestContext | None] = contextvars.ContextVar(
+    'langbot_mcp_request_context',
+    default=None,
+)
+
+
+def bind_request_context(context: RequestContext) -> contextvars.Token[RequestContext | None]:
+    """Bind the authenticated MCP request while its ASGI request is executing."""
+
+    return _request_context.set(context)
+
+
+def reset_request_context(token: contextvars.Token[RequestContext | None]) -> None:
+    _request_context.reset(token)
+
+
+def get_request_context() -> RequestContext:
+    """Return the current trusted MCP context or fail closed."""
+
+    context = _request_context.get()
+    if context is None:
+        raise RuntimeError('MCP Workspace context is unavailable')
+    return context
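The new module follows the standard `contextvars` pattern. It sets the variable for the duration of one request, resets it with the returned token afterwards, and raises when code runs without a bound context. The sketch below assumes an ASGI-style wrapper that calls bind and reset around each request; that wrapper is not in this diff. It shows that concurrent requests scheduled as separate asyncio tasks each see only their own Workspace. `RequestContext` here is a hypothetical one-field stand-in for `..http.context.RequestContext`.

```python
from __future__ import annotations

import asyncio
import contextvars
import dataclasses


@dataclasses.dataclass(frozen=True)
class RequestContext:
    """Hypothetical stand-in for ..http.context.RequestContext."""
    workspace_uuid: str


_request_context: contextvars.ContextVar[RequestContext | None] = contextvars.ContextVar(
    'demo_mcp_request_context', default=None
)


def bind_request_context(context: RequestContext) -> contextvars.Token:
    return _request_context.set(context)


def reset_request_context(token: contextvars.Token) -> None:
    _request_context.reset(token)


def get_request_context() -> RequestContext:
    context = _request_context.get()
    if context is None:
        raise RuntimeError('MCP Workspace context is unavailable')
    return context


async def handle_request(context: RequestContext) -> str:
    """Assumed wrapper: bind for one request, always reset afterwards."""
    token = bind_request_context(context)
    try:
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        return get_request_context().workspace_uuid
    finally:
        reset_request_context(token)


async def main() -> list[str]:
    # Each gathered coroutine runs in its own Task with a copied context.
    return await asyncio.gather(
        handle_request(RequestContext('ws-a')),
        handle_request(RequestContext('ws-b')),
    )
```

`asyncio.run(main())` returns `['ws-a', 'ws-b']`. Calling `get_request_context()` outside any request raises `RuntimeError`, which is the fail-closed behaviour the module's docstring describes.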