Feat/saas sandbox adaptation (#2234)

* fix(box): trust Box-reported skill paths when filesystem is not shared

  In separated deployments (Docker Compose, k8s sidecar, --standalone-box, remote runtime.endpoint) the Box runtime owns its own filesystem, so the skill package_root it reports via list_skills is not resolvable on the LangBot side. LangBot's reload_skills and build_skill_extra_mounts validated those paths with os.path.isdir() against its own filesystem, which silently dropped every skill in such deployments and broke the sandbox skill feature for the nsjail/SaaS backend.

  Add BoxService.shares_filesystem_with_box, derived from the connector transport (stdio = shared, WebSocket = separated), with an explicit override seam for tests/embedders. Gate both isdir() guards on it: keep local validation in shared-fs stdio mode, trust Box-reported paths otherwise. The Box runtime only reports skills found on its own filesystem, so those paths are valid there by construction.

  Adds topology-derivation tests (real connector, no mocks) and skill-retention tests for both shared and separated filesystems.

* build(docker): ship a self-contained nsjail sandbox backend in the image

  Compile nsjail 3.6 from source in a dedicated multi-stage build and carry only the binary plus its runtime libs (libprotobuf32, libnl-route-3-200) into the final image. This lets the Box runtime isolate sandboxed code via nsjail user/mount/pid/net namespaces without a host Docker socket. That is the prerequisite for running Box on LangBot Cloud (k8s), where mounting docker.sock would grant node root and is not acceptable for multi-tenant. The build toolchain (build-essential/bison/flex/protobuf-dev/libnl-dev) stays in the nsjail-build stage and is not present in the shipped image.

  Verified: image builds (583MB), nsjail --help exits 0, libraries resolve, and the real NsjailBackend executes an isolated command end-to-end on a v6.1/cgroup2 host matching LangBot Cloud prod (rlimit fallback path, since container /sys/fs/cgroup is read-only; PID-namespace isolation confirmed).

* feat(box): SaaS guard to force a single global sandbox scope

  Add system.limitation.force_box_session_id_template: when non-empty it overrides every pipeline's box-session-id-template at resolve time, pinning all queries to one shared sandbox (e.g. {global}). This is the authoritative, unbypassable guard: it runs on every exec call, so editing the pipeline config via API cannot escape it. The web UI locks the Sandbox Scope selector via a combined box_scope_editable flag (box available AND not forced).

* build(deps): pin langbot-plugin==0.4.2b1 (nsjail cgroup container-safety beta)

* fix(web): show forced sandbox scope + make disabled tooltip tap-friendly

  When a SaaS deployment pins every pipeline to a fixed sandbox scope via system.limitation.force_box_session_id_template, the Sandbox Scope selector was correctly locked but still displayed the pipeline's stored value (e.g. the per-chat default), misrepresenting the scope that the runtime actually enforces on every exec. Coerce the displayed/saved value to the forced template so the locked selector truthfully shows the active scope (e.g. Global).

  Also fix the disabled_tooltip being invisible on touch devices: hover-only Radix tooltips never open without a pointer, so the explanation of why the field is locked could not be read on mobile. Wrap the info icon so a tap toggles the tooltip while desktop hover still works.

* feat(web): hide sidebar new-version prompt for edition=cloud

  Cloud instances are upgraded centrally by the operator, so surfacing a GitHub 'new version available' badge to tenants is misleading and actionable only by the operator. Skip the release check entirely when edition=cloud.

* style(web): prettier formatting for DisabledTooltipIcon ternary

* chore(deps): bump langbot-plugin to 0.4.2b2

  Picks up the SDK fix that creates a read-write host_path before the nsjail bind-mount, fixing the SaaS MCP shared-workspace sandbox failure (exec exit 255 with empty output when host_path didn't exist).

* chore(deps): bump langbot-plugin to 0.4.2b3

  Picks up the nsjail /dev-node fix so stdio MCP servers (uvx-launched) can start under force_global_sandbox instead of failing with 'Connection closed / please check URL'.

* fix(web): show real MCP runtime status on installed extensions list

  The installed-extensions list badge keyed solely off the enable flag, so a server that was still CONNECTING (or in ERROR) was shown as 'Connected'. Reflect the actual runtime_info.status (connecting/connected/error/disabled) with matching colors, and poll quietly every 3s while any MCP server is connecting so the badge transitions without a manual refresh.

* chore(deps): bump langbot-plugin to 0.4.2b4

  Picks up the 30s start_managed_process timeout so cold uvx MCP bootstraps don't get torn down mid-install.

* style(web): satisfy prettier (parenthesize nullish-coalescing in ternary)

* fix(mcp): isolate transient test sessions from the shared Box session

  A config-page 'test' (server_name='_', no persisted UUID) ran in the same shared 'mcp-shared' Box session as live MCP servers. A failing test (e.g. empty args) churned that shared session and tore down healthy, already-connected servers, leaving them stuck after exhausting their retries. Mark UUID-less sessions as transient, give them their own isolated Box session ('mcp-test-<uuid>'), and fully delete that session on cleanup so tests can never disturb live servers and don't leak sessions.

* fix(mcp): tear down transient test session after test completes

  A successful config-page test left its isolated 'mcp-test-<uuid>' Box session running (the lifecycle task blocks until shutdown). Wrap the transient test coroutine so it always shuts the session down afterward, preventing isolated test sessions from leaking.
2026-06-13 01:06:03 +00:00 · 2026-06-09 19:30:17 +08:00
parent 47fe9bde03
commit 8e558ad3a1
20 changed files with 579 additions and 87 deletions
--- a/src/langbot/pkg/api/http/service/mcp.py
+++ b/src/langbot/pkg/api/http/service/mcp.py
@@ -152,7 +152,24 @@ class MCPService:
                coroutine = runtime_mcp_session.refresh()
        else:
            runtime_mcp_session = await self.ap.tool_mgr.mcp_tool_loader.load_mcp_server(server_config=server_data)
-            coroutine = runtime_mcp_session.start()
+
+            # A transient test owns an isolated Box session. Always tear it down
+            # after the test completes (success or failure) so it does not leak.
+            test_session = runtime_mcp_session
+
+            async def _run_and_cleanup() -> None:
+                try:
+                    await test_session.start()
+                finally:
+                    try:
+                        await test_session.shutdown()
+                    except Exception as exc:
+                        self.ap.logger.warning(
+                            f'Failed to tear down transient MCP test session '
+                            f'{test_session.server_name}: {type(exc).__name__}: {exc}'
+                        )
+
+            coroutine = _run_and_cleanup()

        ctx = taskmgr.TaskContext.new()
        wrapper = self.ap.task_mgr.create_user_task(
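
The wrapper added in this hunk follows a run-then-always-shut-down pattern. The sketch below reproduces only that control flow in isolation; `FakeSession`, `run_and_cleanup`, and `demo` are hypothetical names for illustration and are not part of LangBot.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a transient MCP test session."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.shut_down = False

    async def start(self) -> None:
        if self.fail:
            raise RuntimeError('empty args')

    async def shutdown(self) -> None:
        self.shut_down = True


async def run_and_cleanup(session: FakeSession) -> None:
    # shutdown() runs whether start() returns or raises.
    try:
        await session.start()
    finally:
        try:
            await session.shutdown()
        except Exception as exc:
            # A teardown failure is reported but never masks the test result.
            print(f'teardown failed: {type(exc).__name__}: {exc}')


async def demo() -> tuple:
    ok, bad = FakeSession(fail=False), FakeSession(fail=True)
    await run_and_cleanup(ok)
    try:
        await run_and_cleanup(bad)
    except RuntimeError:
        pass  # the start() error still propagates to the caller
    return ok.shut_down, bad.shut_down


RESULT = asyncio.run(demo())
```

In this sketch both sessions end up shut down, and the failing test still surfaces its original error to the caller.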
--- a/src/langbot/pkg/box/connector.py
+++ b/src/langbot/pkg/box/connector.py
@@ -120,13 +120,19 @@ class BoxRuntimeConnector(ManagedRuntimeConnector):
        self._relay_port = parsed.port or _DEFAULT_PORT
        self._filtered_box_config = _filter_config_for_runtime(_get_box_config(ap))

-    def _uses_websocket(self) -> bool:
+    def uses_websocket(self) -> bool:
        """Whether the connector should use WebSocket to reach the Box runtime.

        True when:
          - Running inside Docker (Box runtime is a separate container)
          - The ``--standalone-box`` CLI flag was passed
          - An explicit ``runtime.endpoint`` was configured
+
+        When this is True the Box runtime lives in a separate process with its
+        own filesystem view (container, pod sidecar, or remote host), so paths
+        it reports (e.g. skill ``package_root``) are NOT resolvable on the
+        LangBot side. When False, Box runs as a stdio child process that shares
+        LangBot's filesystem.
        """
        return bool(
            self.configured_runtime_endpoint
@@ -134,6 +140,10 @@ class BoxRuntimeConnector(ManagedRuntimeConnector):
            or platform.use_websocket_to_connect_box_runtime()
        )

+    # Backwards-compatible private alias.
+    def _uses_websocket(self) -> bool:
+        return self.uses_websocket()
+
    async def initialize(self) -> None:
        if self._uses_websocket():
            if platform.get_platform() == 'win32' and not self.configured_runtime_endpoint:
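
The transport-to-topology rule stated in the docstring above and in the commit message (stdio = shared filesystem, WebSocket = separated, with an explicit override) can be written as a small pure function. This is a minimal sketch under those stated assumptions; `shares_filesystem` and its parameters are hypothetical, and `None` for `uses_websocket` models the case where no connector exists.

```python
from typing import Optional


def shares_filesystem(uses_websocket: Optional[bool], override: Optional[bool] = None) -> bool:
    """Return True only when Box is assumed to see the same filesystem as LangBot."""
    if override is not None:
        # An explicit override (tests, embedders) always wins.
        return override
    if uses_websocket is None:
        # No connector to introspect: conservatively assume separated filesystems.
        return False
    # stdio child process => shared; WebSocket => separated.
    return not uses_websocket
```

Defaulting to "not shared" when the topology is unknown means local path validation is skipped rather than dropping skills by mistake.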
--- a/src/langbot/pkg/box/service.py
+++ b/src/langbot/pkg/box/service.py
@@ -67,6 +67,10 @@ class BoxService:
        self._available = False
        self._connector_error: str = ''
        self._reconnecting = False
+        # Optional explicit override for shares_filesystem_with_box. None means
+        # "derive from the connector transport". Set by tests / embedders that
+        # know the real LangBot<->Box filesystem topology.
+        self._shares_filesystem_with_box_override: bool | None = None

    @property
    def enabled(self) -> bool:
@@ -148,6 +152,32 @@ class BoxService:
    def available(self) -> bool:
        return self._available

+    @property
+    def shares_filesystem_with_box(self) -> bool:
+        """Whether LangBot and the Box runtime share a filesystem view.
+
+        This is True only when Box runs as a local stdio child process of
+        LangBot (same container/host). In that case paths the Box runtime
+        reports — notably skill ``package_root`` — resolve identically on the
+        LangBot side, so LangBot may validate them against its own filesystem.
+
+        It is False for every separated deployment (Docker Compose, k8s
+        sidecar, ``--standalone-box``, or an explicit ``runtime.endpoint``),
+        where the Box runtime owns its own filesystem and LangBot must trust
+        the paths it reports rather than checking them locally.
+
+        When Box is wired up with an injected client (tests, custom embeds)
+        there is no connector to introspect; we conservatively report False so
+        LangBot never wrongly drops Box-reported skills. An explicit override
+        can be set via ``_shares_filesystem_with_box_override`` (used by tests
+        and any embedder that knows the real topology).
+        """
+        if self._shares_filesystem_with_box_override is not None:
+            return self._shares_filesystem_with_box_override
+        if self._runtime_connector is None:
+            return False
+        return not self._runtime_connector.uses_websocket()
+
    async def execute_spec_payload(
        self,
        spec_payload: dict,
@@ -191,13 +221,25 @@ class BoxService:
        return self._serialize_result(result)

    def resolve_box_session_id(self, query: pipeline_query.Query) -> str:
-        """Resolve the Box session_id from the pipeline's template and query variables."""
-        template = (
-            (query.pipeline_config or {})
-            .get('ai', {})
-            .get('local-agent', {})
-            .get('box-session-id-template', '{launcher_type}_{launcher_id}')
-        )
+        """Resolve the Box session_id from the pipeline's template and query variables.
+
+        When ``system.limitation.force_box_session_id_template`` is set to a
+        non-empty value, that template overrides whatever the pipeline
+        configured. This is the authoritative SaaS guard: it runs on every
+        ``exec`` call, so a tenant cannot escape a single shared sandbox even
+        by editing the pipeline config directly through the API (the
+        ``box_scope_editable`` flag only locks the web UI selector).
+        """
+        forced_template = self._forced_box_session_id_template()
+        if forced_template:
+            template = forced_template
+        else:
+            template = (
+                (query.pipeline_config or {})
+                .get('ai', {})
+                .get('local-agent', {})
+                .get('box-session-id-template', '{launcher_type}_{launcher_id}')
+            )
        variables = dict(query.variables or {})
        launcher_type = getattr(query, 'launcher_type', None)
        if hasattr(launcher_type, 'value'):
@@ -220,14 +262,24 @@ class BoxService:
        all skill packages mounted, regardless of which skill is currently
        activated.

-        Skills whose ``package_root`` is missing or no longer a directory on
-        the LangBot-visible filesystem are skipped with a warning instead of
-        being passed through to the backend. Without this guard the three
-        backends behave inconsistently on a stale mount: nsjail refuses to
-        start the sandbox (failing every exec in the session), Docker
-        silently auto-creates a root-owned empty directory on the host, and
-        E2B silently skips the upload — none of which surfaces an
-        actionable error to the agent or operator.
+        Path validation is filesystem-topology dependent. When LangBot and the
+        Box runtime share a filesystem (local stdio mode), a skill whose
+        ``package_root`` is missing or no longer a directory is skipped with a
+        warning instead of being passed through to the backend. Without that
+        guard the three backends behave inconsistently on a stale mount: nsjail
+        refuses to start the sandbox (failing every exec in the session),
+        Docker silently auto-creates a root-owned empty directory on the host,
+        and E2B silently skips the upload — none of which surfaces an
+        actionable error.
+
+        When Box runs as a separate process (Docker Compose, k8s sidecar,
+        ``--standalone-box``, or a remote ``runtime.endpoint``), the
+        ``package_root`` reported by ``list_skills`` is the Box runtime's own
+        filesystem path and is NOT resolvable on the LangBot side. Validating
+        it locally would wrongly drop every skill, so LangBot trusts the path
+        and lets the Box runtime resolve it. The Box runtime only ever reports
+        skills it discovered on its own filesystem, so the path is valid there
+        by construction.
        """
        skill_mgr = getattr(self.ap, 'skill_mgr', None)
        if skill_mgr is None:
@@ -235,13 +287,15 @@ class BoxService:

        from ..provider.tools.loaders import skill as skill_loader

+        validate_locally = self.shares_filesystem_with_box
+
        visible_skills = skill_loader.get_visible_skills(self.ap, query)
        mounts: list[dict] = []
        for skill_name, skill_data in visible_skills.items():
            package_root = str(skill_data.get('package_root', '') or '').strip()
            if not package_root:
                continue
-            if not os.path.isdir(package_root):
+            if validate_locally and not os.path.isdir(package_root):
                self.ap.logger.warning(
                    f'Skill "{skill_name}" package_root missing on filesystem '
                    f'({package_root}); skipping mount to prevent sandbox failures. '
@@ -564,6 +618,20 @@ class BoxService:
        raw = str(self._local_config().get('image', '') or '').strip()
        return raw or None

+    def _forced_box_session_id_template(self) -> str:
+        """Return the SaaS-forced sandbox-scope template, or '' when unset.
+
+        Read from ``system.limitation.force_box_session_id_template``. A
+        non-empty value pins every pipeline to a single sandbox scope
+        (e.g. ``'{global}'``) and cannot be overridden per-pipeline.
+        """
+        limitation = (
+            (self.ap.instance_config.data or {}).get('system', {}).get('limitation', {})
+            if getattr(self.ap, 'instance_config', None) is not None
+            else {}
+        )
+        return str(limitation.get('force_box_session_id_template', '') or '').strip()
+
    def _load_workspace_quota_mb(self) -> int | None:
        raw_value = self._local_config().get('workspace_quota_mb')
        if raw_value in (None, ''):
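
The precedence between the forced template and the per-pipeline template can be illustrated with plain dictionaries. This is a sketch of the lookup order only, assuming the config shapes shown in the hunks above; `resolve_template` is a hypothetical helper, not the method in `BoxService`.

```python
from typing import Optional

DEFAULT_TEMPLATE = '{launcher_type}_{launcher_id}'


def resolve_template(instance_config: Optional[dict], pipeline_config: Optional[dict]) -> str:
    """Pick the session-id template: a forced value wins over the pipeline's."""
    limitation = (instance_config or {}).get('system', {}).get('limitation', {})
    forced = str(limitation.get('force_box_session_id_template', '') or '').strip()
    if forced:
        return forced
    return (
        (pipeline_config or {})
        .get('ai', {})
        .get('local-agent', {})
        .get('box-session-id-template', DEFAULT_TEMPLATE)
    )


saas = {'system': {'limitation': {'force_box_session_id_template': '{global}'}}}
unset = {'system': {'limitation': {'force_box_session_id_template': '  '}}}
pipeline = {'ai': {'local-agent': {'box-session-id-template': '{launcher_id}'}}}
```

Because the forced value is read at resolve time, a pipeline config changed through the API has no effect while the limitation is set.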
--- a/src/langbot/pkg/provider/tools/loaders/mcp.py
+++ b/src/langbot/pkg/provider/tools/loaders/mcp.py
@@ -73,6 +73,13 @@ class RuntimeMCPSession:
        self.enable = enable
        self.session = None

+        # Transient test sessions (created from the config page "test" button,
+        # which carry no persisted server UUID) must NOT share the live
+        # "mcp-shared" Box session. Otherwise a failing test churns the shared
+        # session and tears down healthy, already-connected servers. Callers
+        # flag these via server_config['_transient'] = True.
+        self.is_transient = bool(server_config.get('_transient', False))
+
        self.exit_stack = AsyncExitStack()
        self.functions = []

@@ -402,6 +409,11 @@ class RuntimeMCPSession:
        return self._box_stdio_runtime.uses_box_stdio()

    def _build_box_session_id(self) -> str:
+        # Transient test sessions get their own isolated Box session so a
+        # failing/short-lived test can never disturb the shared session that
+        # hosts live, already-connected MCP servers.
+        if self.is_transient:
+            return f'mcp-test-{self.server_uuid}'
        return 'mcp-shared'

    def _rewrite_path(self, path: str, host_path: str | None) -> str:
@@ -503,10 +515,14 @@ class MCPLoader(loader.ToolLoader):
                - extra_args: additional configuration parameters (optional)
        """
        uuid_ = server_config.get('uuid')
+        is_transient = False
        if not uuid_:
            self.ap.logger.warning('Server UUID is None for MCP server, maybe testing in the config page.')
            uuid_ = str(uuid_module.uuid4())
            server_config['uuid'] = uuid_
+            # No persisted UUID => this is a throwaway "test" session from the
+            # config page. Isolate it from the shared live Box session.
+            is_transient = True

        name = server_config['name']
        uuid = server_config['uuid']
@@ -519,6 +535,7 @@ class MCPLoader(loader.ToolLoader):
            'uuid': uuid,
            'mode': mode,
            'enable': enable,
+            '_transient': is_transient,
            **extra_args,
        }
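
Taken together, the two hunks above mean that a config without a persisted UUID gets a fresh UUID and an isolated Box session name. The following sketch condenses that decision into one hypothetical function, `box_session_id_for`; the session-name strings come from the diff, everything else is illustrative.

```python
import uuid


def box_session_id_for(server_config: dict) -> str:
    """Return the Box session name a server config would be placed in."""
    config = dict(server_config)  # do not mutate the caller's dict in this sketch
    transient = not config.get('uuid')
    if transient:
        # No persisted UUID: treat as a throwaway config-page test.
        config['uuid'] = str(uuid.uuid4())
        return f"mcp-test-{config['uuid']}"
    return 'mcp-shared'


live = box_session_id_for({'name': 'search', 'uuid': 'abc'})
test_a = box_session_id_for({'name': '_'})
test_b = box_session_id_for({'name': '_'})
```

Each test run lands in its own session, so deleting that session on cleanup cannot affect servers in `mcp-shared`.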

--- a/src/langbot/pkg/provider/tools/loaders/mcp_stdio.py
+++ b/src/langbot/pkg/provider/tools/loaders/mcp_stdio.py
@@ -293,10 +293,25 @@ class BoxStdioSessionRuntime:
        if not self.uses_box_stdio():
            return

+        workspace = self._build_workspace(host_path=None)
+
+        # Transient test sessions own their isolated Box session, so tear the
+        # whole session down rather than leaking it. This cannot affect live
+        # servers because they live in the separate shared session.
+        if getattr(self.owner, 'is_transient', False):
+            try:
+                await workspace.cleanup()
+            except Exception as exc:
+                self.ap.logger.warning(
+                    f'MCP server {self.server_name}: failed to delete transient test session '
+                    f'{self.owner._build_box_session_id()}: {type(exc).__name__}: {exc}'
+                )
+            await self._cleanup_staged_workspace()
+            return
+
        # In the shared-session model, we do not delete the session itself.
        # Stop only this MCP server's managed process; deleting the session
        # would kill other MCP servers sharing the same container.
-        workspace = self._build_workspace(host_path=None)
        try:
            await workspace.stop_managed_process(self.process_id)
        except Exception as exc:
--- a/src/langbot/pkg/skill/manager.py
+++ b/src/langbot/pkg/skill/manager.py
@@ -46,6 +46,13 @@ class SkillManager:
            self.ap.logger.info('Box runtime unavailable; skill cache is empty.')
            return

+        # LangBot may only validate Box-reported paths against its own
+        # filesystem when the two share one (local stdio mode). In separated
+        # deployments (Docker Compose, k8s sidecar, --standalone-box, remote
+        # endpoint) the package_root lives on the Box runtime's filesystem and
+        # is not resolvable here, so we trust what Box reports.
+        validate_locally = bool(getattr(box_service, 'shares_filesystem_with_box', False))
+
        try:
            dropped = 0
            for skill_data in await box_service.list_skills():
@@ -53,7 +60,7 @@ class SkillManager:
                if not skill_name:
                    continue
                package_root = str(skill_data.get('package_root', '') or '').strip()
-                if package_root and not os.path.isdir(package_root):
+                if validate_locally and package_root and not os.path.isdir(package_root):
                    self.ap.logger.warning(
                        f'Skill "{skill_name}" reported by Box runtime but '
                        f'package_root missing on LangBot filesystem '
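
The effect of gating the `os.path.isdir()` check on the filesystem topology can be shown with a small filter. This is a simplified sketch, not the `SkillManager` code: `keep_skills` is hypothetical, and the `'name'` key is an assumption about the shape of the data returned by `list_skills`.

```python
import os
import tempfile


def keep_skills(skills: list, validate_locally: bool) -> list:
    """Return names of skills retained under the given validation mode."""
    kept = []
    for skill in skills:
        name = str(skill.get('name', '') or '').strip()
        if not name:
            continue
        root = str(skill.get('package_root', '') or '').strip()
        # Only check the path locally when LangBot and Box share a filesystem.
        if validate_locally and root and not os.path.isdir(root):
            continue
        kept.append(name)
    return kept


real_dir = tempfile.mkdtemp()
skills = [
    {'name': 'local', 'package_root': real_dir},
    {'name': 'box-only', 'package_root': os.path.join(real_dir, 'missing')},
]
shared = keep_skills(skills, validate_locally=True)
separated = keep_skills(skills, validate_locally=False)
```

In shared mode the skill with the missing directory is dropped; in separated mode both are kept and the Box runtime is trusted to resolve the path.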
--- a/src/langbot/templates/config.yaml
+++ b/src/langbot/templates/config.yaml
@@ -25,6 +25,12 @@ system:
        max_bots: -1
        max_pipelines: -1
        max_extensions: -1
+        # When set to a non-empty string, every pipeline is forced to use this
+        # Box sandbox-scope template regardless of its own configuration, and
+        # the per-pipeline "Sandbox Scope" selector is locked in the web UI.
+        # Used by SaaS deployments to confine a tenant to a single shared
+        # sandbox (set to '{global}'). Empty string = no restriction.
+        force_box_session_id_template: ''
    task_retention:
        # Keep at most this many completed async task records in memory
        completed_limit: 200
--- a/src/langbot/templates/metadata/pipeline/ai.yaml
+++ b/src/langbot/templates/metadata/pipeline/ai.yaml
@@ -152,21 +152,22 @@ stages:
          es_ES: Determina cómo se comparten los entornos sandbox entre mensajes.
          ru_RU: Определяет, как песочницы используются совместно между сообщениями.
        disable_if:
-          field: __system.box_available
+          field: __system.box_scope_editable
          operator: eq
          value: false
        disabled_tooltip:
          en_US: >-
-            Box sandbox is disabled or unavailable. Enable it in config.yaml
-            (box.enabled = true) and ensure the runtime is reachable to change
-            this setting.
-          zh_Hans: Box 沙箱已禁用或不可用。请在配置中启用（box.enabled = true）并确认运行时连接正常，才能修改此项。
-          zh_Hant: Box 沙箱已停用或無法使用。請在設定中啟用（box.enabled = true）並確認執行時連線正常，才能修改此項。
-          ja_JP: Box サンドボックスが無効または利用できません。設定で有効化（box.enabled = true）し、ランタイムが接続できることを確認してから変更してください。
-          vi_VN: Sandbox Box đã tắt hoặc không khả dụng. Hãy bật trong cấu hình (box.enabled = true) và đảm bảo runtime hoạt động để chỉnh sửa.
-          th_TH: Sandbox Box ถูกปิดใช้งานหรือไม่พร้อมใช้งาน กรุณาเปิดใช้งานในการตั้งค่า (box.enabled = true) และตรวจสอบว่ารันไทม์เชื่อมต่อปกติก่อนปรับค่า
-          es_ES: El sandbox de Box está desactivado o no disponible. Actívelo en la configuración (box.enabled = true) y asegúrese de que el runtime esté conectado para modificar este ajuste.
-          ru_RU: Песочница Box отключена или недоступна. Включите её в конфигурации (box.enabled = true) и убедитесь, что среда выполнения работает, чтобы изменить эту настройку.
+            Sandbox scope can't be changed: either the Box sandbox is disabled
+            or unavailable (enable it in config.yaml with box.enabled = true and
+            ensure the runtime is reachable), or this deployment pins all
+            pipelines to a fixed scope.
+          zh_Hans: "无法修改沙箱作用域：Box 沙箱已禁用或不可用（请在配置中启用 box.enabled = true 并确认运行时连接正常），或本部署已将所有流水线固定为统一作用域。"
+          zh_Hant: "無法修改沙箱作用域：Box 沙箱已停用或無法使用（請在設定中啟用 box.enabled = true 並確認執行時連線正常），或本部署已將所有流水線固定為統一作用域。"
+          ja_JP: "サンドボックススコープを変更できません：Box サンドボックスが無効/利用不可（設定で box.enabled = true にしてランタイム接続を確認）、またはこのデプロイがすべてのパイプラインを固定スコープに制限しています。"
+          vi_VN: "Không thể thay đổi phạm vi sandbox：Box sandbox bị tắt hoặc không khả dụng (bật box.enabled = true và đảm bảo runtime hoạt động), hoặc bản triển khai này cố định mọi pipeline về một phạm vi."
+          th_TH: "ไม่สามารถเปลี่ยนขอบเขต Sandbox：Box sandbox ถูกปิดหรือไม่พร้อมใช้งาน (เปิด box.enabled = true และตรวจสอบรันไทม์) หรือการ deploy นี้ล็อกทุก pipeline ไว้ที่ขอบเขตเดียว"
+          es_ES: "No se puede cambiar el alcance del sandbox: el sandbox de Box está desactivado o no disponible (actívelo con box.enabled = true y verifique el runtime), o este despliegue fija todas las pipelines a un alcance único."
+          ru_RU: "Невозможно изменить область песочницы: песочница Box отключена или недоступна (включите box.enabled = true и проверьте среду выполнения), либо это развёртывание фиксирует единую область для всех конвейеров."
        type: select
        required: false
        default: "{launcher_type}_{launcher_id}"