Files
LangBot/src/langbot/pkg/skill/manager.py
Junyan Chin 8e558ad3a1 Feat/saas sandbox adaptation (#2234)
* fix(box): trust Box-reported skill paths when filesystem is not shared

In separated deployments (Docker Compose, k8s sidecar, --standalone-box,
remote runtime.endpoint) the Box runtime owns its own filesystem, so the
skill package_root it reports via list_skills is not resolvable on the
LangBot side. LangBot's reload_skills and build_skill_extra_mounts
validated those paths with os.path.isdir() against its own filesystem,
which silently dropped every skill in such deployments — breaking the
sandbox skill feature for the nsjail/SaaS backend.

Add BoxService.shares_filesystem_with_box, derived from the connector
transport (stdio = shared, WebSocket = separated), with an explicit
override seam for tests/embedders. Gate both isdir() guards on it: keep
local validation in shared-fs stdio mode, trust Box-reported paths
otherwise. The Box runtime only reports skills found on its own
filesystem, so those paths are valid there by construction.

Adds topology-derivation tests (real connector, no mocks) and
skill-retention tests for both shared and separated filesystems.

* build(docker): ship a self-contained nsjail sandbox backend in the image

Compile nsjail 3.6 from source in a dedicated multi-stage build and carry
only the binary plus its runtime libs (libprotobuf32, libnl-route-3-200)
into the final image. This lets the Box runtime isolate sandboxed code via
nsjail user/mount/pid/net namespaces without a host Docker socket — the
prerequisite for running Box on LangBot Cloud (k8s), where mounting
docker.sock would grant node root and is not acceptable for multi-tenant.

The build toolchain (build-essential/bison/flex/protobuf-dev/libnl-dev)
stays in the nsjail-build stage and is not present in the shipped image.

Verified: image builds (583MB), nsjail --help exits 0, libraries resolve,
and the real NsjailBackend executes an isolated command end-to-end on a
v6.1/cgroup2 host matching LangBot Cloud prod (rlimit fallback path, since
container /sys/fs/cgroup is read-only; PID-namespace isolation confirmed).

* feat(box): SaaS guard to force a single global sandbox scope

Add system.limitation.force_box_session_id_template: when non-empty it
overrides every pipeline's box-session-id-template at resolve time, pinning
all queries to one shared sandbox (e.g. {global}). This is the authoritative,
unbypassable guard — it runs on every exec call, so editing the pipeline
config via API cannot escape it. The web UI locks the Sandbox Scope selector
via a combined box_scope_editable flag (box available AND not forced).

* build(deps): pin langbot-plugin==0.4.2b1 (nsjail cgroup container-safety beta)

* fix(web): show forced sandbox scope + make disabled tooltip tap-friendly

When a SaaS deployment pins every pipeline to a fixed sandbox scope via
system.limitation.force_box_session_id_template, the Sandbox Scope selector was
correctly locked but still displayed the pipeline's stored value (e.g. the
per-chat default), misrepresenting the scope that the runtime actually enforces
on every exec. Coerce the displayed/saved value to the forced template so the
locked selector truthfully shows the active scope (e.g. Global).

Also fix the disabled_tooltip being invisible on touch devices: hover-only Radix
tooltips never open without a pointer, so the explanation of why the field is
locked could not be read on mobile. Wrap the info icon so a tap toggles the
tooltip while desktop hover still works.

* feat(web): hide sidebar new-version prompt for edition=cloud

Cloud instances are upgraded centrally by the operator, so surfacing a GitHub
'new version available' badge to tenants is misleading and actionable only by
the operator. Skip the release check entirely when edition=cloud.

* style(web): prettier formatting for DisabledTooltipIcon ternary

* chore(deps): bump langbot-plugin to 0.4.2b2

Picks up the SDK fix that creates a read-write host_path before the
nsjail bind-mount, fixing the SaaS MCP shared-workspace sandbox failure
(exec exit 255 with empty output when host_path didn't exist).

* chore(deps): bump langbot-plugin to 0.4.2b3

Picks up the nsjail /dev-node fix so stdio MCP servers (uvx-launched) can
start under force_global_sandbox instead of failing with 'Connection closed
/ please check URL'.

* fix(web): show real MCP runtime status on installed extensions list

The installed-extensions list badge keyed solely off the enable flag, so a
server that was still CONNECTING (or in ERROR) was shown as 'Connected'.
Reflect the actual runtime_info.status (connecting/connected/error/disabled)
with matching colors, and poll quietly every 3s while any MCP server is
connecting so the badge transitions without a manual refresh.

* chore(deps): bump langbot-plugin to 0.4.2b4

Picks up the 30s start_managed_process timeout so cold uvx MCP bootstraps
don't get torn down mid-install.

* style(web): satisfy prettier — parenthesize nullish-coalescing in ternary

* fix(mcp): isolate transient test sessions from the shared Box session

A config-page 'test' (server_name='_', no persisted UUID) ran in the same
shared 'mcp-shared' Box session as live MCP servers. A failing test (e.g.
empty args) churned that shared session and tore down healthy, already-
connected servers — leaving them stuck after exhausting their retries.

Mark UUID-less sessions as transient, give them their own isolated Box
session ('mcp-test-<uuid>'), and fully delete that session on cleanup so
tests can never disturb live servers and don't leak sessions.

* fix(mcp): tear down transient test session after test completes

A successful config-page test left its isolated 'mcp-test-<uuid>' Box
session running (the lifecycle task blocks until shutdown). Wrap the
transient test coroutine so it always shuts the session down afterward,
preventing isolated test sessions from leaking.
2026-06-09 19:30:17 +08:00

143 lines
5.8 KiB
Python

from __future__ import annotations
import os
import typing
from ..core import app
if typing.TYPE_CHECKING:
pass
class SkillManager:
"""Skill manager backed by Box-managed or local filesystem packages.
In sandbox deployments, skills are loaded from the Box runtime. Local
data/skills remains as the fallback for non-Box development.
Skills are activated through the `activate` tool (Tool Call mechanism),
aligned with Claude Code's design. This protects KV Cache and follows
industry standard.
"""
ap: app.Application
skills: dict[str, dict]
def __init__(self, ap: app.Application):
self.ap = ap
self.skills = {}
async def initialize(self):
await self.reload_skills()
async def reload_skills(self):
"""Reload all skills from the Box runtime.
Box is the only source of truth for skills. When Box is unavailable
(disabled in config or unreachable) the cache is emptied — there is
no local filesystem fallback. Skills whose ``package_root`` is no
longer visible on the LangBot-side filesystem are dropped so they
don't surface as stale ``extra_mounts``.
"""
self.skills = {}
box_service = getattr(self.ap, 'box_service', None)
if box_service is None or not getattr(box_service, 'available', False):
self.ap.logger.info('Box runtime unavailable; skill cache is empty.')
return
# LangBot may only validate Box-reported paths against its own
# filesystem when the two share one (local stdio mode). In separated
# deployments (Docker Compose, k8s sidecar, --standalone-box, remote
# endpoint) the package_root lives on the Box runtime's filesystem and
# is not resolvable here, so we trust what Box reports.
validate_locally = bool(getattr(box_service, 'shares_filesystem_with_box', False))
try:
dropped = 0
for skill_data in await box_service.list_skills():
skill_name = skill_data.get('name')
if not skill_name:
continue
package_root = str(skill_data.get('package_root', '') or '').strip()
if validate_locally and package_root and not os.path.isdir(package_root):
self.ap.logger.warning(
f'Skill "{skill_name}" reported by Box runtime but '
f'package_root missing on LangBot filesystem '
f'({package_root}); dropping from in-memory cache.'
)
dropped += 1
continue
self.skills[skill_name] = skill_data
if dropped:
self.ap.logger.warning(
f'Loaded {len(self.skills)} skills from Box runtime '
f'({dropped} dropped due to missing package_root).'
)
else:
self.ap.logger.info(f'Loaded {len(self.skills)} skills from Box runtime')
except Exception as exc:
self.ap.logger.warning(f'Failed to load skills from Box runtime: {exc}')
def refresh_skill_from_disk(self, skill_name: str) -> bool:
"""Confirm a single skill is present in the cache.
With Box as the only source of truth, the actual reload is driven by
SkillService callers awaiting ``reload_skills``; this method only
reports whether the cache still has the skill.
"""
if not skill_name:
return False
return skill_name in self.skills
def get_skill_by_name(self, name: str) -> dict | None:
"""Get skill data by name."""
return self.skills.get(name)
def get_skill_index(self, bound_skills: list[str] | None = None) -> str:
"""Render the pipeline-visible skills as a short ``name: description``
index suitable for the system prompt.
``bound_skills`` follows the same convention as
``query.variables['_pipeline_bound_skills']``: ``None`` means every
loaded skill is exposed; an explicit list filters to that subset.
Returns an empty string when no skills are visible.
"""
lines: list[str] = []
for skill in self.skills.values():
name = skill.get('name')
if not name:
continue
if bound_skills is not None and name not in bound_skills:
continue
display = skill.get('display_name') or name
description = (skill.get('description') or '').strip().replace('\n', ' ')
lines.append(f'- {name} ({display}): {description}')
if not lines:
return ''
return 'Available Skills:\n' + '\n'.join(lines)
def build_skill_aware_prompt_addition(self, bound_skills: list[str] | None = None) -> str:
"""Build the system-prompt addendum that makes the LLM aware of the
pipeline-visible skills.
Only metadata (name + description) is injected — the full SKILL.md is
loaded later via the ``activate`` Tool Call, protecting KV cache and
matching Claude Code's progressive disclosure pattern. Returns an
empty string when no skills are visible (no prompt change at all).
"""
skill_index = self.get_skill_index(bound_skills)
if not skill_index:
return ''
return (
'\n\n'
f'{skill_index}\n\n'
"When the user's request clearly matches one or more skills "
'based on their descriptions above, call the `activate` tool with '
'the skill name to load its full instructions. Only the name and '
'description are visible here; the actual instructions arrive as '
'the tool result. If no skill is a clear match, respond normally '
'without activating any skill.'
)