Mirror of https://github.com/langbot-app/LangBot.git (synced 2026-06-09 23:36:02 +00:00)
feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
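
The ownership rule described above can be sketched as a pure function. This is only an illustration, not LangBot's code: the label key comes from this commit message, while `select_orphans`, `build_rm_command`, and the container dict shape are hypothetical, and the input list is assumed to contain only LangBot Box containers.

```python
# Label key taken from the commit message; everything else here is illustrative.
INSTANCE_LABEL = "langbot.box.instance_id"


def select_orphans(containers, current_instance_id):
    """Return IDs of containers that the current runtime instance does not own.

    `containers` is a hypothetical list of dicts shaped like
    {"id": str, "labels": dict}, already restricted to LangBot Box containers.
    """
    orphans = []
    for container in containers:
        owner = container.get("labels", {}).get(INSTANCE_LABEL)
        # A missing label (older version) yields None, which never equals the
        # current ID, so unlabeled containers are treated as orphans too.
        if owner != current_instance_id:
            orphans.append(container["id"])
    return orphans


def build_rm_command(cli, container_ids):
    """Build one batched force-remove command instead of one call per container."""
    return [cli, "rm", "-f", *container_ids]


containers = [
    {"id": "a", "labels": {INSTANCE_LABEL: "old"}},  # previous instance
    {"id": "b", "labels": {INSTANCE_LABEL: "cur"}},  # current instance, kept
    {"id": "c", "labels": {}},                       # pre-label version
]
print(build_rm_command("docker", select_orphans(containers, "cur")))
```

Keeping the selection separate from the command invocation makes the "never remove the current instance's containers" rule easy to test without a container runtime.
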
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
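
A minimal sketch of how phase tagging might look. The enum name and the seven phase values are taken from the list above; the member names, the `PhaseTracker` class, and its `run_phase` helper are assumptions made for illustration and may not match the real `_init_box_stdio_server()`.

```python
import enum


class MCPSessionErrorPhase(enum.Enum):
    # Values follow the commit message; member names are assumed.
    SESSION_CREATE = "session_create"
    DEP_INSTALL = "dep_install"
    PROCESS_START = "process_start"
    RELAY_CONNECT = "relay_connect"
    MCP_INIT = "mcp_init"
    RUNTIME = "runtime"
    TOOL_CALL = "tool_call"


class PhaseTracker:
    """Hypothetical holder for the failure phase and retry counter."""

    def __init__(self):
        self.error_phase = None
        self.retry_count = 0

    def run_phase(self, phase, step):
        """Run one startup step; record its phase if it fails, then re-raise."""
        try:
            return step()
        except Exception:
            self.error_phase = phase
            raise

    def get_runtime_info_dict(self):
        phase = self.error_phase.value if self.error_phase else None
        return {"error_phase": phase, "retry_count": self.retry_count}


def failing_install():
    raise RuntimeError("pip install failed")


tracker = PhaseTracker()
try:
    tracker.run_phase(MCPSessionErrorPhase.SESSION_CREATE, lambda: "session-1")
    tracker.run_phase(MCPSessionErrorPhase.DEP_INSTALL, failing_install)
except RuntimeError:
    tracker.retry_count += 1

print(tracker.get_runtime_info_dict())
```
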
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
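
The fallback decision above might be pictured roughly as follows. This is a simplified, synchronous stand-in: the real `BoxService.initialize()` is presumably asynchronous, and the `connect` callable and `choose_stdio_mode` helper are hypothetical.

```python
class BoxService:
    """Simplified stand-in for the real service; only models `available`."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical callable that probes the runtime
        self.available = False

    def initialize(self):
        try:
            self._connect()
            self.available = True
        except Exception:
            # A missing or broken container runtime must not abort startup.
            self.available = False


def choose_stdio_mode(box_service):
    """Pick Box when the runtime is available, else host-direct execution."""
    if box_service is not None and box_service.available:
        return "box"
    return "host"


def no_runtime():
    raise RuntimeError("no docker or podman found")


healthy = BoxService(lambda: None)
healthy.initialize()
broken = BoxService(no_runtime)
broken.initialize()
print(choose_stdio_mode(healthy), choose_stdio_mode(broken))
```
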
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result: computed once, then passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
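
The pattern behind this fix is a `finally` block that runs on every exit path. The sketch below assumes a simple retry loop where cleanup runs after each attempt; `lifecycle_loop`, `run_once`, and `cleanup` are hypothetical names, and the real `_lifecycle_loop` may be structured differently.

```python
import asyncio


async def lifecycle_loop(run_once, cleanup, max_retries=2):
    """Run a session with retries, cleaning up after every attempt."""
    for attempt in range(max_retries + 1):
        try:
            await run_once()
            return attempt  # normal shutdown
        except Exception:
            if attempt == max_retries:
                raise  # final failure; the finally block still runs first
        finally:
            # Runs on success, before each retry, and on the final failure.
            await cleanup()


async def demo():
    calls = {"run": 0, "cleanup": 0}

    async def flaky():
        calls["run"] += 1
        if calls["run"] < 3:
            raise ConnectionError("relay not ready")

    async def cleanup():
        calls["cleanup"] += 1

    attempts_used = await lifecycle_loop(flaky, cleanup)
    return attempts_used, calls


print(asyncio.run(demo()))
```

Because the cleanup lives in `finally` rather than in each branch, adding a new exit path later cannot reintroduce the leak.
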
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
@@ -5,6 +5,7 @@ import collections
 import dataclasses
 import datetime as dt
 import logging
+import uuid
 
 from .backend import BaseSandboxBackend, DockerBackend, PodmanBackend
 from .errors import (
@@ -64,12 +65,14 @@ class BoxRuntime:
         self._backend: BaseSandboxBackend | None = None
         self._sessions: dict[str, _RuntimeSession] = {}
         self._lock = asyncio.Lock()
+        self.instance_id = uuid.uuid4().hex[:12]
 
     async def initialize(self):
         self._backend = await self._select_backend()
         if self._backend is not None:
+            self._backend.instance_id = self.instance_id
             try:
-                await self._backend.cleanup_orphaned_containers()
+                await self._backend.cleanup_orphaned_containers(self.instance_id)
             except Exception as exc:
                 self.logger.warning(f'LangBot Box orphan container cleanup failed: {exc}')
 
@@ -164,6 +167,17 @@ class BoxRuntime:
     def get_sessions(self) -> list[dict]:
         return [self._session_to_dict(s.info) for s in self._sessions.values()]
 
+    def get_session(self, session_id: str) -> dict:
+        runtime_session = self._sessions.get(session_id)
+        if runtime_session is None:
+            raise BoxSessionNotFoundError(f'session {session_id} not found')
+        result = self._session_to_dict(runtime_session.info)
+        if runtime_session.managed_process is not None:
+            result['managed_process'] = self._managed_process_to_dict(
+                session_id, runtime_session.managed_process
+            )
+        return result
+
     async def get_status(self) -> dict:
         backend_info = await self.get_backend_info()
         return {