Compare commits

11 Commits

Author SHA1 Message Date
huanghuoguoguo 5b2826fa49 Add performance and reliability QA gates (#2283)
* Add performance and reliability QA gates

* test(skills): prepare user path performance gate

* test(skills): add debug chat load gate

* test(skills): extend fake provider load profiles

* test(skills): add debug chat timing and isolation probes

* test(skills): clarify manual QA perf gates
2026-06-25 21:02:44 +08:00
Hyu 20636ac432 Merge pull request #2284 from langbot-app/fix/api-password-thread-offload
fix(api): offload password hashing from event loop
2026-06-25 20:31:44 +08:00
Hyu af42602547 Merge pull request #2285 from langbot-app/fix/monitoring-null-payloads
fix(monitoring): tolerate null API payloads
2026-06-25 20:26:25 +08:00
dadachann 53b20e2b13 fix(monitoring): tolerate null API payloads
Normalize monitoring API responses before rendering so empty or error payloads with data:null cannot crash the dashboard. Also guard chart, token, and box session arrays before reading length/map.
2026-06-25 08:22:01 -04:00
dadachann 1242dc2d21 fix(api): offload password hashing from event loop 2026-06-25 06:29:16 -04:00
RockChinQ 04628d93cb docs: add architecture guide for agents 2026-06-25 04:17:19 -04:00
RockChinQ 9c22a1521c fix(box): defer separated workspace ownership to runtime 2026-06-25 00:09:40 -04:00
RockChinQ c8d5039580 feat(box): expose Docker CPU limit toggle 2026-06-24 23:31:53 -04:00
dadachann 85d8d9304e fix(web): keep feedback dialog interactive 2026-06-24 10:10:19 -04:00
Hyu 76471af179 feat(web): add sidebar feedback popover
Co-authored-by: dadachann <185672915+dadachann@users.noreply.github.com>
2026-06-24 16:43:50 +08:00
RockChinQ 59b2a7cd51 fix(monitoring): hide disabled box status on cloud 2026-06-23 06:40:05 -04:00
80 changed files with 8071 additions and 397 deletions
+1
@@ -48,6 +48,7 @@ coverage.xml
.coverage
src/langbot/web/
testsdk/
.qa/
# Build artifacts
/dist
+79 -134
@@ -1,160 +1,105 @@
# AGENTS.md
This file guides code agents (Claude Code, GitHub Copilot, OpenAI Codex, etc.) working in the LangBot project. `CLAUDE.md` is a symlink to this file.
This file guides code agents working in the LangBot main repository. `CLAUDE.md` is a symlink to this file.
## Project Overview
Read `ARCHITECTURE.md` before non-trivial backend, frontend, runtime, plugin, Box, MCP, persistence, or cross-repo SDK changes. This file is the working checklist; `ARCHITECTURE.md` is the system map.
LangBot is an open-source, LLM-native instant-messaging bot development platform. It aims to provide an out-of-the-box IM bot development experience with Agent, RAG, MCP and other LLM application capabilities, supporting mainstream global IM platforms and exposing rich APIs for custom development.
## Quick Facts
LangBot has a comprehensive web frontend — almost every operation can be performed through it.
- Python backend: `>=3.11,<4.0`, dependencies managed by `uv`.
- Frontend: `web/` is Vite + React Router 7 + shadcn/ui + Tailwind, managed by `pnpm`.
- Backend framework: Quart served by Hypercorn on `api.port`, default `5300`.
- Frontend dev server: `web/` on `3000`, with `VITE_API_BASE_URL` pointing at the backend.
- Plugin/Box/runtime contracts live in sibling repo `langbot-plugin-sdk`, pinned as `langbot-plugin` in `pyproject.toml`.
- **Python**: `>=3.11,<4.0`, dependencies managed by `uv`. Package version is in `pyproject.toml`.
- **Frontend**: `web/` is a **Vite + React Router 7 + shadcn/ui + Tailwind CSS** SPA, managed by `pnpm`. (Note: this is NOT Next.js — the `dev` script is `vite`.)
- **Backend framework**: Quart (the async flavour of Flask). The HTTP API and the pre-built web UI are both served by the backend on `http://127.0.0.1:5300`.
## Repository Layout
```
LangBot/
├── main.py # Entrypoint shim -> langbot.__main__.main()
├── pyproject.toml # Python project + deps (uv), pins langbot-plugin==<x.y.z>
├── src/langbot/
│ ├── __main__.py # Real entrypoint, CLI args (--standalone-runtime, --standalone-box, --debug)
│ ├── pkg/ # Core backend package
│ │ ├── api/ # HTTP API controllers + services (Quart)
│ │ ├── core/ # App bootstrap, stages, task manager
│ │ ├── platform/ # IM platform adapters, bot managers, session managers
│ │ ├── provider/ # LLM providers, requesters, tool providers
│ │ ├── pipeline/ # Pipelines, stages, query pool
│ │ ├── plugin/ # Bridge connecting LangBot to the plugin runtime (see below)
│ │ ├── box/ # Code-sandbox subsystem (Docker / nsjail / E2B backends)
│ │ ├── skill/ # Skill subsystem
│ │ ├── rag/ , vector/ # RAG + vector store
│ │ ├── command/ # Built-in commands
│ │ ├── persistence/ # ORM models + Alembic migrations (SQLite & PostgreSQL)
│ │ ├── storage/ # Object/file storage abstractions
│ │ ├── config/, entity/, discover/, utils/, telemetry/, survey/
│ ├── libs/ # Vendored SDKs (qq_official_api, wecom_api, etc.)
│ └── templates/ # Config/component templates (e.g. templates/config.yaml)
├── web/ # Frontend SPA (Vite + React Router 7 + shadcn + Tailwind)
└── docker/ # docker-compose deployment files
```
## Development Environment Setup
Full guide lives in the wiki: **["开发配置" / Dev Config](https://docs.langbot.app/zh/develop/dev-config)**. Summary:
### Backend
```bash
pip install uv
uv sync --dev # uv creates a .venv/ for you; point your editor's interpreter at it
uv run main.py # serves API + web UI on http://127.0.0.1:5300
```
On first run the config file is generated at `data/config.yaml`. DB is SQLite by default (zero setup); PostgreSQL is supported. Migrations run automatically on startup.
### Frontend
Requires Node.js + [pnpm](https://pnpm.io/installation).
```bash
cd web
cp .env.example .env # Windows: copy .env.example .env
pnpm install
pnpm dev # http://127.0.0.1:3000 (npm install / npm run dev also work)
```
`pnpm dev` reads `VITE_API_BASE_URL` from `web/.env` so the dev frontend can reach the backend on port `5300`. In production the frontend is pre-built into static files served by the backend on the same origin.
### Code formatting
The repo runs lint + format checks in CI. Install the pre-commit hooks so the same checks run locally before each commit:
## Essential Commands
```bash
uv sync --dev
uv run main.py
uv run pre-commit install
cd web
pnpm install
pnpm dev
pnpm build
```
## Plugin System
LangBot's plugin system (Plugin SDK, CLI `lbp`, Plugin Runtime, and the shared entity/API definitions) lives in a **separate repository**: [`langbot-plugin-sdk`](https://github.com/langbot-app/langbot-plugin-sdk). LangBot depends on it via the pinned `langbot-plugin` package in `pyproject.toml`.
### Architecture (what to know inside this repo)
- Plugins run as independent processes managed by the **Plugin Runtime**. The Runtime supports two control transports: `stdio` and `websocket`.
- When LangBot is started directly by a user (not in a container), it spawns and connects to the Runtime over **stdio** (lightweight/personal use).
- When LangBot runs in a container, it connects to a standalone Runtime over **WebSocket** (production).
- The bridge code lives in `src/langbot/pkg/plugin/` (`connector.py`, `handler.py`).
- Relevant config (`data/config.yaml`): `plugin.runtime_ws_url` (e.g. `ws://langbot_plugin_runtime:5400/control/ws`). Start LangBot with `--standalone-runtime` to make it connect to an externally-launched Runtime over WebSocket instead of spawning one over stdio.
### Debugging the Plugin Runtime / CLI / SDK
This is documented in detail in the **SDK repo's `AGENTS.md`** and in the wiki page **["调试插件运行时、CLI、SDK" / Plugin Runtime](https://docs.langbot.app/zh/develop/plugin-runtime)**. The short version:
- Clone `LangBot` and `langbot-plugin-sdk` as siblings under one parent dir so the editor resolves shared entities.
- Start a standalone Runtime from the SDK repo: `uv run --no-sync lbp rt` (control port `5400`, debug port `5401`).
- To make LangBot use a locally-modified SDK: from the SDK dir, with LangBot's `.venv` active, run `uv pip install .`, then launch LangBot with `uv run --no-sync main.py --standalone-runtime` (keep `--no-sync` so your local SDK isn't overwritten).
### Debugging the Box (sandbox) runtime
The Box subsystem (`src/langbot/pkg/box/`) is the code sandbox. It picks the first available backend among **Docker / nsjail / E2B**. The standalone Box runtime is launched via the SDK CLI: `lbp box`. Backend selection details, the `lbp box` flags, and the SDK-side architecture are documented in the SDK repo's `AGENTS.md`.
Relevant config (`data/config.yaml`, `box:` section): `box.enabled` (master switch — disabling it also disables the native sandbox tools, skill add/edit, and stdio-mode MCP servers), `box.backend` (`'local'` = Docker/nsjail auto-pick, or `'docker'` / `'nsjail'` / `'e2b'`; also settable via `BOX__BACKEND`), and `box.runtime.endpoint` (external Box runtime base URL, e.g. `ws://127.0.0.1:5410`; empty = local auto-managed runtime). Like the plugin runtime, LangBot can connect to an externally-launched Box runtime by setting that endpoint and starting with `--standalone-box`.
> A common false "No supported sandbox backend (Docker / nsjail / E2B) is available" comes from Docker being installed and running but the current user not being in the `docker` group → `docker info` gets `permission denied` on the socket. Fix: `sudo usermod -aG docker <user>` and restart the backend in a shell that has the new group.
## Development Standards
- LangBot is a global project: **all code comments and docstrings must be in English**, and every user-facing string must support **i18n** (`en_US` + `zh_Hans` at minimum, plus `ja_JP` where the repo already has it).
- LangBot is adopted in both toC and toB scenarios — always consider compatibility and security.
- **Commit message format**: `<type>(<scope>): <subject>`
- `type`: one of `feat`, `fix`, `docs`, `style`, `refactor`, `perf`, `test`, `chore`, etc.
- `scope`: the affected package/module/file/class.
- `subject`: concise description of the change.
### Database migrations (Alembic)
LangBot uses [Alembic](https://alembic.sqlalchemy.org/) for migrations, supporting both SQLite and PostgreSQL from a single set of scripts. Migration files live in `src/langbot/pkg/persistence/alembic/versions/`.
If you change ORM model definitions, generate a migration:
Useful focused tests:
```bash
# Run from the project root (requires data/config.yaml to exist)
uv run python -m langbot.pkg.persistence.alembic_runner autogenerate "description of your change"
uv run pytest tests/unit_tests -q
uv run pytest tests/integration -q
uv run pytest tests/integration/persistence -q
uv run pytest tests/manual/mcp_smoke.py
cd web
pnpm lint
pnpm test:e2e
```
Review and edit the generated script before committing. Migrations execute automatically on startup. `autogenerate` detects schema changes (add/drop columns, tables, type changes) but **data migrations** (e.g. mutating JSON field contents) must be hand-written into the generated script. `env.py` sets `render_as_batch=True`, so SQLite's ALTER TABLE limits are handled automatically — no need to branch per database. More in the wiki ["开发配置"](https://docs.langbot.app/zh/develop/dev-config#数据库迁移).
Run the narrowest useful test first, then broader checks when confidence is needed.
When writing a migration, follow these rules:
## Where to Look
- **Revision id ≤ 32 characters.** PostgreSQL stores `alembic_version.version_num` as `varchar(32)`; a longer id raises `StringDataRightTruncationError` at runtime. Prefer short, descriptive ids like `0005_add_llm_context_length`.
- **Guard every operation against missing tables/columns.** Fresh installs build the schema via `create_all()` and then stamp the Alembic baseline, so a migration may run against a table that already has the change — or, in tests, against an empty database. Check `inspector.get_table_names()` / `inspector.get_columns(...)` before `add_column` / `drop_column`, mirroring the existing migrations.
- **Keep a single linear head.** Chain `down_revision` to the current head; do not create branches. Run the migration tests after adding one: `uv run pytest tests/integration/persistence/ -q` (the PostgreSQL test needs a running PG via `TEST_POSTGRES_URL`).
- Architecture map: `ARCHITECTURE.md`.
- Dev environment guide: https://docs.langbot.app/zh/develop/dev-config.
- Plugin runtime / CLI / SDK debugging: https://docs.langbot.app/zh/develop/plugin-runtime.
- API-key auth: `docs/API_KEY_AUTH.md`.
- Box deep-dive notes: `docs/review/box-architecture.md` and related files.
- In-repo skills: `skills/` is the single source of truth for LangBot agent skills.
- SDK repo: `../langbot-plugin-sdk/` when changing shared entities, plugin APIs, action protocol, `lbp rt`, or `lbp box`.
> **Legacy migration system (deprecated — do not extend).** The old 3.x migration system under `src/langbot/pkg/persistence/migrations/` (`DBMigration` subclasses in `dbmXXX_*.py`, run from `pkg/persistence/mgr.py`) is **frozen**. Do **not** add new `dbmXXX_*.py` files. The chain is capped at `required_database_version = 25` (`pkg/utils/constants.py`); those files only exist to upgrade pre-existing 3.x databases up to the Alembic baseline and are kept read-only. All new schema changes go through Alembic.
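The Alembic rules listed earlier (short revision id, guarded operations, single linear head) can be sketched as one revision script. This is only an illustration under stated assumptions: the revision ids, table name, and column name are hypothetical, and the `column_missing` helper is not taken from the repository. Alembic and SQLAlchemy are imported inside `upgrade()` / `downgrade()` purely so the guard helper can be exercised on its own; a real script would normally import them at module level.

```python
"""Hypothetical Alembic revision: add a nullable column only if it is missing."""

# Illustrative identifiers; a real script chains down_revision to the current head.
revision = "0006_add_example_flag"              # hypothetical id
down_revision = "0005_add_llm_context_length"   # hypothetical parent
TABLE = "example_table"                         # hypothetical table
COLUMN = "example_flag"                         # hypothetical column

# PostgreSQL stores alembic_version.version_num as varchar(32).
assert len(revision) <= 32


def column_missing(inspector, table: str, column: str) -> bool:
    """Return True only when the table exists and lacks the column."""
    if table not in inspector.get_table_names():
        return False  # empty or test database: nothing to alter
    existing = {col["name"] for col in inspector.get_columns(table)}
    return column not in existing


def upgrade() -> None:
    # Lazy imports keep the helper above usable without Alembic installed.
    import sqlalchemy as sa
    from alembic import op

    inspector = sa.inspect(op.get_bind())
    if column_missing(inspector, TABLE, COLUMN):
        # Batch mode works on both SQLite and PostgreSQL.
        with op.batch_alter_table(TABLE) as batch_op:
            batch_op.add_column(sa.Column(COLUMN, sa.Boolean(), nullable=True))


def downgrade() -> None:
    import sqlalchemy as sa
    from alembic import op

    inspector = sa.inspect(op.get_bind())
    has_table = TABLE in inspector.get_table_names()
    if has_table and not column_missing(inspector, TABLE, COLUMN):
        with op.batch_alter_table(TABLE) as batch_op:
            batch_op.drop_column(COLUMN)
```

The guard returns `False` for a missing table on purpose, so the migration becomes a no-op on an empty database instead of failing.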
## Cross-Repo SDK Work
## Agent-Facing Surfaces (MCP + Skills)
When changing SDK contracts used by LangBot:
LangBot is built to be **agent-friendly**. Three surfaces let AI agents work
with LangBot, and they MUST be kept in lockstep with the HTTP API:
```bash
# from langbot-plugin-sdk, with LangBot's .venv active
uv pip install .
1. **MCP server**: `src/langbot/pkg/api/mcp/` exposes a curated subset of the
API as MCP tools at `/mcp` (API-key authenticated, including the
`api.global_api_key` from config.yaml). `server.py` defines the tools (they
call the service layer directly); `mount.py` is the ASGI dispatcher.
2. **In-repo skills**: `skills/` is the **single source of truth** for agent
skills (plugin/core/deploy/e2e/MCP-ops). Docs and the landing page link here
rather than embedding their own copies.
3. **API-key auth**: `api.global_api_key` (config.yaml) authenticates the API
and MCP without a login session; see `docs/API_KEY_AUTH.md`.
# from LangBot, preserve the locally installed SDK
uv run --no-sync main.py
```
> **Maintenance rule (important).** When you add, remove, or change an HTTP API
> endpoint that should be agent-accessible, you MUST update **both** the matching
> MCP tool in `src/langbot/pkg/api/mcp/server.py` **and** the relevant skill under
> `skills/` (especially `skills/skills/langbot-mcp-ops`). The API, the MCP tool
> surface, and the skills are one system — drift between them is a bug.
For standalone runtime debugging:
## Some Principles
```bash
# in langbot-plugin-sdk
uv run --no-sync lbp rt
uv run --no-sync lbp box
# in LangBot
uv run --no-sync main.py --standalone-runtime
uv run --no-sync main.py --standalone-box
```
Config keys to verify in `data/config.yaml` / `src/langbot/templates/config.yaml`:
- Plugin runtime: `plugin.runtime_ws_url`, default Docker host `langbot_plugin_runtime:5400/control/ws`.
- Box runtime: `box.enabled`, `box.backend`, `box.runtime.endpoint`, Docker host `langbot_box:5410`.
- API/MCP auth: `api.global_api_key`.
## Change Rules
- HTTP API changes that should be agent-accessible must update the matching MCP tool in `src/langbot/pkg/api/mcp/server.py` and the relevant skill under `skills/` in the same pass.
- New schema changes use Alembic under `src/langbot/pkg/persistence/alembic/versions/`; do not add legacy `dbmXXX` migrations.
- Platform adapters handle platform translation only; pipeline and business logic belong in `pkg/pipeline/` or services.
- User-facing strings must support i18n (`en_US`, `zh_Hans`; include `ja_JP` where the repo already does).
- Code comments and docstrings must be English.
- Keep compatibility and security in mind; LangBot is used in both self-hosted/community and toB deployments.
- Commit message format: `<type>(<scope>): <subject>`.
## Runtime Pitfalls
- Local stdio Plugin Runtime disconnects do not auto-reconnect; restart LangBot if that path breaks.
- Orphan runtime processes on `5400`/`5401` commonly break plugin debugging.
- Use `uv run --no-sync` after locally installing the SDK, or `uv` may restore the pinned package.
- A false Box “no backend” often means Docker is running but the current user lacks Docker socket permission.
- Do not confuse external MCP servers LangBot connects to (`pkg/provider/tools/loaders/mcp.py`) with LangBot's own `/mcp` server (`pkg/api/mcp/`).
- `CLAUDE.md` is a symlink to this file; edit `AGENTS.md`, not the symlink.
## Principles
- Keep it simple, stupid.
- Entities should not be multiplied unnecessarily.
+250
@@ -0,0 +1,250 @@
# Architecture
This document is a map of LangBot's moving parts. It is intentionally more stable than a feature guide and more concrete than the README: when you need to change behavior, start here, then follow the file references into the code.
For agent-specific working rules, see `AGENTS.md`. For plugin-runtime and Box-runtime implementation details, also read the sibling SDK repo: [`langbot-plugin-sdk`](https://github.com/langbot-app/langbot-plugin-sdk).
## What LangBot Is
LangBot is an open-source platform for building production IM bots backed by LLMs, agents, RAG, plugins, MCP tools, and a web management panel.
At runtime, one LangBot process owns:
- a Quart/Hypercorn HTTP service and the built web UI on `:5300`;
- messaging-platform adapters such as Discord, Telegram, Slack, WeChat, QQ, WeCom, Lark, DingTalk, KOOK, LINE, Satori, Matrix, and HTTP/WebSocket bots;
- a pipeline engine that turns inbound platform messages into LLM/tool/plugin work and replies;
- persistence, storage, vector database, telemetry, monitoring, and configuration managers;
- bridges to the Plugin Runtime and Box Runtime provided by `langbot-plugin-sdk`;
- an MCP server at `/mcp` exposing a curated agent-facing subset of the service layer.
## Repository Boundary
LangBot is not a single-repo system.
- `LangBot/` is the main product: backend, web UI, platform adapters, pipeline engine, HTTP API, MCP server, RAG, persistence, skills integration, and the bridge code that talks to runtimes.
- `langbot-plugin-sdk/` is published as `langbot-plugin` and pinned in `LangBot/pyproject.toml`. It contains plugin developer APIs, shared entities, `lbp`, the Plugin Runtime (`lbp rt`), and the Box Runtime (`lbp box`).
- Plugins import SDK APIs from `langbot_plugin.*`; the LangBot main process imports the same package for shared entities and runtime protocols.
This split matters. If a change modifies SDK entities, component APIs, action protocols, `lbp rt`, or `lbp box`, verify the sibling SDK repo and install the local SDK into LangBot's virtualenv when testing cross-repo behavior.
## Startup Path
The process entrypoint is small and layered:
1. `main.py` delegates to `langbot.__main__.main()`.
2. `src/langbot/__main__.py` parses `--standalone-runtime`, `--standalone-box`, and `--debug`, checks dependencies, generates missing config/data files, and calls `pkg.core.boot.main()`.
3. `pkg/core/boot.py` executes startup stages in order: `LoadConfigStage`, `GenKeysStage`, `SetupLoggerStage`, `BuildAppStage`, `ShowNotesStage`.
4. `BuildAppStage` constructs the `Application` object by wiring managers, services, runtime connectors, and controllers.
5. `Application.run()` starts the platform manager, query controller, HTTP controller, telemetry/cleanup loops, and plugin initialization.
The central runtime object is `pkg/core/app.py::Application`. It is a service locator for long-lived managers. That is not elegant, but it is the current architectural center; most subsystems receive `ap: Application` and collaborate through it.
## Top-Level Layout
```text
LangBot/
├── main.py # Entrypoint shim
├── pyproject.toml # Python package, deps, pinned langbot-plugin
├── src/langbot/
│ ├── __main__.py # CLI entrypoint and boot handoff
│ ├── pkg/
│ │ ├── core/ # Application, boot stages, task manager
│ │ ├── api/ # HTTP API + MCP server mount
│ │ ├── platform/ # IM adapters and runtime bot manager
│ │ ├── pipeline/ # Message routing and pipeline stages
│ │ ├── provider/ # LLM runners, model manager, tools
│ │ ├── plugin/ # LangBot-side Plugin Runtime connector/handler
│ │ ├── box/ # LangBot-side Box service/connector
│ │ ├── skill/ # Skill metadata/activation integration
│ │ ├── rag/ , vector/ # Knowledge-base and vector DB integration
│ │ ├── persistence/ # SQLAlchemy/SQLModel, Alembic, legacy migrations
│ │ ├── storage/ # Local/S3 file storage abstraction
│ │ └── config/, entity/, utils/, telemetry/, survey/
│ ├── libs/ # Vendored third-party platform SDKs
│ └── templates/ # Default config and component metadata
├── web/ # Vite + React Router + shadcn/ui + Tailwind SPA
├── docker/ # Deployment manifests
├── skills/ # In-repo agent skills, single source of truth
└── tests/ # Unit/integration/e2e/manual tests
```
## The Runtime Graph
The most useful mental model is this graph:
```text
Platform adapter
→ RuntimeBot
→ MessageAggregator
→ QueryPool
→ Controller
→ RuntimePipeline
→ PipelineStage chain
→ RequestRunner / ToolManager / PluginRuntimeConnector / BoxService
→ response via adapter
```
The HTTP and MCP surfaces are parallel entrypoints into the same service layer:
```text
HTTP client / Web UI
→ Quart route group
→ api/http/service/*
→ Application managers / persistence / runtime connectors
MCP client
→ /mcp mount
→ api/mcp/server.py tools
→ the same service layer directly
```
## Message Flow
Inbound platform messages enter through adapter-specific SDK callbacks. The common path is:
1. A platform adapter under `pkg/platform/sources/` converts platform-specific events into SDK message/event entities.
2. `RuntimeBot` in `pkg/platform/botmgr.py` applies pipeline routing rules and either discards the message, pushes it to webhooks, or sends it to the message aggregator.
3. `MessageAggregator` batches/normalizes messages before adding a `Query` to `QueryPool`.
4. `Controller` in `pkg/pipeline/controller.py` selects queries subject to global pipeline concurrency and per-session concurrency.
5. `RuntimePipeline` in `pkg/pipeline/pipelinemgr.py` runs configured pipeline stages using a responsibility-chain style executor that supports generator stages.
6. The chat stage emits plugin events, calls a configured `RequestRunner`, handles streaming/non-streaming responses, records telemetry, and appends conversation history.
7. Output stages send text, cards, chunks, files, or error notices back through the original platform adapter.
Pipeline components are registered by decorators and package import side effects. When adding a new stage, loader, runner, or adapter, check the corresponding preregistration mechanism instead of inventing a second registry.
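A minimal sketch of decorator-based registration, to make the "registered by decorators and package import side effects" idea concrete. The names `stage_class`, `_STAGE_REGISTRY`, `ExampleFilter`, and `build_chain` are hypothetical and chosen for illustration; LangBot's actual preregistration helpers may look different.

```python
from typing import Callable, Dict, List, Type

# Hypothetical registry; filled as a side effect of importing stage modules.
_STAGE_REGISTRY: Dict[str, Type] = {}


def stage_class(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a stage under a unique name."""

    def decorator(cls: Type) -> Type:
        if name in _STAGE_REGISTRY:
            raise ValueError(f"stage {name!r} is already registered")
        _STAGE_REGISTRY[name] = cls
        return cls

    return decorator


@stage_class("ExampleFilter")
class ExampleFilter:
    """Toy stage that trims surrounding whitespace."""

    def process(self, text: str) -> str:
        return text.strip()


def build_chain(names: List[str]) -> list:
    """Instantiate stages in configured order; unknown names raise KeyError."""
    return [_STAGE_REGISTRY[name]() for name in names]
```

Because registration happens at import time, a new component only becomes visible once its module is imported, which is why a second, parallel registry tends to drift from the first.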
## Platform Layer
Platform code lives under `pkg/platform/`.
- `botmgr.py` owns runtime bots, routing rules, event logging, webhook pushing, and adapter lifecycle.
- `sources/` contains adapter implementations. Each adapter subclasses `langbot_plugin.api.definition.abstract.platform.adapter.AbstractMessagePlatformAdapter` from the SDK.
- Platform entities such as `MessageChain`, `Image`, `At`, `Voice`, and events come from `langbot-plugin-sdk`, not from this repo.
The platform layer should translate between external platform APIs and LangBot's shared message/event model. It should not contain LLM-provider logic or pipeline business logic.
## Pipeline Layer
Pipeline code lives under `pkg/pipeline/`.
Important pieces:
- `pool.py::QueryPool` stores pending queries and cached in-flight queries for plugin backward-compatible calls.
- `controller.py::Controller` schedules query processing and enforces concurrency.
- `pipelinemgr.py::RuntimePipeline` materializes database pipeline config into a runtime stage chain.
- `process/handlers/chat.py::ChatMessageHandler` is the main LLM conversation handler.
- Stage families include response rules, banned sessions, content filters, preprocessors, rate limits, message truncation, long text handling, response-back, command handling, and wrappers.
Pipelines are configuration-driven. Prefer adding a stage or extending an existing stage family over hard-coding behavior in platform adapters.
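The two limits that `Controller` enforces (global pipeline concurrency and per-session concurrency) can be illustrated with nested semaphores. This is a sketch of the general technique, not the code in `controller.py`; `ConcurrencyGate`, its parameters, and the `demo` driver are hypothetical.

```python
import asyncio
from collections import defaultdict


class ConcurrencyGate:
    """Limit total in-flight queries and in-flight queries per session."""

    def __init__(self, global_limit: int, per_session_limit: int) -> None:
        self._global = asyncio.Semaphore(global_limit)
        self._per_session_limit = per_session_limit
        self._sessions: dict = {}

    def _session_slot(self, session_id: str) -> asyncio.Semaphore:
        return self._sessions.setdefault(
            session_id, asyncio.Semaphore(self._per_session_limit)
        )

    async def run(self, session_id: str, handler):
        # Take the session slot first so a busy session does not hold
        # global capacity while it waits for its own turn.
        async with self._session_slot(session_id):
            async with self._global:
                return await handler()


async def demo() -> dict:
    gate = ConcurrencyGate(global_limit=2, per_session_limit=1)
    active_total = 0
    peak_total = 0
    active = defaultdict(int)
    peak = defaultdict(int)

    async def handle(session_id: str) -> None:
        nonlocal active_total, peak_total
        active_total += 1
        active[session_id] += 1
        peak_total = max(peak_total, active_total)
        peak[session_id] = max(peak[session_id], active[session_id])
        await asyncio.sleep(0.01)  # stand-in for LLM or tool work
        active[session_id] -= 1
        active_total -= 1

    sessions = ["a", "a", "b", "b", "c"]
    await asyncio.gather(*(gate.run(s, lambda s=s: handle(s)) for s in sessions))
    return {"peak_total": peak_total, "peak_per_session": max(peak.values())}
```

Running `asyncio.run(demo())` reports the observed peaks, which stay within the configured limits.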
## Provider, RAG, and Tools
Provider code lives under `pkg/provider/`.
- `modelmgr/` manages configured model providers and requesters.
- `runners/` implements request runners such as the local agent runner and external workflow integrations.
- `tools/toolmgr.py` aggregates tools from native tools, plugin tools, external MCP servers, and skill-authoring tools.
- `tools/loaders/mcp.py` is the MCP client side: external MCP servers that LangBot connects to for agent tools.
- RAG lives across `pkg/rag/`, `pkg/vector/`, model services, and plugin KnowledgeEngine actions.
Do not confuse LangBot's MCP client side with LangBot's own MCP server at `/mcp`; they are different surfaces.
## Plugin System
The plugin system crosses the repo boundary.
In this repo:
- `pkg/plugin/connector.py` connects LangBot to the Plugin Runtime over stdio or WebSocket.
- `pkg/plugin/handler.py` exposes LangBot actions to the runtime and calls runtime actions for plugin operations.
- `pkg/provider/tools/loaders/plugin.py` exposes plugin Tool components to LLM runners.
- Pipeline handlers emit SDK events such as normal-message events and prompt-processing events.
In `langbot-plugin-sdk`:
- `src/langbot_plugin/api/` defines `BasePlugin`, component base classes, message/event entities, contexts, proxies, and manifests.
- `src/langbot_plugin/runtime/` implements `lbp rt`, plugin discovery, dependency installation, process launching, and control/debug connections.
- `src/langbot_plugin/entities/io/` defines the action protocol shared by LangBot, runtime, and plugin processes.
The Plugin Runtime supports stdio and WebSocket control transports. Direct local LangBot runs usually spawn the runtime over stdio. Containerized/standalone deployments connect over WebSocket using `plugin.runtime_ws_url` and `--standalone-runtime`.
## Box Runtime and Skills
Box is the sandbox subsystem used by native agent tools, stdio MCP servers, skill authoring, and managed processes.
In this repo:
- `pkg/box/service.py` is the application-facing facade for exec, sessions, managed processes, skill CRUD, status, reconnects, quotas, mounts, and sandbox profiles.
- `pkg/box/connector.py` connects to the Box Runtime over stdio, Windows subprocess+WebSocket, or remote WebSocket.
- `pkg/provider/tools/loaders/native.py`, `mcp_stdio.py`, and skill loaders depend on Box availability.
- `pkg/skill/manager.py` loads skills from the Box runtime, falling back to local `data/skills` when needed.
In `langbot-plugin-sdk`:
- `src/langbot_plugin/box/server.py` implements `lbp box` and the WebSocket endpoints on `:5410`.
- `src/langbot_plugin/box/runtime.py` owns sandbox sessions and managed processes.
- `backend.py`, `nsjail_backend.py`, and `e2b_backend.py` implement sandbox backends.
- `skill_store.py` manages skill packages from the Box side.
Important config keys live under `box:` in `src/langbot/templates/config.yaml`: `box.enabled`, `box.backend`, `box.runtime.endpoint`, and `box.local.*`. Start LangBot with `--standalone-box` when connecting to an externally launched Box runtime.
## HTTP API, Web UI, and MCP Server
`pkg/api/http/controller/main.py` builds a Quart app, registers route groups, serves the built SPA, and wraps the ASGI app with the MCP dispatcher.
- HTTP route groups live under `pkg/api/http/controller/groups/`.
- Service-layer logic lives under `pkg/api/http/service/`.
- The built web UI is served from the frontend build path with SPA fallback.
- The MCP server lives under `pkg/api/mcp/` and is mounted at `/mcp`.
The MCP server intentionally exposes a curated subset of the API. Tools call service classes directly rather than making HTTP requests back into LangBot.
Maintenance rule: when adding, removing, or changing an HTTP endpoint that should be agent-accessible, update the matching MCP tool and the relevant in-repo skill under `skills/` in the same pass.
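As a rough illustration of why the HTTP route and the MCP tool should change together, the sketch below has both surfaces call one service method. Everything here is hypothetical (`BotService`, `http_get_bot`, `mcp_get_bot`, and the response envelopes); it does not use Quart or an MCP library and only shows the shape of the dependency.

```python
import asyncio
import json


class BotService:
    """Hypothetical service: the single place that holds the business logic."""

    def __init__(self) -> None:
        self._bots = {"bot-1": {"name": "demo", "enabled": True}}

    async def get_bot(self, bot_id: str) -> dict:
        if bot_id not in self._bots:
            raise KeyError(bot_id)
        return self._bots[bot_id]


async def http_get_bot(service: BotService, bot_id: str) -> tuple:
    """HTTP-style adapter: maps the service result to (status, JSON body)."""
    try:
        bot = await service.get_bot(bot_id)
    except KeyError:
        return 404, json.dumps({"code": -1, "msg": "bot not found"})
    return 200, json.dumps({"code": 0, "data": bot})


async def mcp_get_bot(service: BotService, bot_id: str) -> dict:
    """MCP-style tool: calls the same service directly, with no HTTP round trip."""
    try:
        bot = await service.get_bot(bot_id)
    except KeyError:
        return {"ok": False, "error": "bot not found"}
    return {"ok": True, "bot": bot}
```

If `get_bot` changes its arguments or return shape, both adapters need the same update, which is the drift the maintenance rule is meant to prevent.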
## Persistence and Configuration
Persistence is centered on `pkg/persistence/mgr.py`.
- SQLite is the default database; PostgreSQL is supported.
- Models live under `pkg/entity/persistence/`.
- Fresh schemas are created from metadata, then legacy migrations run up to the frozen 3.x baseline, then Alembic migrations run to head.
- New schema changes should use Alembic under `pkg/persistence/alembic/versions/`; do not extend the frozen legacy migration chain.
Configuration starts from `src/langbot/templates/config.yaml` and is generated into `data/config.yaml` on first run. Most long-lived managers read from `ap.instance_config.data`.
## Frontend
The frontend lives in `web/` and is a Vite SPA using React Router 7, shadcn/ui, Tailwind CSS, and pnpm. It is not Next.js, despite some historical filenames.
In development, `pnpm dev` serves the UI on `:3000` and reads `VITE_API_BASE_URL` to call the backend on `:5300`. In production, the built frontend is packaged into the Python distribution and served by the backend.
Keep frontend API behavior aligned with `pkg/api/http/service/` and route groups. User-facing strings must go through the existing i18n setup.
## Agent-Facing Surfaces
LangBot is deliberately agent-friendly. The agent-facing surfaces are part of the architecture, not extra docs.
- `skills/` is the single source of truth for in-repo skills.
- `pkg/api/mcp/server.py` exposes the LangBot MCP server at `/mcp`.
- `api.global_api_key` authenticates API/MCP access without a browser login.
- `AGENTS.md` and `ARCHITECTURE.md` tell coding agents how the repo works.
When one of these changes, update the others if the behavior or contract changed. API, MCP tools, and skills are one system; drift is a bug.
## Where to Change Things
- New HTTP API: add/adjust a service in `pkg/api/http/service/`, a route group in `pkg/api/http/controller/groups/`, tests, and MCP/skills if agent-accessible.
- New platform adapter: add a `pkg/platform/sources/*` adapter, component metadata/templates as needed, i18n, docs, and tests/smoke coverage.
- New pipeline behavior: add or extend a pipeline stage family under `pkg/pipeline/`; avoid putting pipeline rules in adapters.
- New LLM provider/requester: work under `pkg/provider/modelmgr/` and related service/UI surfaces.
- New LLM tool source: extend `pkg/provider/tools/loaders/` and `ToolManager` intentionally.
- New plugin component/API/protocol: change `langbot-plugin-sdk` first or in lockstep, then update LangBot bridge code.
- New Box capability: change both `pkg/box/` and `langbot-plugin-sdk/src/langbot_plugin/box/`, plus config and tests.
- New database schema: add an Alembic migration, not a legacy `dbmXXX` migration.
## Design Biases
- Keep platform translation, pipeline orchestration, provider execution, and runtime protocols separate.
- Reuse existing registries and service layers instead of adding parallel paths.
- Prefer small, explicit agent surfaces over exposing every internal API.
- Treat cross-repo contracts with the SDK as public interfaces.
- Test behavior at the narrowest useful layer first, then add integration/e2e coverage for runtime or platform changes.
+2 -1
@@ -62,11 +62,12 @@ services:
- TZ=Asia/Shanghai
# Unified env-override convention: SECTION__SUBSECTION__KEY overrides the
# matching config.yaml field (see LoadConfigStage). These map onto
# box.local.* and are forwarded to the Box runtime via INIT RPC.
# box.* and are forwarded to the Box runtime via INIT RPC.
- BOX__LOCAL__HOST_ROOT=${LANGBOT_BOX_ROOT:-${PWD}/data/box}
- BOX__LOCAL__DEFAULT_WORKSPACE=default
- BOX__LOCAL__SKILLS_ROOT=skills
- BOX__LOCAL__ALLOWED_MOUNT_ROOTS=${LANGBOT_BOX_ROOT:-${PWD}/data/box}
- BOX__DOCKER__CPU_LIMIT_ENABLED=${LANGBOT_BOX_DOCKER_CPU_LIMIT_ENABLED:-true}
ports:
- 5300:5300 # For web ui and webhook callback
- 2280-2285:2280-2285 # For platform reverse connection
+2 -1
@@ -26,7 +26,7 @@ and LangBot's own Local Agent) working with the LangBot ecosystem.
## Quick start (for an AI agent)
1. Read this README, `AGENTS.md`, and `qa-agent-docs/` to understand the layout.
1. Read this README, `AGENTS.md`, and `docs/user-guide.md` to understand the layout.
2. Read `skills/.env` for shared local defaults. On a new machine, copy
`skills/.env.example` to `skills/.env.local` (gitignored) and override
machine-specific values there. Never commit secrets.
@@ -48,6 +48,7 @@ bin/lbs env show # inspect resolved env defaults (redacted)
bin/lbs env doctor # diagnose local environment readiness
bin/lbs case list --ready
bin/lbs test plan <case-id>
bin/lbs suite plan langbot-debug-chat-load-gate
```
## Maintenance rule
+171
@@ -0,0 +1,171 @@
# LangBot QA Skills User Guide
Use this guide as the first operational path after reading `README.md` and
`AGENTS.md`.
## 1. Configure Local Inputs
Read `skills/.env`, then create `skills/.env.local` for machine-local values.
Do not commit `.env.local`, browser profiles, reports, tokens, API keys, OAuth
state, or provider credentials.
Minimum local fields for live browser QA:
```bash
LANGBOT_REPO=/path/to/LangBot
LANGBOT_WEB_REPO=/path/to/LangBot/web
LANGBOT_BACKEND_URL=http://127.0.0.1:5300
LANGBOT_FRONTEND_URL=http://127.0.0.1:3000
LANGBOT_DEV_FRONTEND_URL=http://127.0.0.1:3000
LANGBOT_BROWSER_PROFILE=/path/to/langbot-browser-profile
LANGBOT_CHROMIUM_EXECUTABLE=/path/to/chromium-or-playwright-chrome
LANGBOT_E2E_LOGIN_USER=qa-local@example.com
```
`LANGBOT_E2E_LOGIN_USER` is a local QA account. The setup automation uses the
LangBot recovery key from the active checkout to initialize or refresh that
local account and write a browser `localStorage` token. It does not need the
user's GitHub or Space credentials.
## 2. Check Readiness
From `skills/`:
```bash
bin/lbs env show
bin/lbs env doctor
bin/lbs validate
bin/lbs index --check
```
`env doctor` should report reachable backend and frontend URLs before live
browser cases are run. Missing Space provider credentials say nothing about
LangBot itself: classify that run as `env_issue`, not as a product pass or
failure, and configure the local Space provider before measuring Debug Chat
performance.
## 3. Start Services
Start the backend from `LANGBOT_REPO`:
```bash
cd "$LANGBOT_REPO"
uv run main.py
```
Start the standalone frontend from `LANGBOT_WEB_REPO` and point it at the
backend:
```bash
cd "$LANGBOT_WEB_REPO"
VITE_API_BASE_URL="$LANGBOT_BACKEND_URL" pnpm dev --host 0.0.0.0
```
If `VITE_API_BASE_URL` is missing, browser tests can load the Vite page but send
API requests to the frontend port, which produces false UI failures.
## 4. Prepare User-Path Fixtures
For local-agent Debug Chat cases and the user-path performance gate:
```bash
node scripts/e2e/ensure-local-agent-pipeline.mjs --write-env
```
The script:
- refreshes the local QA login and browser token;
- marks the local wizard as skipped;
- creates or updates a local QA pipeline;
- scans Space LLM models, tests candidates, and switches to the first working
Space model with tested fallback models;
- writes `LANGBOT_PIPELINE_URL`, `LANGBOT_PIPELINE_NAME`, and local-agent
pipeline/model variables into `skills/.env.local`;
- returns `env_issue` when no Space model can be scanned or tested.
Useful model controls:
```bash
LANGBOT_E2E_MODEL_TEST_LIMIT=8
LANGBOT_E2E_MODEL_FALLBACK_COUNT=3
LANGBOT_E2E_SKIP_MODEL_UUIDS=uuid-a,uuid-b
LANGBOT_E2E_SKIP_MODEL_NAMES=model-a,model-b
LANGBOT_E2E_SCAN_SPACE_MODELS=true
```
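The scan, test, and fallback selection described above can be pictured roughly as follows. This is a minimal sketch, not the code in `ensure-local-agent-pipeline.mjs`: `pickModels`, `splitList`, the candidate shape, and the `testModel` callback are hypothetical names. It assumes that `LANGBOT_E2E_MODEL_TEST_LIMIT` caps how many candidates are tested and that `LANGBOT_E2E_MODEL_FALLBACK_COUNT` caps how many additional working models are kept as fallbacks.

```javascript
// Hypothetical sketch: choose a primary model plus fallbacks from scanned candidates.
// Option names mirror the env controls above; their exact semantics are assumed.
function splitList(value) {
  return String(value || "")
    .split(",")
    .map((item) => item.trim())
    .filter(Boolean);
}

function pickModels(candidates, options, testModel) {
  const skipUuids = new Set(splitList(options.skipUuids));
  const skipNames = new Set(splitList(options.skipNames));
  const working = [];
  let tested = 0;

  for (const model of candidates) {
    // Skipped models are never tested and do not count against the limit.
    if (skipUuids.has(model.uuid) || skipNames.has(model.name)) continue;
    if (tested >= options.testLimit) break;
    tested += 1;
    if (testModel(model)) working.push(model.uuid);
    // Stop once there is one primary plus the requested number of fallbacks.
    if (working.length >= 1 + options.fallbackCount) break;
  }

  if (working.length === 0) {
    // Nothing usable: an environment problem, not a product result.
    return { status: "env_issue", primary: "", fallbacks: [] };
  }
  return { status: "pass", primary: working[0], fallbacks: working.slice(1) };
}

const picked = pickModels(
  [
    { uuid: "uuid-a", name: "model-a" },
    { uuid: "uuid-c", name: "model-c" },
    { uuid: "uuid-d", name: "model-d" },
    { uuid: "uuid-e", name: "model-e" },
  ],
  { testLimit: 8, fallbackCount: 1, skipUuids: "uuid-a", skipNames: "" },
  (model) => model.name !== "model-c", // pretend model-c fails its test
);
console.log(picked);
```

The real setup tests models over the backend API and is asynchronous; the synchronous callback here only keeps the sketch short.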
The setup also writes a `max-round` value into the pipeline config for
compatibility with the current runtime, because the backend still reads that
field directly during message truncation. Do not treat it as a long-term QA
contract.
## 5. Run Gates
Fast contract gate, no live service required:
```bash
bin/lbs suite run langbot-performance-contract-gate --run-id langbot-contract-local
```
Live backend gate:
```bash
bin/lbs suite run langbot-live-backend-gate --run-id langbot-backend-local
```
Browser-visible user-path performance gate:
```bash
bin/lbs suite plan langbot-user-path-performance-gate
bin/lbs suite run langbot-user-path-performance-gate --run-id langbot-user-path-local --include-manual-check
```
Controlled Debug Chat message-path load gate (manual/non-required; run fake-provider cases serially when they share `LANGBOT_FAKE_PROVIDER_URL`):
```bash
bin/lbs suite plan langbot-debug-chat-load-gate
bin/lbs test run langbot-fake-provider-debug-chat-load --run-id langbot-fake-load-local
bin/lbs test run langbot-fake-provider-debug-chat-slow-load --run-id langbot-fake-slow-local
bin/lbs test run langbot-fake-provider-debug-chat-fault-recovery --run-id langbot-fake-fault-local
bin/lbs test run langbot-space-debug-chat-concurrency-smoke --run-id langbot-space-smoke-local
```
Cross-pipeline Debug Chat isolation is a separate manual regression gate because
current releases may fail it due to product bug #2286:
```bash
bin/lbs suite plan langbot-debug-chat-isolation-gate
bin/lbs suite run langbot-debug-chat-isolation-gate --run-id langbot-debug-chat-isolation-local --include-manual-check
```
Start with `langbot-fake-provider-debug-chat-load`. It launches a local
OpenAI-compatible fake provider, creates the matching provider/model/pipeline,
then sends concurrent WebSocket Debug Chat messages through the real backend.
Use `langbot-fake-provider-debug-chat-slow-load` to measure the same path under
deterministic streaming latency. Use
`langbot-fake-provider-debug-chat-fault-recovery` to inject bounded provider
HTTP failures and confirm later Debug Chat requests recover. Use the separate
`langbot-debug-chat-isolation-gate` to verify that concurrent Debug Chat traffic
on two pipelines does not leak assistant responses across pipeline boundaries;
current releases may fail that gate because of #2286, so keep it out of the
normal load gate until the product fix lands.
Use `langbot-space-debug-chat-concurrency-smoke` only as a low-volume live
provider smoke; it includes Space/model/network latency and should be compared
against the fake-provider baseline before attributing failures to LangBot.
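As an illustration of what "bounded provider HTTP failures" can mean, the sketch below decides per request whether a fake provider should return a fault. The field names `fail_first_n`, `fail_every_n`, and `fault_status` match the fake-provider config keys, but the decision rule and the `shouldInjectFault` helper are assumptions for illustration; the actual fake provider may behave differently.

```javascript
// Hypothetical sketch of bounded fault injection in a fake provider.
// requestNumber is 1-based; both rules below are assumed semantics.
function shouldInjectFault(requestNumber, config) {
  const failFirstN = config.fail_first_n || 0;
  const failEveryN = config.fail_every_n || 0;
  if (requestNumber <= failFirstN) return true;
  if (failEveryN > 0 && requestNumber % failEveryN === 0) return true;
  return false;
}

const faultConfig = { fail_first_n: 2, fail_every_n: 0, fault_status: 500 };
const outcomes = [1, 2, 3, 4, 5, 6].map((n) =>
  shouldInjectFault(n, faultConfig) ? faultConfig.fault_status : 200,
);
console.log(outcomes); // the first two requests fail, later ones succeed
```

With a rule like this, a fault-recovery case can assert both a minimum number of observed provider faults and a minimum number of later successful responses.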
`manual_check` means the agent must confirm the declared preconditions for that
run window. When setup automation is declared, run output may stop early with
`env_issue`; fix that environment input before treating the product path as
measured.
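The fake-provider setup scripts exit with 0 for `pass`, 2 for `env_issue`, and 1 for any other failure, and print their result object as JSON on stdout. A wrapper could map that back to a status roughly like this; `classifySetupRun` and its `measured` flag are hypothetical and are not part of `bin/lbs`.

```javascript
// Hypothetical helper: interpret a setup script's exit code and stdout.
function classifySetupRun(exitCode, stdout) {
  let parsed = {};
  try {
    parsed = JSON.parse(String(stdout || "").trim() || "{}");
  } catch {
    parsed = {}; // non-JSON output: fall back to the exit code alone
  }
  const status = exitCode === 0 ? "pass" : exitCode === 2 ? "env_issue" : "fail";
  const reason = parsed && typeof parsed.reason === "string" ? parsed.reason : "";
  return {
    status,
    reason,
    // Only a passing setup means the product path can be treated as measured.
    measured: status === "pass",
  };
}

const setupOutcome = classifySetupRun(
  2,
  '{"status":"env_issue","reason":"LANGBOT_BACKEND_URL is not configured."}',
);
console.log(setupOutcome);
```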
## 6. Read Results
Suite reports live under `skills/reports/`. Evidence lives under
`skills/reports/evidence/<run-id>/`.
For performance cases, inspect:
- `metrics.json` for p50/p95/p99, error rate, and total duration;
- `automation-result.json` for threshold decisions and artifacts;
- `console.log` and `network.log` for frontend/API failures;
- backend logs for provider, runner, WebSocket, or persistence failures.
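When reading the latency figures, it can help to recall how percentiles and error rate are derived from per-request samples. The sketch below uses the nearest-rank method over successful requests only; that choice, the sample shape, and the `summarize` helper are assumptions for illustration, and the QA tooling may compute `metrics.json` differently.

```javascript
// Illustrative only: nearest-rank percentiles over per-request latencies (ms).
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

function summarize(samples) {
  // Latency percentiles use successful requests; errors feed the error rate.
  const ok = samples
    .filter((sample) => sample.ok)
    .map((sample) => sample.ms)
    .sort((a, b) => a - b);
  const errors = samples.length - ok.length;
  return {
    p50: percentile(ok, 50),
    p95: percentile(ok, 95),
    p99: percentile(ok, 99),
    error_rate: samples.length === 0 ? 0 : errors / samples.length,
  };
}

const summary = summarize([
  { ok: true, ms: 120 },
  { ok: true, ms: 80 },
  { ok: true, ms: 100 },
  { ok: true, ms: 400 },
  { ok: false, ms: 0 },
]);
console.log(summary);
```

With small sample counts, p95 and p99 collapse onto the slowest request, which is one reason to compare against the fake-provider baseline before drawing conclusions.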
Do not call a user-path performance result a LangBot overhead regression until
provider/tool/network time has been separated or ruled out.
+109 -2
@@ -48,7 +48,18 @@
},
"type": {
"type": "string",
"enum": ["smoke", "regression", "feature", "provider", "exploratory"]
"enum": [
"smoke",
"regression",
"feature",
"provider",
"exploratory",
"contract",
"performance",
"reliability",
"chaos",
"security"
]
},
"priority": {
"type": "string",
@@ -102,7 +113,11 @@
"backend_log",
"frontend_log",
"api_diagnostic",
"filesystem"
"filesystem",
"metrics",
"trace",
"profile",
"resource_log"
]
},
"minItems": 1
@@ -188,9 +203,101 @@
"type": "string",
"enum": ["person", "group"]
},
"automation_debug_chat_response_p95_ms": {
"type": "string"
},
"automation_debug_chat_max_error_rate": {
"type": "string"
},
"automation_debug_chat_load_requests": {
"type": "string"
},
"automation_debug_chat_load_concurrency": {
"type": "string"
},
"automation_debug_chat_load_timeout_ms": {
"type": "string"
},
"automation_debug_chat_load_response_p95_ms": {
"type": "string"
},
"automation_debug_chat_load_first_response_p95_ms": {
"type": "string"
},
"automation_debug_chat_load_max_error_rate": {
"type": "string"
},
"automation_debug_chat_load_min_error_rate": {
"type": "string"
},
"automation_debug_chat_load_min_error_count": {
"type": "string"
},
"automation_debug_chat_load_min_ok_count": {
"type": "string"
},
"automation_debug_chat_load_min_provider_fault_count": {
"type": "string"
},
"automation_debug_chat_load_expected_prefix": {
"type": "string"
},
"automation_debug_chat_load_prompt_template": {
"type": "string"
},
"automation_debug_chat_load_stream": {
"type": "string",
"enum": ["0", "1", "false", "true"]
},
"automation_debug_chat_load_reset": {
"type": "string",
"enum": ["0", "1", "false", "true"]
},
"automation_debug_chat_load_fail_on_final_mismatch": {
"type": "string",
"enum": ["0", "1", "false", "true"]
},
"automation_fake_provider_response_text": {
"type": "string"
},
"automation_fake_provider_first_token_delay_ms": {
"type": "string"
},
"automation_fake_provider_chunk_delay_ms": {
"type": "string"
},
"automation_fake_provider_chunk_count": {
"type": "string"
},
"automation_fake_provider_fail_first_n": {
"type": "string"
},
"automation_fake_provider_fail_every_n": {
"type": "string"
},
"automation_fake_provider_fault_status": {
"type": "string"
},
"automation_fake_provider_fail_after_first_chunk": {
"type": "string",
"enum": ["0", "1", "false", "true"]
},
"automation_fake_provider_dynamic_response": {
"type": "string",
"enum": ["0", "1", "false", "true"]
},
"automation_filesystem_checks_json": {
"type": "string"
},
"metrics_thresholds_json": {
"type": "string"
},
"load_profile_json": {
"type": "string"
},
"fault_model_json": {
"type": "string"
},
"automation_pipeline_url_env": {
"type": "string",
"pattern": "^[A-Z][A-Z0-9_]*$"
+11 -1
@@ -18,7 +18,17 @@
},
"type": {
"type": "string",
"enum": ["smoke", "regression", "release_gate", "exploratory"]
"enum": [
"smoke",
"regression",
"release_gate",
"exploratory",
"contract",
"performance",
"reliability",
"chaos",
"security"
]
},
"priority": {
"type": "string",
@@ -0,0 +1,205 @@
#!/usr/bin/env node
import { spawn } from "node:child_process";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { dirname, resolve } from "node:path";
import { env } from "node:process";
import {
appendLine,
ensureEvidence,
evidencePaths,
loadEnvFiles,
redact,
writeResult,
} from "./lib/langbot-e2e.mjs";
const caseId = "ensure-fake-provider-cross-pipelines";
const DEFAULT_PIPELINE_A_NAME = "LangBot QA Fake Provider Debug Chat A";
const DEFAULT_PIPELINE_B_NAME = "LangBot QA Fake Provider Debug Chat B";
await loadEnvFiles();
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const writeEnv = process.argv.includes("--write-env");
const envLocalPath = resolve("skills/.env.local");
const pipelineAName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME || DEFAULT_PIPELINE_A_NAME;
const pipelineBName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME || DEFAULT_PIPELINE_B_NAME;
const result = {
source: "setup_automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
pipeline_a: {
name: pipelineAName,
id: "",
url: "",
},
pipeline_b: {
name: pipelineBName,
id: "",
url: "",
},
fake_provider: {
url: "",
base_url: "",
pid: null,
},
wrote_env: false,
evidence: {
console_log: paths.consoleLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["api_diagnostic", "filesystem"],
};
try {
console.error(`[langbot-qa] configuring cross-pipeline QA fixtures: pipeline_a="${pipelineAName}", pipeline_b="${pipelineBName}"`);
console.error("[langbot-qa] run these fake-provider setup/probe commands serially when they share LANGBOT_FAKE_PROVIDER_URL.");
if (pipelineAName === pipelineBName) {
throw new Error("LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME and LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME must be different.");
}
const setupA = await runPipelineSetup(pipelineAName, "A");
const setupB = await runPipelineSetup(pipelineBName, "B");
result.pipeline_a = {
name: setupA.pipeline_name || pipelineAName,
id: setupA.pipeline_id || "",
url: setupA.pipeline_url || "",
};
result.pipeline_b = {
name: setupB.pipeline_name || pipelineBName,
id: setupB.pipeline_id || "",
url: setupB.pipeline_url || "",
};
result.fake_provider = {
url: setupB.fake_provider?.url || setupA.fake_provider?.url || "",
base_url: setupB.fake_provider?.base_url || setupA.fake_provider?.base_url || "",
pid: setupB.fake_provider?.pid ?? setupA.fake_provider?.pid ?? null,
};
if (!result.pipeline_a.url || !result.pipeline_b.url || !result.fake_provider.url) {
throw new Error("Cross-pipeline fake provider setup did not return both pipeline URLs and provider URL.");
}
if (writeEnv) {
await upsertEnvLocal(envLocalPath, {
LANGBOT_FAKE_PROVIDER_URL: result.fake_provider.url,
LANGBOT_FAKE_PROVIDER_BASE_URL: result.fake_provider.base_url,
LANGBOT_FAKE_PROVIDER_PID: result.fake_provider.pid ? String(result.fake_provider.pid) : "",
LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL: result.pipeline_a.url,
LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME: result.pipeline_a.name,
LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL: result.pipeline_b.url,
LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME: result.pipeline_b.name,
});
result.wrote_env = true;
}
result.status = "pass";
result.reason = "Fake provider cross-pipeline fixtures are configured.";
} catch (error) {
result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
result.reason = safeReason(error.message);
} finally {
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
function runPipelineSetup(pipelineName, label) {
return new Promise((resolvePromise, rejectPromise) => {
const child = spawn(process.execPath, ["scripts/e2e/ensure-fake-provider-pipeline.mjs"], {
cwd: resolve("."),
env: {
...env,
LANGBOT_FAKE_PROVIDER_PIPELINE_NAME: pipelineName,
LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS: env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS || "25",
LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS: env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS || "10",
LANGBOT_FAKE_PROVIDER_CHUNK_COUNT: env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT || "0",
LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N: "0",
LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N: "0",
LANGBOT_FAKE_PROVIDER_FAULT_STATUS: env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS || "500",
LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK: "false",
LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE: "true",
},
stdio: ["ignore", "pipe", "pipe"],
});
let stdout = "";
let stderr = "";
child.stdout.on("data", (chunk) => {
const text = chunk.toString();
stdout += text;
appendLine(paths.consoleLog, `[setup ${label} stdout] ${text.trimEnd()}`).catch(() => {});
});
child.stderr.on("data", (chunk) => {
const text = chunk.toString();
stderr += text;
appendLine(paths.consoleLog, `[setup ${label} stderr] ${text.trimEnd()}`).catch(() => {});
});
child.on("error", rejectPromise);
child.on("close", (code) => {
const parsed = parseJsonOutput(stdout);
if (code !== 0 || parsed.status !== "pass") {
rejectPromise(new Error(parsed.reason || stderr || `Fake provider pipeline setup ${label} exited with ${code}.`));
return;
}
resolvePromise(parsed);
});
});
}
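// Parse the child's stdout as JSON. If other output surrounds the payload,
// fall back to the outermost {...} span; return {} when nothing parses.
function parseJsonOutput(text) {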
const trimmed = String(text || "").trim();
if (!trimmed) return {};
try {
return JSON.parse(trimmed);
} catch {
const start = trimmed.indexOf("{");
const end = trimmed.lastIndexOf("}");
if (start >= 0 && end > start) {
try {
return JSON.parse(trimmed.slice(start, end + 1));
} catch {
return {};
}
}
return {};
}
}
async function upsertEnvLocal(path, updates) {
await mkdir(dirname(path), { recursive: true });
let text = "";
try {
text = await readFile(path, "utf8");
} catch {
text = "";
}
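// Drop trailing newlines first so an empty or newline-terminated file does not
// leave blank lines between existing entries and appended keys.
const body = text.replace(/(\r?\n)+$/, "");
const lines = body ? body.split(/\r?\n/) : [];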
const seen = new Set();
const next = lines.map((line) => {
const trimmed = line.trim();
const match = trimmed.match(/^([A-Z][A-Z0-9_]*)=/);
if (!match || updates[match[1]] === undefined) return line;
seen.add(match[1]);
return `${match[1]}=${updates[match[1]]}`;
});
for (const [key, value] of Object.entries(updates)) {
if (!seen.has(key)) next.push(`${key}=${value}`);
}
await writeFile(path, `${next.join("\n").replace(/\n+$/, "")}\n`, "utf8");
}
function looksLikeEnvIssue(error) {
const message = String(error?.message || error || "");
return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
@@ -0,0 +1,635 @@
#!/usr/bin/env node
import { spawn } from "node:child_process";
import { open, readFile, mkdir, writeFile } from "node:fs/promises";
import { dirname, resolve } from "node:path";
import { env } from "node:process";
import {
apiJson,
ensureEvidence,
evidencePaths,
loadEnvFiles,
redact,
resetAndAuthLocalUser,
writeResult,
} from "./lib/langbot-e2e.mjs";
const RUNNER_ID = "local-agent";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
const DEFAULT_PIPELINE_NAME = "LangBot QA Fake Provider Debug Chat";
const DEFAULT_PROVIDER_NAME = "LangBot QA Fake OpenAI Provider";
const QA_RESOURCE_DESCRIPTION = "Managed by LangBot skills QA automation for controlled fake-provider Debug Chat tests. Safe to delete when local QA fixtures are no longer needed.";
const DEFAULT_MODEL_NAME = "gpt-4o-mini";
const DEFAULT_REQUESTER = "openai-chat-completions";
const caseId = "ensure-fake-provider-pipeline";
await loadEnvFiles();
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const writeEnv = process.argv.includes("--write-env");
const frontendUrl = env.LANGBOT_FRONTEND_URL || "";
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const envLocalPath = resolve("skills/.env.local");
const repoRoot = resolve(env.LANGBOT_REPO || "..");
const fakeStateDir = resolve(env.LANGBOT_FAKE_PROVIDER_STATE_DIR || resolve(repoRoot, ".qa/fake-provider"));
const fakeStatePath = resolve(fakeStateDir, "state.json");
const fakeStdoutPath = resolve(fakeStateDir, "fake-provider.stdout.log");
const fakeStderrPath = resolve(fakeStateDir, "fake-provider.stderr.log");
const pipelineName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_NAME || DEFAULT_PIPELINE_NAME;
const providerName = env.LANGBOT_FAKE_PROVIDER_NAME || DEFAULT_PROVIDER_NAME;
const requester = env.LANGBOT_FAKE_PROVIDER_REQUESTER || DEFAULT_REQUESTER;
const modelName = env.LANGBOT_FAKE_PROVIDER_MODEL_NAME || DEFAULT_MODEL_NAME;
const result = {
source: "automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
frontend_url: frontendUrl,
backend_url: backendUrl,
fake_provider: {
url: "",
base_url: "",
pid: null,
reused: false,
config: {},
state_file: fakeStatePath,
stdout_log: fakeStdoutPath,
stderr_log: fakeStderrPath,
},
provider: {
uuid: "",
name: providerName,
requester,
created: false,
updated: false,
},
model: {
uuid: "",
name: modelName,
created: false,
updated: false,
test_status: "not_run",
test_reason: "",
},
pipeline_id: "",
pipeline_name: pipelineName,
pipeline_url: "",
created: false,
updated: false,
wrote_env: false,
evidence: {
console_log: paths.consoleLog,
network_log: paths.networkLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["api_diagnostic", "network", "filesystem"],
};
try {
console.error(`[langbot-qa] configuring QA-owned fake-provider fixtures: provider="${providerName}", pipeline="${pipelineName}"`);
console.error("[langbot-qa] this setup may create or update local QA provider/model/pipeline resources on the selected backend.");
if (!backendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_BACKEND_URL is not configured.");
}
if (!frontendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_FRONTEND_URL is not configured.");
}
const fakeProvider = await ensureFakeProvider();
const setupConfig = await configureFakeProvider(fakeProvider.url, healthyFakeProviderConfig(), true);
result.fake_provider = {
...result.fake_provider,
...fakeProvider,
config: setupConfig.config || healthyFakeProviderConfig(),
};
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the fake provider pipeline.");
}
const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
const wizard = await skipWizard({ backendUrl, token: auth.token });
if (wizard.status !== "pass") {
result.status = "fail";
throw new Error(wizard.reason || "Failed to mark the local QA wizard as skipped.");
}
const provider = await ensureProvider({
backendUrl,
token: auth.token,
name: providerName,
requester,
baseUrl: fakeProvider.base_url,
});
result.provider = provider;
const model = await ensureModel({
backendUrl,
token: auth.token,
providerUuid: provider.uuid,
name: modelName,
});
result.model = model;
const pipeline = await ensurePipeline({
backendUrl,
token: auth.token,
name: pipelineName,
modelUuid: model.uuid,
});
Object.assign(result, pipeline);
result.pipeline_url = `${frontendUrl.replace(/\/$/, "")}/home/pipelines?id=${encodeURIComponent(pipeline.pipeline_id)}`;
const runConfig = await configureFakeProvider(fakeProvider.url, targetFakeProviderConfig(), true);
result.fake_provider.config = runConfig.config || targetFakeProviderConfig();
if (writeEnv) {
await upsertEnvLocal(envLocalPath, {
LANGBOT_E2E_LOGIN_USER: user,
LANGBOT_FAKE_PROVIDER_URL: fakeProvider.url,
LANGBOT_FAKE_PROVIDER_BASE_URL: fakeProvider.base_url,
LANGBOT_FAKE_PROVIDER_PID: fakeProvider.pid ? String(fakeProvider.pid) : "",
LANGBOT_FAKE_PROVIDER_PROVIDER_UUID: provider.uuid,
LANGBOT_FAKE_PROVIDER_MODEL_UUID: model.uuid,
LANGBOT_FAKE_PROVIDER_PIPELINE_URL: result.pipeline_url,
LANGBOT_FAKE_PROVIDER_PIPELINE_NAME: pipelineName,
});
result.wrote_env = true;
}
result.status = "pass";
result.reason = `Fake provider pipeline is configured with ${requester}/${modelName}.`;
} catch (error) {
result.status = result.status === "env_issue" ? "env_issue" : "fail";
result.reason = result.reason || safeReason(error.message);
} finally {
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
async function ensureFakeProvider() {
const envUrl = normalizeProviderRootUrl(env.LANGBOT_FAKE_PROVIDER_URL || "");
if (envUrl && await fakeProviderHealthy(envUrl) && await fakeProviderConfigurable(envUrl)) {
return {
url: envUrl,
base_url: `${envUrl}/v1`,
pid: null,
reused: true,
};
}
const state = await readState(fakeStatePath);
const stateUrl = normalizeProviderRootUrl(state.url || "");
if (stateUrl && await fakeProviderHealthy(stateUrl)) {
if (await fakeProviderConfigurable(stateUrl)) {
return {
url: stateUrl,
base_url: state.base_url || `${stateUrl}/v1`,
pid: Number.isInteger(state.pid) ? state.pid : null,
reused: true,
};
}
if (Number.isInteger(state.pid)) await stopProcess(state.pid);
}
await mkdir(fakeStateDir, { recursive: true });
await writeFile(fakeStatePath, `${JSON.stringify({ status: "starting", started_at: new Date().toISOString() }, null, 2)}\n`, "utf8");
const stdout = await open(fakeStdoutPath, "a");
const stderr = await open(fakeStderrPath, "a");
const scriptPath = resolve("scripts/e2e/fake-openai-provider.mjs");
const host = env.LANGBOT_FAKE_PROVIDER_HOST || "127.0.0.1";
const port = env.LANGBOT_FAKE_PROVIDER_PORT || "0";
const child = spawn(process.execPath, [
scriptPath,
`--host=${host}`,
`--port=${port}`,
`--state-file=${fakeStatePath}`,
], {
cwd: resolve("."),
detached: true,
env: {
...env,
LANGBOT_FAKE_PROVIDER_MODEL_NAME: modelName,
},
stdio: ["ignore", stdout.fd, stderr.fd],
});
child.unref();
await stdout.close();
await stderr.close();
const started = await waitForFakeProviderState(fakeStatePath, child.pid, 10_000);
if (!started.url || !await fakeProviderHealthy(started.url) || !await fakeProviderConfigurable(started.url)) {
throw new Error(`Fake provider did not become healthy. See ${fakeStderrPath}`);
}
return {
url: started.url,
base_url: started.base_url || `${started.url}/v1`,
pid: child.pid ?? started.pid ?? null,
reused: false,
};
}
async function configureFakeProvider(rootUrl, config, resetRequestCount) {
const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({
config,
reset_request_count: resetRequestCount,
}),
signal: AbortSignal.timeout(3000),
});
const json = await response.json().catch(() => ({}));
if (!response.ok || json.ok !== true) {
throw new Error(`Fake provider config failed with HTTP ${response.status}.`);
}
return json;
}
async function fakeProviderHealthy(rootUrl) {
try {
const response = await fetch(`${rootUrl.replace(/\/$/, "")}/healthz`, {
signal: AbortSignal.timeout(2000),
});
if (!response.ok) return false;
const json = await response.json().catch(() => ({}));
return json.ok === true;
} catch {
return false;
}
}
async function fakeProviderConfigurable(rootUrl) {
try {
const response = await fetch(`${rootUrl.replace(/\/$/, "")}/__qa/config`, {
signal: AbortSignal.timeout(2000),
});
if (!response.ok) return false;
const json = await response.json().catch(() => ({}));
return json.ok === true && json.config && typeof json.config === "object";
} catch {
return false;
}
}
async function stopProcess(pid) {
try {
process.kill(pid, "SIGTERM");
} catch {
return;
}
await sleep(500);
}
async function waitForFakeProviderState(path, expectedPid, timeoutMs) {
const startedAt = Date.now();
let lastState = {};
while (Date.now() - startedAt < timeoutMs) {
const state = await readState(path);
if (state.url && (!expectedPid || state.pid === expectedPid)) return state;
lastState = state;
await sleep(150);
}
return lastState;
}
async function readState(path) {
try {
return JSON.parse(await readFile(path, "utf8"));
} catch {
return {};
}
}
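// Strip a trailing slash and a trailing /v1 so callers always get the provider
// root URL (the OpenAI-compatible base URL is then `${root}/v1`).
function normalizeProviderRootUrl(value) {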
const trimmed = String(value || "").trim().replace(/\/$/, "");
return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
}
function healthyFakeProviderConfig() {
return {
response_text: "OK",
first_token_delay_ms: 25,
chunk_delay_ms: 10,
chunk_count: 0,
fault_status: 500,
fail_first_n: 0,
fail_every_n: 0,
fail_after_first_chunk: false,
dynamic_response: true,
};
}
function targetFakeProviderConfig() {
return {
response_text: env.LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT || "OK",
first_token_delay_ms: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS, 25),
chunk_delay_ms: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS, 10),
chunk_count: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT, 0),
fault_status: httpFaultStatus(env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS, 500),
fail_first_n: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N, 0),
fail_every_n: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N, 0),
fail_after_first_chunk: envBool(env.LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK, false),
dynamic_response: envBool(env.LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE, true),
};
}
async function skipWizard({ backendUrl, token }) {
const response = await apiJson(backendUrl, "/api/v1/system/wizard/completed", {
method: "POST",
token,
body: { status: "skipped" },
});
const ok = response.status < 400 && response.json.code === 0;
return {
status: ok ? "pass" : "fail",
http_status: response.status,
code: response.json.code ?? null,
reason: ok ? "Wizard marked skipped for local QA." : response.json.msg || "Wizard status update failed.",
};
}
async function ensureProvider({ backendUrl, token, name, requester, baseUrl }) {
const list = await apiJson(backendUrl, "/api/v1/provider/providers", { token });
if (isApiFailure(list)) {
throw new Error(list.json.msg || "Failed to list providers.");
}
const providers = list.json.data?.providers || [];
const existing = providers.find((provider) => (
provider.name === name
|| (provider.requester === requester && String(provider.base_url || "").replace(/\/$/, "") === baseUrl.replace(/\/$/, ""))
));
const body = {
name,
requester,
base_url: baseUrl,
api_keys: [env.LANGBOT_FAKE_PROVIDER_API_KEY || "langbot-fake-provider-key"],
};
if (existing?.uuid) {
const update = await apiJson(backendUrl, `/api/v1/provider/providers/${encodeURIComponent(existing.uuid)}`, {
method: "PUT",
token,
body,
});
if (isApiFailure(update)) {
throw new Error(update.json.msg || "Failed to update fake provider.");
}
return {
uuid: existing.uuid,
name,
requester,
created: false,
updated: true,
};
}
const create = await apiJson(backendUrl, "/api/v1/provider/providers", {
method: "POST",
token,
body,
});
const uuid = create.json.data?.uuid || "";
if (isApiFailure(create) || !uuid) {
throw new Error(create.json.msg || "Failed to create fake provider.");
}
return {
uuid,
name,
requester,
created: true,
updated: false,
};
}
async function ensureModel({ backendUrl, token, providerUuid, name }) {
const list = await apiJson(backendUrl, `/api/v1/provider/models/llm?provider_uuid=${encodeURIComponent(providerUuid)}`, { token });
if (isApiFailure(list)) {
throw new Error(list.json.msg || "Failed to list fake provider models.");
}
const models = list.json.data?.models || [];
const existing = models.find((model) => model.name === name);
const body = {
name,
provider_uuid: providerUuid,
abilities: [],
context_length: positiveInteger(env.LANGBOT_FAKE_PROVIDER_CONTEXT_LENGTH, 8192),
extra_args: {},
prefered_ranking: 0,
};
let modelUuid = existing?.uuid || "";
let created = false;
let updated = false;
if (modelUuid) {
const update = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}`, {
method: "PUT",
token,
body,
});
if (isApiFailure(update)) {
throw new Error(update.json.msg || "Failed to update fake provider model.");
}
updated = true;
} else {
const create = await apiJson(backendUrl, "/api/v1/provider/models/llm", {
method: "POST",
token,
body,
});
modelUuid = create.json.data?.uuid || "";
if (isApiFailure(create) || !modelUuid) {
throw new Error(create.json.msg || "Failed to create fake provider model.");
}
created = true;
}
const test = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}/test`, {
method: "POST",
token,
body: { extra_args: {} },
});
if (isApiFailure(test)) {
throw new Error(safeReason(test.json.msg || test.json.message || "Fake provider model test failed."));
}
return {
uuid: modelUuid,
name,
created,
updated,
test_status: "pass",
test_reason: "",
};
}
async function ensurePipeline({ backendUrl, token, name, modelUuid }) {
const list = await apiJson(backendUrl, "/api/v1/pipelines", { token });
if (isApiFailure(list)) {
throw new Error(list.json.msg || "Failed to list pipelines.");
}
const pipelines = list.json.data?.pipelines || [];
let pipeline = pipelines.find((item) => item.name === name) || null;
let created = false;
if (!pipeline) {
const create = await apiJson(backendUrl, "/api/v1/pipelines", {
method: "POST",
token,
body: {
name,
description: QA_RESOURCE_DESCRIPTION,
emoji: "QA",
},
});
const pipelineId = create.json.data?.uuid || "";
if (isApiFailure(create) || !pipelineId) {
throw new Error(create.json.msg || "Failed to create fake provider pipeline.");
}
created = true;
pipeline = { uuid: pipelineId };
}
const loaded = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.uuid)}`, { token });
pipeline = loaded.json.data?.pipeline || null;
if (isApiFailure(loaded) || !pipeline?.uuid) {
throw new Error(loaded.json.msg || "Failed to load fake provider pipeline.");
}
const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {};
const ai = config.ai && typeof config.ai === "object" ? config.ai : {};
const existingLocalAgentConfig = ai["local-agent"] && typeof ai["local-agent"] === "object"
? ai["local-agent"]
: {};
const localAgentConfig = {
timeout: 60,
prompt: [{ role: "system", content: "You are a deterministic QA assistant. Reply exactly as instructed." }],
"remove-think": false,
"knowledge-bases": [],
"box-session-id-template": "{launcher_type}_{launcher_id}",
"retrieval-top-k": 5,
"rerank-model": "",
"rerank-top-k": 5,
"max-tool-iterations": 20,
"tool-execution-mode": "parallel",
"max-tool-result-chars": 20000,
"context-history-fetch-limit": 20,
"context-window-tokens": 8192,
"context-reserve-tokens": 1024,
"context-keep-recent-tokens": 2048,
"context-summary-tokens": 1024,
...existingLocalAgentConfig,
// Current backend truncation still reads this field directly.
"max-round": positiveInteger(existingLocalAgentConfig["max-round"], 10),
model: {
primary: modelUuid,
fallbacks: [],
},
};
const updatedConfig = {
...config,
ai: {
...ai,
runner: {
...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}),
id: RUNNER_ID,
runner: RUNNER_ID,
"expire-time": 0,
},
"local-agent": localAgentConfig,
},
};
const update = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.uuid)}`, {
method: "PUT",
token,
body: {
name,
description: QA_RESOURCE_DESCRIPTION,
emoji: "QA",
config: updatedConfig,
},
});
if (isApiFailure(update)) {
throw new Error(update.json.msg || "Failed to update fake provider pipeline.");
}
return {
pipeline_id: pipeline.uuid,
pipeline_name: name,
created,
updated: true,
};
}
function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
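A minimal sketch of how the failure check above classifies responses. The helper is copied from the diff; the sample response objects and their messages are invented for illustration and are not real LangBot payloads.

```javascript
// Same predicate as in the diff: HTTP errors fail, and so does any envelope
// whose `code` is present and non-zero.
function isApiFailure(response) {
  return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}

// Hypothetical responses, shaped like the { status, json } objects the scripts pass around.
const samples = {
  httpError: { status: 500, json: {} },
  envelopeError: { status: 200, json: { code: -1, msg: "model not found" } },
  envelopeOk: { status: 200, json: { code: 0, data: {} } },
  noEnvelope: { status: 200, json: {} },
};

const verdicts = Object.fromEntries(
  Object.entries(samples).map(([name, response]) => [name, isApiFailure(response)]),
);
console.log(verdicts);
```

Note that a 2xx response with no `code` field counts as success here, whereas `skipWizard` in the same change requires `code === 0` explicitly, so the two checks can disagree on an envelope-less response.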
function positiveInteger(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
function nonNegativeInteger(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}
function httpFaultStatus(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed >= 400 && parsed <= 599 ? parsed : fallback;
}
function envBool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
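The parsing helpers above fall back whenever the raw value is missing or malformed. The following sketch copies `positiveInteger` and `envBool` from the diff and runs them on a few made-up inputs to show which values survive.

```javascript
function positiveInteger(value, fallback) {
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function envBool(value, fallback) {
  if (value === undefined || value === "") return fallback;
  if (/^(1|true|yes|on)$/i.test(String(value))) return true;
  if (/^(0|false|no|off)$/i.test(String(value))) return false;
  return fallback;
}

// Illustrative inputs only; none of these values come from a real environment.
const parsed = {
  limitSet: positiveInteger("12", 8), // valid integer string is used
  limitUnset: positiveInteger(undefined, 8), // Number(undefined) is NaN, so the fallback applies
  limitZero: positiveInteger("0", 8), // zero is not positive, so the fallback applies
  limitFraction: positiveInteger("2.5", 8), // non-integers are rejected
  flagYes: envBool("YES", false), // matching is case-insensitive
  flagOff: envBool("off", true),
  flagTypo: envBool("maybe", true), // unrecognized text keeps the fallback
};
console.log(parsed);
```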
function sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
async function upsertEnvLocal(path, updates) {
await mkdir(dirname(path), { recursive: true });
let text = "";
try {
text = await readFile(path, "utf8");
} catch {
text = "";
}
const lines = text.split(/\r?\n/);
const seen = new Set();
const next = lines.map((line) => {
const trimmed = line.trim();
const equals = trimmed.indexOf("=");
if (equals <= 0 || trimmed.startsWith("#")) return line;
const key = trimmed.slice(0, equals).trim();
if (!(key in updates)) return line;
seen.add(key);
return `${key}=${updates[key]}`;
});
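// Drop trailing blank lines first so appended keys do not leave a gap (or a leading blank line in a new file).
while (next.length > 0 && next[next.length - 1] === "") next.pop();
for (const [key, value] of Object.entries(updates)) {
if (!seen.has(key)) next.push(`${key}=${value}`);
}
await writeFile(path, `${next.join("\n")}\n`, "utf8");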
}
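The upsert logic can be exercised without touching the filesystem. This is a simplified sketch, not the script's own function: `upsertEnvText` is a hypothetical pure variant that takes the file text as a string, and it trims trailing blank lines before appending, which is an assumption of this sketch.

```javascript
// Hypothetical pure variant of the upsert: rewrite matching KEY=value lines in place,
// keep comments and unrelated lines, and append keys that were not already present.
function upsertEnvText(text, updates) {
  const lines = text.split(/\r?\n/);
  while (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  const seen = new Set();
  const next = lines.map((line) => {
    const trimmed = line.trim();
    const equals = trimmed.indexOf("=");
    if (equals <= 0 || trimmed.startsWith("#")) return line;
    const key = trimmed.slice(0, equals).trim();
    if (!(key in updates)) return line;
    seen.add(key);
    return `${key}=${updates[key]}`;
  });
  for (const [key, value] of Object.entries(updates)) {
    if (!seen.has(key)) next.push(`${key}=${value}`);
  }
  return `${next.join("\n")}\n`;
}

// Sample file content and values are made up for illustration.
const before = "# local QA settings\nLANGBOT_PIPELINE_NAME=old\nOTHER=1\n";
const after = upsertEnvText(before, {
  LANGBOT_PIPELINE_NAME: "Agent QA",
  LANGBOT_E2E_MODEL_UUID: "abc",
});
const fresh = upsertEnvText("", { LANGBOT_E2E_MODEL_UUID: "abc" });
console.log(after);
```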
@@ -10,6 +10,7 @@ import {
ensureEvidence,
evidencePaths,
loadEnvFiles,
redact,
resetAndAuthLocalUser,
safeScreenshot,
setBrowserToken,
@@ -17,9 +18,12 @@ import {
writeResult,
} from "./lib/langbot-e2e.mjs";
const RUNNER_ID = "plugin:langbot/local-agent/default";
const RUNNER_ID = "local-agent";
const SPACE_PROVIDER_UUID = "00000000-0000-0000-0000-000000000000";
const DEFAULT_PIPELINE_NAME = "Agent QA Local Agent Debug Chat";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
const DEFAULT_MODEL_TEST_LIMIT = 8;
const DEFAULT_MODEL_FALLBACK_COUNT = 3;
const caseId = "ensure-local-agent-pipeline";
await loadEnvFiles();
@@ -45,11 +49,18 @@ const result = {
pipeline_url: "",
runner_id: RUNNER_ID,
selected_model_id: "",
selected_model_name: "",
fallback_model_ids: [],
model_count: 0,
space_model_count: 0,
scanned_space_model_count: 0,
tested_model_count: 0,
model_tests: [],
created: false,
updated: false,
wrote_env: false,
auth: null,
wizard: null,
browser_token_check: null,
page_signal: "",
evidence: {
@@ -71,6 +82,7 @@ try {
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the pipeline via backend API.");
}
@@ -81,6 +93,13 @@ try {
backend_token_check: auth.check,
};
const wizard = await skipWizard({ backendUrl, token: auth.token });
result.wizard = wizard;
if (wizard.status !== "pass") {
result.status = "fail";
throw new Error(wizard.reason || "Failed to mark the local QA wizard as skipped.");
}
const prepared = await ensureLocalAgentPipeline({
backendUrl,
token: auth.token,
@@ -99,6 +118,10 @@ try {
LANGBOT_PIPELINE_NAME: result.pipeline_name || pipelineName,
LANGBOT_LOCAL_AGENT_PIPELINE_URL: result.pipeline_url,
LANGBOT_LOCAL_AGENT_PIPELINE_NAME: result.pipeline_name || pipelineName,
...(result.selected_model_id ? {
LANGBOT_LOCAL_AGENT_MODEL_UUID: result.selected_model_id,
LANGBOT_E2E_MODEL_UUID: result.selected_model_id,
} : {}),
});
result.wrote_env = true;
}
@@ -127,6 +150,21 @@ try {
process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
async function skipWizard({ backendUrl, token }) {
const response = await apiJson(backendUrl, "/api/v1/system/wizard/completed", {
method: "POST",
token,
body: { status: "skipped" },
});
const ok = response.status < 400 && response.json.code === 0;
return {
status: ok ? "pass" : "fail",
http_status: response.status,
code: response.json.code ?? null,
reason: ok ? "Wizard marked skipped for local QA." : response.json.msg || "Wizard status update failed.",
};
}
async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runnerId }) {
const [pipelineList, modelList] = await Promise.all([
apiJson(backendUrl, "/api/v1/pipelines", { token }),
@@ -149,7 +187,19 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
}
const models = modelList.json.data?.models || [];
const selectedModel = models.find((model) => model.uuid) || null;
const skippedModelIds = new Set(
String(env.LANGBOT_E2E_SKIP_MODEL_UUIDS || "")
.split(",")
.map((item) => item.trim())
.filter(Boolean),
);
const skippedModelNames = new Set(
String(env.LANGBOT_E2E_SKIP_MODEL_NAMES || "")
.split(",")
.map((item) => item.trim())
.filter(Boolean),
);
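// Apply both skip lists so space_model_count matches the candidates that selectWorkingSpaceModel considers.
const spaceModels = models.filter((model) => isSpaceModel(model) && !skippedModelIds.has(model.uuid) && !skippedModelNames.has(model.name));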
const pipelines = pipelineList.json.data?.pipelines || [];
let pipeline = pipelines.find((item) => item.name === pipelineName) || null;
let created = false;
@@ -170,6 +220,7 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
reason: createdResponse.json.msg || "Failed to create pipeline.",
create_status: createdResponse.status,
model_count: models.length,
space_model_count: spaceModels.length,
};
}
const pipelineId = createdResponse.json.data?.uuid || "";
@@ -183,6 +234,7 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
status: "fail",
reason: "Pipeline was not created or resolved.",
model_count: models.length,
space_model_count: spaceModels.length,
};
}
@@ -194,27 +246,37 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
get_status: loaded.status,
pipeline_id: pipeline.uuid,
model_count: models.length,
space_model_count: spaceModels.length,
};
}
pipeline = loaded.json.data.pipeline;
const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {};
const ai = config.ai && typeof config.ai === "object" ? config.ai : {};
const runnerConfig = ai.runner_config && typeof ai.runner_config === "object" ? ai.runner_config : {};
const rawExistingLocalAgentConfig = runnerConfig[runnerId] && typeof runnerConfig[runnerId] === "object"
? runnerConfig[runnerId]
const rawExistingLocalAgentConfig = ai["local-agent"] && typeof ai["local-agent"] === "object"
? ai["local-agent"]
: {};
const existingLocalAgentConfig = rawExistingLocalAgentConfig;
const existingModel = existingLocalAgentConfig.model && typeof existingLocalAgentConfig.model === "object"
? existingLocalAgentConfig.model
: {};
const requestedModelId = env.LANGBOT_LOCAL_AGENT_MODEL_UUID || env.LANGBOT_E2E_MODEL_UUID || "";
const selectedModelId = requestedModelId || existingModel.primary || selectedModel?.uuid || "";
const selected = await selectWorkingSpaceModel({
backendUrl,
token,
models,
skippedModelIds,
skippedModelNames,
requestedModelId,
existingModelId: existingModel.primary || "",
});
const selectedModelId = selected.selected_model_id || "";
const localAgentConfig = {
timeout: 300,
prompt: [{ role: "system", content: "You are a helpful assistant." }],
"remove-think": false,
"knowledge-bases": [],
"box-session-id-template": "{launcher_type}_{launcher_id}",
"retrieval-top-k": 5,
"rerank-model": "",
"rerank-top-k": 5,
@@ -227,9 +289,11 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
"context-keep-recent-tokens": 20000,
"context-summary-tokens": 8000,
...existingLocalAgentConfig,
// Current backend truncation still reads this field directly.
"max-round": positiveInteger(existingLocalAgentConfig["max-round"], 10),
model: {
primary: selectedModelId,
fallbacks: requestedModelId ? [] : Array.isArray(existingModel.fallbacks) ? existingModel.fallbacks : [],
fallbacks: selected.fallback_model_ids || [],
},
};
const updatedConfig = {
@@ -239,12 +303,10 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
runner: {
...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}),
id: runnerId,
runner: runnerId,
"expire-time": 0,
},
runner_config: {
...runnerConfig,
[runnerId]: localAgentConfig,
},
"local-agent": localAgentConfig,
},
};
@@ -265,19 +327,31 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
update_status: updateResponse.status,
pipeline_id: pipeline.uuid,
model_count: models.length,
space_model_count: spaceModels.length,
scanned_space_model_count: selected.scanned_space_model_count,
tested_model_count: selected.tested_model_count,
model_tests: selected.model_tests,
selected_model_id: selectedModelId,
selected_model_name: selected.selected_model_name,
fallback_model_ids: selected.fallback_model_ids,
};
}
return {
status: selectedModelId ? "pass" : "env_issue",
reason: selectedModelId
? "Local-agent pipeline is configured for Debug Chat."
: "Pipeline was created but no LLM model is configured in this LangBot instance.",
? `Local-agent pipeline is configured for Debug Chat with Space model ${selected.selected_model_name || selectedModelId} and ${selected.fallback_model_ids.length} fallback(s).`
: selected.reason || "No working Space LLM model is configured in this LangBot instance.",
pipeline_id: pipeline.uuid,
pipeline_name: pipeline.name,
pipeline_name: pipelineName,
model_count: models.length,
space_model_count: spaceModels.length,
scanned_space_model_count: selected.scanned_space_model_count,
tested_model_count: selected.tested_model_count,
model_tests: selected.model_tests,
selected_model_id: selectedModelId,
selected_model_name: selected.selected_model_name,
fallback_model_ids: selected.fallback_model_ids,
created,
updated: true,
};
@@ -287,6 +361,229 @@ function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
function isSpaceModel(model) {
const provider = model?.provider && typeof model.provider === "object" ? model.provider : {};
return model?.provider_uuid === SPACE_PROVIDER_UUID
|| provider.uuid === SPACE_PROVIDER_UUID
|| provider.requester === "space-chat-completions"
|| provider.name === "LangBot Models";
}
async function selectWorkingSpaceModel({
backendUrl,
token,
models,
skippedModelIds,
skippedModelNames,
requestedModelId,
existingModelId,
}) {
const modelTests = [];
const testLimit = positiveInteger(env.LANGBOT_E2E_MODEL_TEST_LIMIT, DEFAULT_MODEL_TEST_LIMIT);
const fallbackCount = positiveInteger(env.LANGBOT_E2E_MODEL_FALLBACK_COUNT, DEFAULT_MODEL_FALLBACK_COUNT);
const workingModels = [];
const spaceModels = rankModels(models.filter((model) => (
model.uuid
&& isSpaceModel(model)
&& !skippedModelIds.has(model.uuid)
&& !skippedModelNames.has(model.name)
)));
const requestedModel = requestedModelId
? spaceModels.find((model) => model.uuid === requestedModelId) || null
: null;
const existingModel = existingModelId
? spaceModels.find((model) => model.uuid === existingModelId) || null
: null;
const candidates = uniqueCandidates([
...(requestedModel ? [existingCandidate(requestedModel, "requested")] : []),
...(existingModel ? [existingCandidate(existingModel, "existing-pipeline")] : []),
...spaceModels.map((model) => existingCandidate(model, "configured-space")),
]);
let scanResult = { status: "skipped", models: [], reason: "" };
if (env.LANGBOT_E2E_SCAN_SPACE_MODELS !== "false") {
scanResult = await scanSpaceModels({ backendUrl, token });
if (scanResult.status === "pass") {
const knownNames = new Set(spaceModels.map((model) => model.name));
candidates.push(...scanResult.models
.filter((model) => model.name && !knownNames.has(model.name) && !skippedModelNames.has(model.name))
.map((model) => scannedCandidate(model)));
}
}
const unique = uniqueCandidates(candidates);
for (const candidate of unique.slice(0, testLimit)) {
const test = await ensureAndTestModel({ backendUrl, token, candidate });
modelTests.push(test);
if (test.status === "pass" && test.model_uuid) {
workingModels.push(test);
if (workingModels.length >= fallbackCount + 1) break;
}
}
if (workingModels.length > 0) {
const [primary, ...fallbacks] = workingModels;
return {
status: "pass",
reason: "",
selected_model_id: primary.model_uuid,
selected_model_name: primary.model_name,
fallback_model_ids: fallbacks.map((model) => model.model_uuid),
scanned_space_model_count: scanResult.models.length,
tested_model_count: modelTests.length,
model_tests: modelTests,
};
}
const baseReason = unique.length === 0
? scanResult.reason || "No Space LLM model candidates are available."
: `No working Space LLM model found after testing ${modelTests.length} candidate(s).`;
return {
status: "env_issue",
reason: requestedModelId && !requestedModel
? `Requested Space LLM model ${requestedModelId} is missing or skipped; ${baseReason}`
: baseReason,
selected_model_id: "",
selected_model_name: "",
fallback_model_ids: [],
scanned_space_model_count: scanResult.models.length,
tested_model_count: modelTests.length,
model_tests: modelTests,
};
}
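The core of the selection loop above is: test candidates in order, stop at the test limit, and stop early once one primary plus the requested number of fallbacks have passed. The sketch below is a synchronous simplification with hypothetical names (`pickPrimaryAndFallbacks`, `probe`); in the script the probe is an HTTP call to the model test endpoint.

```javascript
// Hypothetical stand-in for the selection loop; `probe` replaces the real model test request.
function pickPrimaryAndFallbacks(candidates, probe, { testLimit = 8, fallbackCount = 3 } = {}) {
  const tested = [];
  const working = [];
  for (const candidate of candidates.slice(0, testLimit)) {
    const passed = probe(candidate);
    tested.push({ name: candidate.name, status: passed ? "pass" : "fail" });
    if (passed) {
      working.push(candidate);
      // One primary plus `fallbackCount` fallbacks is enough; skip the remaining candidates.
      if (working.length >= fallbackCount + 1) break;
    }
  }
  const [primary, ...fallbacks] = working;
  return {
    status: primary ? "pass" : "env_issue",
    primary: primary ? primary.name : "",
    fallbacks: fallbacks.map((item) => item.name),
    testedCount: tested.length,
  };
}

// Made-up candidate pool: "a" and "c" are treated as broken models.
const pool = ["a", "b", "c", "d", "e", "f"].map((name) => ({ name }));
const broken = new Set(["a", "c"]);
const selection = pickPrimaryAndFallbacks(pool, (item) => !broken.has(item.name), {
  testLimit: 5,
  fallbackCount: 1,
});
const nothingWorks = pickPrimaryAndFallbacks(pool, () => false, { testLimit: 5, fallbackCount: 1 });
console.log(selection, nothingWorks);
```

With these inputs the loop stops after the fourth candidate, because `b` and `d` already fill the primary and single fallback slot. When nothing passes, the loop stops at the test limit and reports `env_issue`.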
async function scanSpaceModels({ backendUrl, token }) {
const response = await apiJson(
backendUrl,
`/api/v1/provider/providers/${encodeURIComponent(SPACE_PROVIDER_UUID)}/scan-models?type=llm`,
{ token },
);
if (isApiFailure(response)) {
return {
status: "env_issue",
models: [],
reason: safeReason(response.json.msg || response.json.message || "Failed to scan Space LLM models."),
};
}
return {
status: "pass",
models: response.json.data?.models || [],
reason: "",
};
}
async function ensureAndTestModel({ backendUrl, token, candidate }) {
let modelUuid = candidate.uuid || "";
let created = false;
if (!modelUuid) {
const create = await apiJson(backendUrl, "/api/v1/provider/models/llm", {
method: "POST",
token,
body: {
name: candidate.name,
provider_uuid: SPACE_PROVIDER_UUID,
abilities: candidate.abilities || [],
context_length: candidate.context_length ?? null,
extra_args: {},
prefered_ranking: positiveInteger(candidate.prefered_ranking, 0),
},
});
modelUuid = create.json.data?.uuid || "";
if (isApiFailure(create) || !modelUuid) {
return modelTestResult(candidate, {
status: "fail",
reason: safeReason(create.json.msg || "Failed to create scanned Space model."),
http_status: create.status,
});
}
created = true;
}
const test = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}/test`, {
method: "POST",
token,
body: { extra_args: {} },
});
const passed = !isApiFailure(test);
if (!passed && created) {
await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}`, {
method: "DELETE",
token,
}).catch(() => {});
}
return modelTestResult(candidate, {
status: passed ? "pass" : "fail",
reason: passed ? "" : safeReason(test.json.msg || test.json.message || "Space model test failed."),
http_status: test.status,
model_uuid: modelUuid,
created,
});
}
function modelTestResult(candidate, details) {
return {
source: candidate.source,
model_uuid: details.model_uuid || candidate.uuid || "",
model_name: candidate.name,
status: details.status,
reason: details.reason || "",
http_status: details.http_status ?? null,
created: Boolean(details.created),
};
}
function existingCandidate(model, source) {
return {
source,
uuid: model.uuid,
name: model.name,
abilities: model.abilities || [],
context_length: model.context_length,
prefered_ranking: model.prefered_ranking,
};
}
function scannedCandidate(model) {
return {
source: "scanned-space",
uuid: "",
name: model.name || model.id,
abilities: model.abilities || [],
context_length: model.context_length,
prefered_ranking: model.prefered_ranking,
};
}
function uniqueCandidates(candidates) {
const seen = new Set();
const result = [];
for (const candidate of candidates) {
const key = candidate.uuid ? `uuid:${candidate.uuid}` : `name:${candidate.name}`;
if (!candidate.name || seen.has(key)) continue;
seen.add(key);
result.push(candidate);
}
return result;
}
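A short sketch of the de-duplication key used above. The function is copied from the diff; the candidate objects are invented. It shows that a configured model (keyed by UUID) and a scanned model (keyed by name) with the same name are not merged by this function alone, which appears to be why the caller filters scanned models against the known names first.

```javascript
function uniqueCandidates(candidates) {
  const seen = new Set();
  const result = [];
  for (const candidate of candidates) {
    // Configured models are keyed by UUID; scanned models have no UUID yet and are keyed by name.
    const key = candidate.uuid ? `uuid:${candidate.uuid}` : `name:${candidate.name}`;
    if (!candidate.name || seen.has(key)) continue;
    seen.add(key);
    result.push(candidate);
  }
  return result;
}

// Illustrative candidates only.
const merged = uniqueCandidates([
  { source: "requested", uuid: "u1", name: "alpha" },
  { source: "configured-space", uuid: "u1", name: "alpha" }, // same UUID: dropped
  { source: "scanned-space", uuid: "", name: "alpha" }, // different key: kept
  { source: "scanned-space", uuid: "", name: "beta" },
  { source: "scanned-space", uuid: "", name: "beta" }, // same name: dropped
  { source: "scanned-space", uuid: "", name: "" }, // nameless: dropped
]);
const mergedSummary = merged.map((item) => `${item.source}:${item.name}`);
console.log(mergedSummary);
```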
function rankModels(models) {
return [...models].sort((left, right) => {
const leftRank = Number.isFinite(Number(left.prefered_ranking)) ? Number(left.prefered_ranking) : 9999;
const rightRank = Number.isFinite(Number(right.prefered_ranking)) ? Number(right.prefered_ranking) : 9999;
if (leftRank !== rightRank) return leftRank - rightRank;
return String(left.name || "").localeCompare(String(right.name || ""));
});
}
function positiveInteger(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
async function upsertEnvLocal(path, updates) {
let text = "";
try {
+496
@@ -0,0 +1,496 @@
#!/usr/bin/env node
import { createServer } from "node:http";
import { mkdir, writeFile } from "node:fs/promises";
import { dirname, resolve } from "node:path";
import { env, exit } from "node:process";
const args = parseArgs(process.argv.slice(2));
const host = args.host || env.LANGBOT_FAKE_PROVIDER_HOST || "127.0.0.1";
const port = integer(args.port ?? env.LANGBOT_FAKE_PROVIDER_PORT, 0);
const stateFile = args["state-file"] || env.LANGBOT_FAKE_PROVIDER_STATE_FILE || "";
const modelName = env.LANGBOT_FAKE_PROVIDER_MODEL_NAME || "gpt-4o-mini";
const config = {
response_text: env.LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT || "OK",
first_token_delay_ms: integer(env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS, 25),
chunk_delay_ms: integer(env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS, 10),
chunk_count: integer(env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT, 0),
fault_status: integer(env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS, 500),
fail_first_n: integer(env.LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N, 0),
fail_every_n: integer(env.LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N, 0),
fail_after_first_chunk: bool(env.LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK, false),
dynamic_response: !/^(0|false|no|off)$/i.test(env.LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE || ""),
request_log_limit: integer(env.LANGBOT_FAKE_PROVIDER_REQUEST_LOG_LIMIT, 500),
};
let requestCount = 0;
const recentRequests = [];
const server = createServer(async (request, response) => {
const startedAt = Date.now();
const startedPerf = performance.now();
let requestRecord = null;
const url = new URL(request.url || "/", `http://${request.headers.host || `${host}:${port}`}`);
try {
if (request.method === "GET" && url.pathname === "/healthz") {
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
recent_request_count: recentRequests.length,
});
return;
}
if (request.method === "GET" && url.pathname === "/__qa/config") {
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
recent_requests: recentRequests,
});
return;
}
if (request.method === "POST" && url.pathname === "/__qa/config") {
const body = await readJson(request);
applyConfig(body.config && typeof body.config === "object" ? body.config : body);
if (body.reset_request_count !== false) resetRequestState();
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
});
return;
}
if (request.method === "POST" && url.pathname === "/__qa/reset") {
resetRequestState();
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
});
return;
}
if (request.method === "GET" && ["/models", "/v1/models"].includes(url.pathname)) {
sendJson(response, 200, {
object: "list",
data: [
{
id: modelName,
object: "model",
created: 1,
owned_by: "langbot-qa",
type: "llm",
},
],
});
return;
}
if (request.method === "POST" && ["/chat/completions", "/v1/chat/completions"].includes(url.pathname)) {
requestCount += 1;
const body = await readJson(request);
const requestId = `chatcmpl-langbot-fake-${requestCount}`;
const shouldFail = requestCount <= config.fail_first_n
|| (config.fail_every_n > 0 && requestCount % config.fail_every_n === 0);
const replyText = responseTextForBody(body);
requestRecord = recordRequest({
id: requestId,
request_number: requestCount,
path: url.pathname,
stream: Boolean(body.stream),
model: body.model || "",
message_count: Array.isArray(body.messages) ? body.messages.length : 0,
should_fail: shouldFail,
status: "running",
http_status: null,
expected_text: replyText,
response_text_preview: previewText(replyText),
started_at: new Date(startedAt).toISOString(),
started_epoch_ms: startedAt,
configured_first_token_delay_ms: config.first_token_delay_ms,
configured_chunk_delay_ms: config.chunk_delay_ms,
configured_chunk_count: config.chunk_count,
});
if (shouldFail) {
await sleep(config.first_token_delay_ms);
sendJson(response, config.fault_status, {
error: {
message: `LangBot fake provider injected HTTP ${config.fault_status}`,
type: "fake_provider_fault",
code: "fake_provider_fault",
},
});
finishRequestRecord(requestRecord, startedPerf, {
status: "http_fault",
http_status: config.fault_status,
});
return;
}
if (body.stream) {
await streamCompletion(response, {
requestId,
model: body.model || modelName,
content: replyText,
failAfterFirstChunk: config.fail_after_first_chunk,
requestRecord,
startedPerf,
});
} else {
await sleep(config.first_token_delay_ms + config.chunk_delay_ms);
sendJson(response, 200, completionPayload({
requestId,
model: body.model || modelName,
content: replyText,
}));
markRequestTiming(requestRecord, "first_chunk", startedPerf);
markRequestTiming(requestRecord, "first_content_chunk", startedPerf);
requestRecord.content_chunk_count = 1;
finishRequestRecord(requestRecord, startedPerf, {
status: "ok",
http_status: 200,
});
}
return;
}
sendJson(response, 404, {
error: {
message: `No fake provider route for ${request.method} ${url.pathname}`,
type: "not_found",
},
});
} catch (error) {
if (requestRecord) {
finishRequestRecord(requestRecord, startedPerf, {
status: "fake_provider_error",
http_status: 500,
error: error instanceof Error ? error.message : String(error),
});
}
sendJson(response, 500, {
error: {
message: error instanceof Error ? error.message : String(error),
type: "fake_provider_error",
},
});
} finally {
const durationMs = Date.now() - startedAt;
if (url.pathname !== "/healthz") {
console.log(JSON.stringify({
at: new Date().toISOString(),
method: request.method,
path: url.pathname,
duration_ms: durationMs,
}));
}
}
});
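The fault injection in the chat completions route depends only on the running request counter and two settings. The sketch below isolates that rule to show which request numbers fail for a given configuration; `faultSchedule` is a hypothetical helper added for illustration.

```javascript
// Same rule as the handler: fail the first N requests, and every Nth request after that.
function shouldFail(requestNumber, { fail_first_n = 0, fail_every_n = 0 }) {
  return requestNumber <= fail_first_n
    || (fail_every_n > 0 && requestNumber % fail_every_n === 0);
}

// Hypothetical helper: list the 1-based request numbers that would receive the injected fault.
function faultSchedule(totalRequests, config) {
  const failing = [];
  for (let number = 1; number <= totalRequests; number += 1) {
    if (shouldFail(number, config)) failing.push(number);
  }
  return failing;
}

const combined = faultSchedule(10, { fail_first_n: 2, fail_every_n: 4 });
const periodic = faultSchedule(6, { fail_every_n: 3 });
const healthy = faultSchedule(5, {});
console.log(combined, periodic, healthy);
```

Because the schedule is tied to `requestCount`, resetting the counter through `POST /__qa/reset` (or `POST /__qa/config`, unless `reset_request_count` is `false`) restarts it from request 1.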
server.listen(port, host, async () => {
const address = server.address();
const selectedPort = typeof address === "object" && address ? address.port : port;
const url = `http://${host}:${selectedPort}`;
const state = {
status: "ready",
pid: process.pid,
url,
base_url: `${url}/v1`,
model: modelName,
started_at: new Date().toISOString(),
};
if (stateFile) {
const path = resolve(stateFile);
await mkdir(dirname(path), { recursive: true });
await writeFile(path, `${JSON.stringify(state, null, 2)}\n`, "utf8");
}
console.log(JSON.stringify(state));
});
server.on("error", (error) => {
console.error(JSON.stringify({
status: "error",
reason: error instanceof Error ? error.message : String(error),
}));
exit(1);
});
process.on("SIGTERM", () => {
server.close(() => exit(0));
});
function parseArgs(argv) {
const result = {};
for (const item of argv) {
const match = item.match(/^--([^=]+)(?:=(.*))?$/);
if (!match) continue;
result[match[1]] = match[2] ?? "1";
}
return result;
}
function integer(value, fallback) {
const parsed = Number.parseInt(String(value ?? ""), 10);
return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}
function bool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, Math.max(0, ms)));
}
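async function readJson(request) {
// Collect raw buffers first so multi-byte UTF-8 characters split across chunks decode correctly.
const chunks = [];
for await (const chunk of request) chunks.push(chunk);
const text = Buffer.concat(chunks).toString("utf8");
if (!text) return {};
return JSON.parse(text);
}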
function sendJson(response, status, payload) {
const text = `${JSON.stringify(payload)}\n`;
response.writeHead(status, {
"content-type": "application/json",
"content-length": Buffer.byteLength(text),
});
response.end(text);
}
function completionPayload({ requestId, model, content }) {
const completionTokens = tokenEstimate(content);
return {
id: requestId,
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
model,
choices: [
{
index: 0,
message: {
role: "assistant",
content,
},
finish_reason: "stop",
},
],
usage: {
prompt_tokens: 8,
completion_tokens: completionTokens,
total_tokens: 8 + completionTokens,
},
};
}
async function streamCompletion(response, {
requestId,
model,
content,
failAfterFirstChunk: failMidStream,
requestRecord,
startedPerf,
}) {
response.writeHead(200, {
"content-type": "text/event-stream; charset=utf-8",
"cache-control": "no-cache",
"connection": "keep-alive",
});
await sleep(config.first_token_delay_ms);
markRequestTiming(requestRecord, "first_chunk", startedPerf);
writeSse(response, {
id: requestId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }],
});
const chunks = splitContent(content);
for (let index = 0; index < chunks.length; index += 1) {
await sleep(config.chunk_delay_ms);
if (index === 0) markRequestTiming(requestRecord, "first_content_chunk", startedPerf);
requestRecord.content_chunk_count = (requestRecord.content_chunk_count || 0) + 1;
writeSse(response, {
id: requestId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: { content: chunks[index] }, finish_reason: null }],
});
if (failMidStream && index === 0) {
finishRequestRecord(requestRecord, startedPerf, {
status: "mid_stream_disconnect",
http_status: 200,
});
response.destroy(new Error("LangBot fake provider injected mid-stream disconnect"));
return;
}
}
await sleep(config.chunk_delay_ms);
const completionTokens = tokenEstimate(content);
writeSse(response, {
id: requestId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage: {
prompt_tokens: 8,
completion_tokens: completionTokens,
total_tokens: 8 + completionTokens,
},
});
response.write("data: [DONE]\n\n");
response.end();
finishRequestRecord(requestRecord, startedPerf, {
status: "ok",
http_status: 200,
});
}
function writeSse(response, payload) {
response.write(`data: ${JSON.stringify(payload)}\n\n`);
}
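On the consuming side, the stream written above is a sequence of `data: <json>` frames separated by blank lines and terminated by `data: [DONE]`. The following sketch shows one way a test could reassemble the reply text; `parseSseData` and `collectContent` are hypothetical helpers, and the sample frames are built by hand rather than captured from a run.

```javascript
// Hypothetical helper: split a raw SSE body into parsed JSON payloads, skipping the [DONE] marker.
function parseSseData(raw) {
  return raw
    .split("\n\n")
    .map((frame) => frame.trim())
    .filter((frame) => frame.startsWith("data: "))
    .map((frame) => frame.slice("data: ".length))
    .filter((payload) => payload !== "[DONE]")
    .map((payload) => JSON.parse(payload));
}

// Hypothetical helper: concatenate the content deltas in order.
function collectContent(raw) {
  return parseSseData(raw)
    .map((event) => event.choices?.[0]?.delta?.content || "")
    .join("");
}

// Hand-built frames in the same shape the fake provider emits.
const frames = [
  { choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }] },
  { choices: [{ index: 0, delta: { content: "PO" }, finish_reason: null }] },
  { choices: [{ index: 0, delta: { content: "NG" }, finish_reason: null }] },
  { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] },
];
const rawStream = `${frames.map((payload) => `data: ${JSON.stringify(payload)}\n\n`).join("")}data: [DONE]\n\n`;
const events = parseSseData(rawStream);
const streamedText = collectContent(rawStream);
console.log(events.length, streamedText);
```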
function splitContent(content) {
const text = String(content);
const requested = config.chunk_count;
if (requested <= 1 || text.length <= 1) return [text];
const chunkSize = Math.max(1, Math.ceil(text.length / requested));
const chunks = [];
for (let index = 0; index < text.length; index += chunkSize) {
chunks.push(text.slice(index, index + chunkSize));
}
return chunks;
}
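`chunk_count` is a target rather than a guarantee: the text is cut into equal slices of `ceil(length / requested)` characters, so short replies can produce fewer chunks than requested. The sketch below repeats the splitting logic with the chunk count passed as a parameter (an adaptation for the example; the script reads it from `config`).

```javascript
// Same slicing as splitContent, with `requested` passed in instead of read from config.
function splitContent(content, requested) {
  const text = String(content);
  if (requested <= 1 || text.length <= 1) return [text];
  const chunkSize = Math.max(1, Math.ceil(text.length / requested));
  const chunks = [];
  for (let index = 0; index < text.length; index += chunkSize) {
    chunks.push(text.slice(index, index + chunkSize));
  }
  return chunks;
}

const even = splitContent("HELLO WORLD", 3); // 11 chars, slices of 4
const fewer = splitContent("ABCDE", 4); // 5 chars, slices of 2: only 3 chunks
const single = splitContent("OK", 0); // 0 or 1 disables splitting
const perChar = splitContent("OK", 5); // more chunks requested than characters
console.log(even, fewer, single, perChar);
```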
function tokenEstimate(content) {
return Math.max(1, Math.ceil(String(content || "").length / 4));
}
function responseTextForBody(body) {
if (!config.dynamic_response) {
return config.response_text;
}
const messages = Array.isArray(body.messages) ? body.messages : [];
const lastUser = [...messages].reverse().find((message) => message?.role === "user");
const text = flattenContent(lastUser?.content || "");
const quoted = text.match(/["'“”](.{1,80}?)["'“”]/);
if (quoted?.[1]) return quoted[1].trim();
const exact = text.match(/(?:reply|回复|输出|return)\s+(?:exactly\s+)?([A-Za-z0-9_.:@-]{1,80})/i);
if (exact?.[1]) return exact[1].trim().replace(/[。.!?]+$/, "");
const only = text.match(/只回复\s*([A-Za-z0-9_.:@-]{1,80})/);
if (only?.[1]) return only[1].trim().replace(/[。.!?]+$/, "");
return config.response_text;
}
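The dynamic reply is derived from the last user message by three patterns tried in order: a quoted span, a "reply/return X" instruction, and the Chinese "只回复X" form. The sketch below applies the same regular expressions to plain strings; `expectedReply` is a hypothetical wrapper and the prompts are invented.

```javascript
// Hypothetical wrapper around the same three patterns, taking the user text directly.
function expectedReply(text, fallback = "OK") {
  const quoted = text.match(/["'“”](.{1,80}?)["'“”]/);
  if (quoted?.[1]) return quoted[1].trim();
  const exact = text.match(/(?:reply|回复|输出|return)\s+(?:exactly\s+)?([A-Za-z0-9_.:@-]{1,80})/i);
  if (exact?.[1]) return exact[1].trim().replace(/[。.!?]+$/, "");
  const only = text.match(/只回复\s*([A-Za-z0-9_.:@-]{1,80})/);
  if (only?.[1]) return only[1].trim().replace(/[。.!?]+$/, "");
  return fallback;
}

const replies = {
  quoted: expectedReply('Please reply with "PONG-42" only.'),
  exact: expectedReply("Reply exactly QA_TOKEN_7."), // trailing period is stripped
  chinese: expectedReply("只回复DONE"),
  none: expectedReply("How is the weather today?"),
  // Caveat: apostrophes count as quote characters, so ordinary prose can match the first pattern.
  apostrophes: expectedReply("It's Bob's turn"),
};
console.log(replies);
```

The last case is worth keeping in mind when writing load-test prompts: a message containing two apostrophes is read as a quoted span, so the expected reply may not be what the prompt author intended.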
function flattenContent(content) {
if (typeof content === "string") return content;
if (Array.isArray(content)) {
return content
.map((item) => {
if (typeof item === "string") return item;
if (item && typeof item === "object") return item.text || "";
return "";
})
.join("\n");
}
return "";
}
function recordRequest(entry) {
const item = {
...entry,
at: new Date().toISOString(),
finished_at: null,
finished_epoch_ms: null,
duration_ms: null,
first_chunk_at: null,
first_chunk_epoch_ms: null,
first_chunk_ms: null,
first_content_chunk_at: null,
first_content_chunk_epoch_ms: null,
first_content_chunk_ms: null,
content_chunk_count: 0,
};
recentRequests.push(item);
while (recentRequests.length > config.request_log_limit) recentRequests.shift();
return item;
}
function markRequestTiming(entry, key, startedPerf) {
if (!entry || entry[`${key}_at`]) return;
const now = Date.now();
entry[`${key}_at`] = new Date(now).toISOString();
entry[`${key}_epoch_ms`] = now;
entry[`${key}_ms`] = rounded(performance.now() - startedPerf);
}
function finishRequestRecord(entry, startedPerf, updates = {}) {
if (!entry || entry.finished_at) return;
const now = Date.now();
Object.assign(entry, updates);
entry.finished_at = new Date(now).toISOString();
entry.finished_epoch_ms = now;
entry.duration_ms = rounded(performance.now() - startedPerf);
}
function rounded(value) {
return Number(value.toFixed(3));
}
function previewText(value) {
return String(value || "").slice(0, 120);
}
function resetRequestState() {
requestCount = 0;
recentRequests.length = 0;
}
function applyConfig(updates) {
if (!updates || typeof updates !== "object") return;
assignString(updates, "response_text");
assignNonNegativeInteger(updates, "first_token_delay_ms");
assignNonNegativeInteger(updates, "chunk_delay_ms");
assignNonNegativeInteger(updates, "chunk_count");
assignNonNegativeInteger(updates, "fail_first_n");
assignNonNegativeInteger(updates, "fail_every_n");
assignNonNegativeInteger(updates, "request_log_limit");
if (updates.fault_status !== undefined) {
const parsed = Number.parseInt(String(updates.fault_status), 10);
if (Number.isInteger(parsed) && parsed >= 400 && parsed <= 599) config.fault_status = parsed;
}
assignBoolean(updates, "fail_after_first_chunk");
assignBoolean(updates, "dynamic_response");
}
function assignString(updates, key) {
if (updates[key] !== undefined) config[key] = String(updates[key]);
}
function assignNonNegativeInteger(updates, key) {
if (updates[key] === undefined) return;
const parsed = Number.parseInt(String(updates[key]), 10);
if (Number.isInteger(parsed) && parsed >= 0) config[key] = parsed;
}
function assignBoolean(updates, key) {
if (updates[key] === undefined) return;
config[key] = bool(updates[key], config[key]);
}
+2 -1
@@ -72,6 +72,7 @@ export async function writeResult(paths, result) {
}
export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"]) {
const processEnvKeys = new Set(Object.keys(env));
for (const path of paths) {
let text = "";
try {
@@ -86,7 +87,7 @@ export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"])
if (equals <= 0) continue;
const key = trimmed.slice(0, equals).trim();
const value = trimmed.slice(equals + 1).trim().replace(/^["']|["']$/g, "");
if (!(key in env)) env[key] = value;
if (!processEnvKeys.has(key)) env[key] = value;
}
}
}
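The change above snapshots the process environment keys before any file is read, so a later file such as `skills/.env.local` can override a value set by an earlier file, while values already exported by the shell still win. A minimal sketch of that precedence, assuming in-memory file texts instead of disk reads; `applyEnvTexts`, the comment handling, and the sample keys and URLs are hypothetical and only for illustration.

```javascript
// Hypothetical helper: applies env-file texts, in order, onto a target object.
function applyEnvTexts(target, texts) {
  // Keys present before any file is applied always keep their value.
  const presetKeys = new Set(Object.keys(target));
  for (const text of texts) {
    for (const line of text.split(/\r?\n/)) {
      const trimmed = line.trim();
      if (!trimmed || trimmed.startsWith("#")) continue;
      const equals = trimmed.indexOf("=");
      if (equals <= 0) continue;
      const key = trimmed.slice(0, equals).trim();
      const value = trimmed.slice(equals + 1).trim().replace(/^["']|["']$/g, "");
      // Later files may overwrite earlier files, but never a preset key.
      if (!presetKeys.has(key)) target[key] = value;
    }
  }
  return target;
}

const merged = applyEnvTexts(
  { LANGBOT_BACKEND_URL: "http://shell.example:5300" },
  [
    "LANGBOT_BACKEND_URL=http://file.example:5300\nLANGBOT_PIPELINE_NAME=base",
    "# local overrides\nLANGBOT_PIPELINE_NAME=local-override",
  ],
);
console.log(merged);
```

With the previous `key in env` check, the first file to set `LANGBOT_PIPELINE_NAME` would have blocked the override from the second file.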
+79 -1
@@ -54,6 +54,7 @@ const debugChatSessionType = env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person"
const pipelineConfigDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-diagnostic.json");
const debugChatResetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
const pipelineConfigRestoreDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-restore-diagnostic.json");
const metricsPath = resolve(paths.evidenceDir, "metrics.json");
const startedAt = new Date();
let browser;
@@ -80,10 +81,11 @@ let result = {
console_log: paths.consoleLog,
network_log: paths.networkLog,
screenshot: paths.screenshot,
metrics_json: metricsPath,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["ui", "screenshot", "console", "network"],
evidence_collected: ["ui", "screenshot", "console", "network", "metrics"],
};
function boolFromEnv(value, defaultValue) {
@@ -103,6 +105,29 @@ function parseJsonEnv(key, fallback) {
}
}
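function positiveNumberEnv(key, fallback) {
// Treat unset or blank values as missing: Number("") coerces to 0 and would bypass the fallback.
const raw = String(env[key] ?? "").trim();
if (raw === "") return fallback;
const value = Number(raw);
return Number.isFinite(value) && value >= 0 ? value : fallback;
}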
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function promptStepsFromEnv() {
const rawSteps = parseJsonEnv("LANGBOT_E2E_PROMPTS_JSON", null);
if (rawSteps === null) {
@@ -658,6 +683,7 @@ try {
} else {
for (let index = 0; index < promptSteps.length; index += 1) {
const step = promptSteps[index];
const promptStartedAt = Date.now();
const chatResult = await runDebugChatPrompt(page, {
prompt: step.prompt,
expectedText: step.expectedText,
@@ -665,11 +691,13 @@ try {
imagePath: index === 0 ? imagePath : "",
failureSignals: failureSignals.length > 0 ? failureSignals : undefined,
});
const promptDurationMs = Date.now() - promptStartedAt;
result.chat_results.push({
index,
expected_text: step.expectedText,
status: chatResult.status,
reason: chatResult.reason,
response_duration_ms: promptDurationMs,
min_expected_count: chatResult.min_expected_count,
final_count: chatResult.final_count,
before_assistant_expected_count: chatResult.before_assistant_expected_count,
@@ -714,6 +742,56 @@ try {
const finishedAt = new Date();
result.finished_at = finishedAt.toISOString();
result.finished_at_local = localIsoWithOffset(finishedAt);
result.duration_ms = finishedAt.getTime() - startedAt.getTime();
const responseDurations = result.chat_results
.map((item) => item.response_duration_ms)
.filter((value) => Number.isFinite(value));
const passedPrompts = result.chat_results.filter((item) => item.status === "pass").length;
const attemptedPrompts = result.chat_results.length;
const errorRate = attemptedPrompts === 0 ? 1 : Number(((attemptedPrompts - passedPrompts) / attemptedPrompts).toFixed(4));
const responseStats = stats(responseDurations);
const responseP95BudgetMs = positiveNumberEnv(
"LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS",
positiveNumberEnv("LANGBOT_DEBUG_CHAT_RESPONSE_P95_MS", safeResponseTimeoutMs),
);
const maxErrorRate = positiveNumberEnv("LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE", 0);
const metrics = {
probe: caseId,
url: result.url,
prompt_count: result.prompt_count,
attempted_prompt_count: attemptedPrompts,
passed_prompt_count: passedPrompts,
error_rate: errorRate,
response_duration_ms: responseStats,
total_duration_ms: result.duration_ms,
chat_results: result.chat_results,
};
result.metrics_summary = {
prompt_count: metrics.prompt_count,
attempted_prompt_count: metrics.attempted_prompt_count,
passed_prompt_count: metrics.passed_prompt_count,
error_rate: metrics.error_rate,
response_p50_ms: metrics.response_duration_ms.p50,
response_p95_ms: metrics.response_duration_ms.p95,
total_duration_ms: metrics.total_duration_ms,
};
result.thresholds_summary = {
response_p95_ms: {
actual: metrics.response_duration_ms.p95,
max: responseP95BudgetMs,
pass: attemptedPrompts > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
},
error_rate: {
actual: metrics.error_rate,
max: maxErrorRate,
pass: metrics.error_rate <= maxErrorRate,
},
};
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
if (result.status === "pass" && !Object.values(result.thresholds_summary).every((item) => item.pass)) {
result.status = "fail";
result.reason = "Debug Chat performance breached response latency or error-rate thresholds.";
}
const existingEvidence = {};
for (const [key, value] of Object.entries(result.evidence)) {
if (typeof value !== "string") continue;
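The `percentile` helper added in this file uses the nearest-rank method: it returns the sample at 1-based rank `ceil(p / 100 × n)` of the sorted values. A worked sketch with made-up latencies follows; the sample values, the 5000 ms budget, and the name `nearestRankPercentile` are assumptions for illustration, and the lower clamp on the index is an addition that the helper in the diff does not have.

```javascript
// Nearest-rank percentile over a list of numeric samples.
function nearestRankPercentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  // Clamp to a valid index so p = 0 maps to the minimum instead of index -1.
  const index = Math.max(0, Math.min(sorted.length - 1, rank));
  return sorted[index];
}

// Ten hypothetical response durations in milliseconds.
const samplesMs = [120, 95, 410, 130, 101, 98, 2500, 115, 140, 105];
const p50 = nearestRankPercentile(samplesMs, 50); // rank 5 of 10 -> 115
const p95 = nearestRankPercentile(samplesMs, 95); // rank 10 of 10 -> 2500

// Gate in the same shape as thresholds_summary.response_p95_ms.
const budgetMs = 5000;
const pass = samplesMs.length > 0 && p95 <= budgetMs;
console.log({ p50, p95, pass });
```

With fewer than 20 samples, nearest-rank p95 is always the maximum, so a single slow prompt decides the gate in short runs.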
+476
@@ -130,6 +130,7 @@
"references/local-agent-runner.md",
"references/mcp-stdio-testing.md",
"references/model-provider-testing.md",
"references/performance-reliability-testing.md",
"references/pipeline-debug-chat.md",
"references/plugin-e2e-smoke.md",
"references/sandbox-skill-authoring.md",
@@ -150,6 +151,16 @@
"agent-runner-release-preflight",
"agent-runner-runtime-chaos",
"dify-agent-debug-chat",
"langbot-fake-provider-debug-chat-cross-pipeline-isolation",
"langbot-fake-provider-debug-chat-fault-recovery",
"langbot-fake-provider-debug-chat-load",
"langbot-fake-provider-debug-chat-slow-load",
"langbot-fault-taxonomy-contract",
"langbot-live-backend-latency",
"langbot-live-backend-log-health",
"langbot-live-control-plane-api",
"langbot-overhead-accounting-contract",
"langbot-space-debug-chat-concurrency-smoke",
"langrag-kb-retrieve",
"langrag-parser-golden-e2e",
"langrag-sentinel-kb-discover",
@@ -165,6 +176,7 @@
"mcp-stdio-register",
"mcp-stdio-tool-call",
"pipeline-debug-chat",
"pipeline-debug-chat-performance",
"plugin-e2e-smoke",
"provider-deepseek",
"qa-plugin-smoke-live-install",
@@ -486,6 +498,316 @@
"backend_log"
]
},
{
"id": "langbot-fake-provider-debug-chat-cross-pipeline-isolation",
"title": "LangBot Debug Chat fake-provider cross-pipeline isolation probe",
"mode": "probe",
"area": "reliability",
"type": "reliability",
"priority": "p1",
"risk": "high",
"ci_eligible": false,
"tags": [
"reliability",
"debug-chat",
"websocket",
"fake-provider",
"isolation",
"concurrency",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-cross-pipeline-isolation.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-cross-pipelines.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME",
"LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fake-provider-debug-chat-fault-recovery",
"title": "LangBot Debug Chat fake-provider fault recovery probe",
"mode": "probe",
"area": "reliability",
"type": "chaos",
"priority": "p1",
"risk": "high",
"ci_eligible": false,
"tags": [
"reliability",
"chaos",
"debug-chat",
"websocket",
"fake-provider",
"fault-injection",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
"LANGBOT_FAKE_PROVIDER_MODEL_UUID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fake-provider-debug-chat-load",
"title": "LangBot Debug Chat controlled fake-provider load probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"debug-chat",
"websocket",
"fake-provider",
"load",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
"LANGBOT_FAKE_PROVIDER_MODEL_UUID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fake-provider-debug-chat-slow-load",
"title": "LangBot Debug Chat slow fake-provider load probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"debug-chat",
"websocket",
"fake-provider",
"slow-provider",
"load",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
"LANGBOT_FAKE_PROVIDER_MODEL_UUID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fault-taxonomy-contract",
"title": "LangBot fault taxonomy and cleanup contract",
"mode": "probe",
"area": "reliability",
"type": "chaos",
"priority": "p1",
"risk": "medium",
"ci_eligible": true,
"tags": [
"reliability",
"chaos",
"contract",
"synthetic"
],
"automation": "skills/langbot-testing/probes/langbot-fault-taxonomy-contract.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"filesystem"
]
},
{
"id": "langbot-live-backend-latency",
"title": "LangBot live backend basic latency probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"live-backend",
"latency",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-live-backend-latency.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-live-backend-log-health",
"title": "LangBot live backend log health probe",
"mode": "probe",
"area": "reliability",
"type": "reliability",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"reliability",
"live-backend",
"backend-log",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-live-backend-log-health.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"backend_log",
"filesystem"
]
},
{
"id": "langbot-live-control-plane-api",
"title": "LangBot live control-plane API probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"reliability",
"live-backend",
"control-plane",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-live-control-plane-api.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-overhead-accounting-contract",
"title": "LangBot overhead accounting metrics contract",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": true,
"tags": [
"performance",
"metrics",
"contract",
"synthetic"
],
"automation": "skills/langbot-testing/probes/langbot-overhead-accounting-contract.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"resource_log",
"filesystem"
]
},
{
"id": "langbot-space-debug-chat-concurrency-smoke",
"title": "LangBot Debug Chat real Space-provider concurrency smoke",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "high",
"ci_eligible": false,
"tags": [
"performance",
"debug-chat",
"websocket",
"space",
"live-provider",
"smoke",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_PIPELINE_URL",
"LANGBOT_PIPELINE_NAME",
"LANGBOT_LOCAL_AGENT_PIPELINE_URL",
"LANGBOT_LOCAL_AGENT_PIPELINE_NAME",
"LANGBOT_LOCAL_AGENT_MODEL_UUID",
"LANGBOT_E2E_MODEL_UUID"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langrag-kb-retrieve",
"title": "LangRAG knowledge base ingests and retrieves a sentinel document",
@@ -911,6 +1233,38 @@
"backend_log"
]
},
{
"id": "pipeline-debug-chat-performance",
"title": "Pipeline Debug Chat user-path performance probe",
"mode": "agent-browser",
"area": "pipeline",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"pipeline",
"debug-chat",
"user-path",
"metrics"
],
"automation": "scripts/e2e/pipeline-debug-chat.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_PIPELINE_URL",
"LANGBOT_PIPELINE_NAME"
],
"evidence_required": [
"ui",
"screenshot",
"console",
"network",
"metrics"
]
},
{
"id": "plugin-e2e-smoke",
"title": "Plugin system installs a local plugin and exposes tool/page APIs",
@@ -1059,6 +1413,12 @@
"suites": [
"agent-runner-release-gate",
"core-smoke",
"langbot-debug-chat-isolation-gate",
"langbot-debug-chat-load-gate",
"langbot-live-backend-gate",
"langbot-performance-contract-gate",
"langbot-performance-reliability-gate",
"langbot-user-path-performance-gate",
"local-agent-gate"
],
"suite_summaries": [
@@ -1121,6 +1481,113 @@
"local-agent-basic-debug-chat"
]
},
{
"id": "langbot-debug-chat-isolation-gate",
"title": "LangBot Debug Chat isolation gate",
"description": "Manual/non-required cross-pipeline Debug Chat isolation gate. Current releases may fail this gate because of product bug #2286; use it as regression evidence after the routing fix lands.",
"type": "reliability",
"priority": "p1",
"tags": [
"reliability",
"debug-chat",
"websocket",
"isolation",
"concurrency"
],
"cases": [
"langbot-fake-provider-debug-chat-cross-pipeline-isolation"
]
},
{
"id": "langbot-debug-chat-load-gate",
"title": "LangBot Debug Chat load gate",
"description": "Manual/non-required message-path load checks for Pipeline Debug Chat: controlled fake-provider baseline, slow-provider and fault-recovery profiles, plus optional real Space-provider smoke. Cross-pipeline isolation is split into langbot-debug-chat-isolation-gate because current releases may fail it due to product bug #2286.",
"type": "performance",
"priority": "p1",
"tags": [
"performance",
"debug-chat",
"websocket",
"load"
],
"cases": [
"langbot-fake-provider-debug-chat-load",
"langbot-fake-provider-debug-chat-slow-load",
"langbot-fake-provider-debug-chat-fault-recovery",
"langbot-space-debug-chat-concurrency-smoke"
]
},
{
"id": "langbot-live-backend-gate",
"title": "LangBot live backend reliability gate",
"description": "Live backend control-plane responsiveness and runtime log health checks for a locally running LangBot instance.",
"type": "reliability",
"priority": "p1",
"tags": [
"performance",
"reliability",
"live-backend",
"metrics"
],
"cases": [
"langbot-live-backend-latency",
"langbot-live-control-plane-api",
"langbot-live-backend-log-health"
]
},
{
"id": "langbot-performance-contract-gate",
"title": "LangBot performance contract gate",
"description": "Fast synthetic contract checks for performance metric accounting and non-destructive reliability fault taxonomy.",
"type": "contract",
"priority": "p1",
"tags": [
"performance",
"reliability",
"contract",
"metrics"
],
"cases": [
"langbot-overhead-accounting-contract",
"langbot-fault-taxonomy-contract"
]
},
{
"id": "langbot-performance-reliability-gate",
"title": "LangBot performance and reliability starter gate",
"description": "Starter gate for LangBot performance accounting, live backend control-plane latency, and non-destructive fault taxonomy checks.",
"type": "reliability",
"priority": "p1",
"tags": [
"performance",
"reliability",
"metrics",
"chaos"
],
"cases": [
"langbot-overhead-accounting-contract",
"langbot-fault-taxonomy-contract",
"langbot-live-backend-latency",
"langbot-live-control-plane-api",
"langbot-live-backend-log-health"
]
},
{
"id": "langbot-user-path-performance-gate",
"title": "LangBot user-path performance gate",
"description": "Browser-visible performance checks for user-facing LangBot paths such as Pipeline Debug Chat.",
"type": "performance",
"priority": "p1",
"tags": [
"performance",
"browser",
"debug-chat",
"user-path"
],
"cases": [
"pipeline-debug-chat-performance"
]
},
{
"id": "local-agent-gate",
"title": "Local Agent runner regression gate",
@@ -1265,6 +1732,7 @@
"sandbox-native-tools-unavailable",
"socks-proxy-without-socksio",
"survey-widget-blocks-debug-chat",
"telemetry-proxy-noise",
"tool-name-collision-between-mcp-and-plugin",
"uv-run-resyncs-local-sdk"
],
@@ -1449,6 +1917,14 @@
"mcp-stdio-tool-call"
]
},
{
"id": "telemetry-proxy-noise",
"title": "Telemetry posting fails through the proxy while the target flow succeeds",
"category": "env_issue",
"related_cases": [
"langbot-space-debug-chat-concurrency-smoke"
]
},
{
"id": "tool-name-collision-between-mcp-and-plugin",
"title": "MCP and plugin expose the same tool name",
+17
@@ -26,6 +26,23 @@ LANGBOT_NO_PROXY=localhost,127.0.0.1,::1
LANGBOT_PIPELINE_URL=
LANGBOT_PIPELINE_NAME=
# Optional fake OpenAI-compatible provider controls for Debug Chat load tests.
# Leave URL empty to let setup automation start a local provider and write the
# selected URL to skills/.env.local.
LANGBOT_FAKE_PROVIDER_URL=
LANGBOT_FAKE_PROVIDER_HOST=127.0.0.1
LANGBOT_FAKE_PROVIDER_PORT=
LANGBOT_FAKE_PROVIDER_MODEL_NAME=gpt-4o-mini
LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT=OK
LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS=25
LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS=10
LANGBOT_FAKE_PROVIDER_CHUNK_COUNT=0
LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N=0
LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N=0
LANGBOT_FAKE_PROVIDER_FAULT_STATUS=500
LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK=false
LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE=true
# Optional case-specific runner targets. Prefer these for runner-specific cases
# so the automation cannot silently test the wrong runner.
LANGBOT_LOCAL_AGENT_PIPELINE_URL=
@@ -53,7 +53,7 @@ Start the new frontend from the web repo:
```bash
cd "$LANGBOT_WEB_REPO"
npm run dev
VITE_API_BASE_URL="$LANGBOT_BACKEND_URL" pnpm dev --host 0.0.0.0
```
Healthy startup includes:
@@ -68,6 +68,10 @@ Quick check:
curl -I --max-time 3 "$LANGBOT_FRONTEND_URL"
```
If `VITE_API_BASE_URL` is missing, Vite still serves the page but frontend API
calls may go to the frontend port instead of the backend port. That produces
false browser failures in login, wizard, pipeline, and Debug Chat cases.
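One way to picture this failure mode is URL resolution against the wrong origin. The sketch below assumes the frontend falls back to the page origin when no base URL is configured; that fallback, the ports, and the `/api/v1/example` path are illustrative assumptions, not taken from the web repo.

```javascript
// Hypothetical resolver: uses the configured API base, else the page origin.
function resolveApiUrl(path, apiBaseUrl, pageOrigin) {
  const hasBase = typeof apiBaseUrl === "string" && apiBaseUrl.trim() !== "";
  return new URL(path, hasBase ? apiBaseUrl : pageOrigin).toString();
}

const pageOrigin = "http://127.0.0.1:3000"; // assumed frontend dev server
const withBase = resolveApiUrl("/api/v1/example", "http://127.0.0.1:5300", pageOrigin);
const withoutBase = resolveApiUrl("/api/v1/example", undefined, pageOrigin);

console.log(withBase); // request reaches the backend port
console.log(withoutBase); // request stays on the frontend port
```

In the second case the page itself still loads, which is why the symptom looks like a product failure rather than a setup problem.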
## Completion Signal
Environment setup is not complete until the required frontend/backend URLs are reachable and the chosen browser-control path can open the WebUI.
+3
@@ -21,6 +21,7 @@ Use this skill when an agent needs to verify LangBot behavior through the WebUI
- **Sandbox-backed skill authoring**: read `references/sandbox-skill-authoring.md`.
- **LangRAG knowledge bases**: read `references/langrag-knowledge-base.md`.
- **MCP stdio tool testing**: read `references/mcp-stdio-testing.md`.
- **Performance, reliability, or chaos probes**: read `references/performance-reliability-testing.md`.
- **Drive a live instance over MCP (not raw HTTP)**: use the `langbot-mcp-ops` skill — the instance exposes an MCP server at `http://<host>:5300/mcp` (reuses API keys). Useful for setting up bots/pipelines/models as test fixtures programmatically.
- **Known failures and fixes**: read `references/troubleshooting.md`.
- **Reusable test groups**: run `bin/lbs suite list` and `bin/lbs suite plan <suite-id>` before manually assembling a case set.
@@ -36,6 +37,8 @@ Use this skill when an agent needs to verify LangBot behavior through the WebUI
- Use an authenticated browser profile prepared by `langbot-env-setup`.
- Do not expose API keys, OAuth secrets, tokens, or localStorage token values in output.
- A WebUI test is not complete until the visible UI result is checked against backend logs or network behavior.
- A performance result is not complete without `metrics` evidence and a clear split between LangBot overhead and external provider/tool/network time.
- A chaos or reliability result is not complete until the fault scope, cleanup, and recovery checks are recorded.
- For a suite, use `bin/lbs suite start <suite-id>` to create the suite evidence root, per-case directories, and `suite-start.json`/`suite-start.md` handoff files; use `bin/lbs test result <case-id>` to write final per-case `result.json`, then run `bin/lbs suite report <suite-id> --evidence-dir <dir>`.
- Do not mark a case `pass` until `test result --evidence` covers every value in the case's `evidence_required`.
- For runner-specific Debug Chat cases, use the case-specific pipeline env declared by `automation_pipeline_url_env` / `automation_pipeline_name_env`; do not silently reuse a generic `LANGBOT_PIPELINE_URL`.
@@ -0,0 +1,84 @@
id: langbot-fake-provider-debug-chat-cross-pipeline-isolation
title: "LangBot Debug Chat fake-provider cross-pipeline isolation probe"
mode: probe
area: reliability
type: reliability
priority: p1
risk: high
ci_eligible: false
tags:
- reliability
- debug-chat
- websocket
- fake-provider
- isolation
- concurrency
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-cross-pipeline-isolation.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME
automation_debug_chat_load_requests: "6"
automation_debug_chat_load_concurrency: "4"
automation_debug_chat_load_timeout_ms: "30000"
automation_debug_chat_load_response_p95_ms: "5000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
metrics_thresholds_json: '{"cross_pipeline_leak_count":{"max":0},"response_p95_ms":{"max":5000},"error_rate":{"max":0}}'
load_profile_json: '{"requests_per_pipeline":6,"pipelines":2,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","metric":"cross-pipeline response isolation and send-to-final-assistant-response"}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-cross-pipelines.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME
steps:
- "Start or reuse the local fake OpenAI-compatible provider."
- "Create or update two local-agent pipelines that both point at the controlled fake provider."
- "Reset both Debug Chat sessions and the fake-provider request log."
- "Open concurrent WebSocket Debug Chat connections to both pipelines and send unique pipeline-scoped response tokens."
checks:
- "automation-result.json status is pass only when every request receives its own expected token and cross_pipeline_leak_count is zero."
- "metrics_summary includes by_pipeline status counts, fake-provider request count, and LangBot/provider timing estimates."
- "samples.json contains per-request pipeline labels so any leak can be attributed to the receiving pipeline."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe targets Debug Chat isolation under concurrent traffic from two pipelines."
- "It is designed to expose regressions where global pipeline state causes one pipeline's assistant response to be delivered to another pipeline's Debug Chat session."
- "Same-pipeline foreign responses are tolerated because Debug Chat intentionally broadcasts within the same pipeline/session; cross-pipeline tokens are never tolerated."
- "Known product bug: current releases may fail this probe because Debug Chat replies can read singleton WebSocket proxy pipeline state after another pipeline overwrites it. See https://github.com/langbot-app/LangBot/issues/2286."
expected_failures:
- "https://github.com/langbot-app/LangBot/issues/2286"
success_patterns:
- "Debug Chat cross-pipeline isolation probe passed"
failure_patterns:
- "cross_pipeline_leak"
- "Timed out after"
- "WebSocket connection error"
- "Final assistant response did not include"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
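The isolation rule in the diagnostics above can be sketched as a classifier over received assistant text. This is a minimal sketch, assuming each token embeds a pipeline label; the token format `ISO-<pipeline>-<n>`, the function `classifyResponse`, and the sample data are hypothetical and not taken from the probe.

```javascript
// Hypothetical token format: ISO-<pipeline label>-<sequence number>.
const TOKEN_PATTERN = /ISO-([A-Z])-(\d+)/g;

function classifyResponse({ pipeline, expectedToken, receivedText }) {
  const labels = [...receivedText.matchAll(TOKEN_PATTERN)].map((m) => m[1]);
  // A token from another pipeline is never tolerated.
  if (labels.some((label) => label !== pipeline)) return "cross_pipeline_leak";
  if (receivedText.includes(expectedToken)) return "ok";
  // Same-pipeline broadcasts are tolerated but are not this request's answer.
  return labels.length > 0 ? "same_pipeline_foreign" : "missing";
}

const samples = [
  { pipeline: "A", expectedToken: "ISO-A-1", receivedText: "ISO-A-1" },
  { pipeline: "A", expectedToken: "ISO-A-2", receivedText: "ISO-A-1" },
  { pipeline: "B", expectedToken: "ISO-B-1", receivedText: "ISO-A-2" },
  { pipeline: "B", expectedToken: "ISO-B-2", receivedText: "" },
];

const counts = {};
for (const sample of samples) {
  const kind = classifyResponse(sample);
  counts[kind] = (counts[kind] || 0) + 1;
}
const crossPipelineLeakCount = counts.cross_pipeline_leak || 0;
console.log(counts, crossPipelineLeakCount);
```

Keeping the pipeline label on every sample is what lets a leak be attributed to the receiving pipeline, as the `samples.json` check requires.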
@@ -0,0 +1,95 @@
id: langbot-fake-provider-debug-chat-fault-recovery
title: "LangBot Debug Chat fake-provider fault recovery probe"
mode: probe
area: reliability
type: chaos
priority: p1
risk: high
ci_eligible: false
tags:
- reliability
- chaos
- debug-chat
- websocket
- fake-provider
- fault-injection
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_debug_chat_load_requests: "6"
automation_debug_chat_load_concurrency: "1"
automation_debug_chat_load_timeout_ms: "15000"
automation_debug_chat_load_response_p95_ms: "5000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_min_ok_count: "6"
automation_debug_chat_load_min_provider_fault_count: "2"
automation_debug_chat_load_expected_prefix: "FAULTQA"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
automation_debug_chat_load_fail_on_final_mismatch: "true"
automation_fake_provider_first_token_delay_ms: "25"
automation_fake_provider_chunk_delay_ms: "10"
automation_fake_provider_chunk_count: "0"
automation_fake_provider_fail_first_n: "2"
automation_fake_provider_fail_every_n: "0"
automation_fake_provider_fault_status: "503"
metrics_thresholds_json: '{"response_p95_ms":{"max":5000},"error_rate":{"max":0},"ok_count_min":{"min":6},"fake_provider_fault_count_min":{"min":2}}'
fault_model_json: '{"provider_fault":"HTTP 503 for first 2 fake-provider chat completions after reset","expected_behavior":"LangBot retries or otherwise recovers from bounded provider failures so every Debug Chat request receives its expected response without backend crash."}'
load_profile_json: '{"requests":6,"concurrency":1,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","classification":"fault-recovery-not-throughput-benchmark"}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
- LANGBOT_FAKE_PROVIDER_MODEL_UUID
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
steps:
- "Configure the local fake provider to return HTTP 503 for the first two chat completions after reset."
- "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
- "Reset the target Debug Chat session and fake-provider request counter."
- "Send a sequential Debug Chat batch and verify later requests recover after the injected provider faults."
checks:
- "automation-result.json status is pass when the fake provider records at least two injected faults, every Debug Chat request succeeds, and total user-visible error rate stays at zero."
- "metrics_summary includes fake_provider_fault_count and status_counts for the same run window."
- "backend logs show request handling for the same run window without unexpected Traceback or task-leak findings."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This is a fault-recovery probe, not a throughput benchmark."
- "Provider faults may be retried inside the provider/requester path; judge this case by fake_provider_fault_count plus user-visible success/error metrics."
- "The profile uses concurrency 1 because Debug Chat broadcasts assistant responses to every connection in a session, and failed responses do not carry the unique success token needed for concurrent attribution."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "fake_provider_fault"
- "HTTP 503"
- "Timed out after"
- "All models failed during streaming setup"
expected_failures:
- "fake_provider_fault"
- "HTTP 503"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
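The fault model in this case can be pictured as a counter inside the fake provider. A minimal sketch, assuming `fail_first_n` counts chat completions since the last reset; `createFaultGate` and its return shape are hypothetical, and `fail_every_n` is left out because its exact semantics are not shown in this case file.

```javascript
// Hypothetical fault gate: fails the first N requests after each reset.
function createFaultGate({ failFirstN, faultStatus }) {
  let requestCount = 0;
  let faultCount = 0;
  return {
    next() {
      requestCount += 1;
      if (requestCount <= failFirstN) {
        faultCount += 1;
        return { status: faultStatus, fault: true };
      }
      return { status: 200, fault: false };
    },
    reset() {
      requestCount = 0;
      faultCount = 0;
    },
    get faultCount() {
      return faultCount;
    },
  };
}

// Mirrors automation_fake_provider_fail_first_n: "2" and fault_status: "503".
const gate = createFaultGate({ failFirstN: 2, faultStatus: 503 });
const statuses = Array.from({ length: 4 }, () => gate.next().status);
console.log(statuses, gate.faultCount);
```

Under this model the provider sees more requests than the user sends whenever LangBot retries, which is why the case judges recovery by `fake_provider_fault_count` together with user-visible success metrics.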
@@ -0,0 +1,81 @@
id: langbot-fake-provider-debug-chat-load
title: "LangBot Debug Chat controlled fake-provider load probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- debug-chat
- websocket
- fake-provider
- load
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_debug_chat_load_requests: "12"
automation_debug_chat_load_concurrency: "4"
automation_debug_chat_load_timeout_ms: "30000"
automation_debug_chat_load_response_p95_ms: "5000"
automation_debug_chat_load_first_response_p95_ms: "3000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_expected_prefix: "FAKEQA"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
metrics_thresholds_json: '{"response_p95_ms":{"max":5000},"first_response_p95_ms":{"max":3000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":12,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","metric":"send-to-final-assistant-response"}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
- LANGBOT_FAKE_PROVIDER_MODEL_UUID
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
steps:
- "Start or reuse the local fake OpenAI-compatible provider."
- "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
- "Reset the target Debug Chat session."
- "Open concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the real backend pipeline."
checks:
- "automation-result.json status is pass when every request receives its own expected assistant response."
- "metrics_summary includes request count, concurrency, p50/p95 response latency, first response latency, throughput, and error rate."
- "thresholds_summary shows response_p95_ms, first_response_p95_ms, and error_rate pass."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe removes external model latency from the measurement; it still exercises the live LangBot backend, provider requester, local-agent runner, pipeline, and Debug Chat WebSocket adapter."
- "Use this as the repeatable message-path baseline before comparing against Space or another real provider."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "WebSocket connection error"
- "Timed out after"
- "Final assistant response did not include"
- "All models failed during streaming setup"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
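A minimal sketch of how a `metrics_thresholds_json` value like the one in this case could be checked against a metrics summary. The helper name `evaluateThresholds`, the `{ value, pass }` result shape, and the sample metric values are assumptions for illustration; the probe's own `buildThresholds` is not shown in this excerpt and may behave differently.

```javascript
// Hypothetical helper: compare a metrics summary against {"metric":{"max":N,"min":N}} thresholds.
function evaluateThresholds(thresholdsJson, metrics) {
  const thresholds = JSON.parse(thresholdsJson);
  const summary = {};
  for (const [name, limits] of Object.entries(thresholds)) {
    const value = metrics[name];
    const measured = typeof value === "number" && Number.isFinite(value);
    const underMax = limits.max === undefined || (measured && value <= limits.max);
    const overMin = limits.min === undefined || (measured && value >= limits.min);
    // A metric that was never measured fails instead of passing silently.
    summary[name] = { value: measured ? value : null, pass: measured && underMax && overMin };
  }
  return summary;
}

const thresholdsJson =
  '{"response_p95_ms":{"max":5000},"first_response_p95_ms":{"max":3000},"error_rate":{"max":0}}';

// Illustrative values only, not measured results.
const summary = evaluateThresholds(thresholdsJson, {
  response_p95_ms: 1840,
  first_response_p95_ms: 3200,
  error_rate: 0,
});
const passed = Object.values(summary).every((item) => item.pass);
```

In this made-up run `first_response_p95_ms` exceeds its 3000 ms budget, so the overall result would be a fail even though the other two thresholds hold.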
@@ -0,0 +1,88 @@
id: langbot-fake-provider-debug-chat-slow-load
title: "LangBot Debug Chat slow fake-provider load probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- debug-chat
- websocket
- fake-provider
- slow-provider
- load
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_debug_chat_load_requests: "8"
automation_debug_chat_load_concurrency: "4"
automation_debug_chat_load_timeout_ms: "45000"
automation_debug_chat_load_response_p95_ms: "10000"
automation_debug_chat_load_first_response_p95_ms: "7000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_expected_prefix: "SLOWQA"
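# Prompt (English): Reply only with "{expected}"; do not explain and do not add any other characters.
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'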
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
automation_fake_provider_first_token_delay_ms: "1000"
automation_fake_provider_chunk_delay_ms: "250"
automation_fake_provider_chunk_count: "4"
automation_fake_provider_fail_first_n: "0"
automation_fake_provider_fail_every_n: "0"
automation_fake_provider_fault_status: "500"
metrics_thresholds_json: '{"response_p95_ms":{"max":10000},"first_response_p95_ms":{"max":7000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":8,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled slow fake OpenAI-compatible provider","metric":"send-to-final-assistant-response","provider_profile":{"first_token_delay_ms":1000,"chunk_delay_ms":250,"chunk_count":4}}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
- LANGBOT_FAKE_PROVIDER_MODEL_UUID
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
steps:
- "Configure the local fake provider with deterministic slow streaming latency."
- "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
- "Reset the target Debug Chat session."
- "Open concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the real backend pipeline."
checks:
- "automation-result.json status is pass when every request receives its own expected assistant response."
- "metrics_summary shows zero errors under the slow-provider profile."
- "thresholds_summary shows response_p95_ms, first_response_p95_ms, and error_rate pass."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe keeps the model deterministic while injecting provider latency, so it catches backend timeout, streaming, and WebSocket backpressure issues without Space variability."
- "Compare with langbot-fake-provider-debug-chat-load to separate fixed LangBot overhead from provider-latency amplification."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "WebSocket connection error"
- "Timed out after"
- "Final assistant response did not include"
- "All models failed during streaming setup"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
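A rough sketch of how much of each latency budget the slow-provider profile itself is expected to consume. It assumes the fake provider waits `first_token_delay_ms` before the first chunk and `chunk_delay_ms` before each later chunk; that timing model and the helper name `estimateProviderStreamMs` are assumptions, not confirmed behaviour of the fake provider.

```javascript
// Assumption: one initial delay, then one chunk delay before every chunk after the first.
function estimateProviderStreamMs(profile) {
  const laterChunks = Math.max(0, profile.chunk_count - 1);
  return profile.first_token_delay_ms + laterChunks * profile.chunk_delay_ms;
}

// Values copied from the case definition above.
const slowProfile = { first_token_delay_ms: 1000, chunk_delay_ms: 250, chunk_count: 4 };
const budgets = { first_response_p95_ms: 7000, response_p95_ms: 10000 };

const providerStreamMs = estimateProviderStreamMs(slowProfile); // 1000 + 3 * 250
const headroom = {
  // Budget left for LangBot, queueing, and WebSocket delivery after the injected delay.
  first_response_ms: budgets.first_response_p95_ms - slowProfile.first_token_delay_ms,
  response_ms: budgets.response_p95_ms - providerStreamMs,
};
```

Under that assumption the injected latency is well below both budgets, so a breach would more likely point at queueing or backpressure than at the configured delay itself.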
@@ -0,0 +1,35 @@
id: langbot-fault-taxonomy-contract
title: "LangBot fault taxonomy and cleanup contract"
mode: probe
area: reliability
type: chaos
priority: p1
risk: medium
ci_eligible: true
tags:
- reliability
- chaos
- contract
- synthetic
skills:
- langbot-testing
automation: skills/langbot-testing/probes/langbot-fault-taxonomy-contract.mjs
fault_model_json: '{"kind":"taxonomy-contract","destructive":false,"scenarios":["provider-timeout","plugin-runtime-disconnect","mcp-stdio-server-exit","operator-missing-login","transient-marketplace-timeout"]}'
steps:
- "Run `rtk bin/lbs test run langbot-fault-taxonomy-contract --dry-run` first; remove `--dry-run` after checking the evidence directory."
- "Automation validates that representative fault scenarios declare target, injected fault, expected status, recovery check, and cleanup."
- "Review metrics.json, fault-model.json, and automation-result.json under LBS_EVIDENCE_DIR."
checks:
- "automation-result.json status is pass."
- "Every scenario declares an expected status that is one of pass, fail, blocked, env_issue, or flaky."
- "Every scenario declares a cleanup action and recovery check."
evidence_required:
- metrics
- filesystem
diagnostics:
- "This is a non-destructive taxonomy contract probe; it does not inject real runtime faults."
- "Use it as a gate before adding live chaos cases that kill runtimes, route traffic through a proxy, or disrupt a backend dependency."
success_patterns:
- "Fault taxonomy contract declares status"
failure_patterns:
- "missing required scenario fields"
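The contract described in the checks above could be validated along these lines. This is only a sketch: the field names (`target`, `injected_fault`, `expected_status`, `recovery_check`, `cleanup`) and the helper `validateScenario` are hypothetical, since the actual probe script is not part of this excerpt.

```javascript
// Hypothetical field names; the real probe may use different keys.
const REQUIRED_FIELDS = ["target", "injected_fault", "expected_status", "recovery_check", "cleanup"];
const ALLOWED_STATUSES = ["pass", "fail", "blocked", "env_issue", "flaky"];

function validateScenario(scenario) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof scenario[field] !== "string" || scenario[field].trim() === "") {
      problems.push(`missing required scenario fields: ${field}`);
    }
  }
  if (scenario.expected_status && !ALLOWED_STATUSES.includes(scenario.expected_status)) {
    problems.push(`unexpected status: ${scenario.expected_status}`);
  }
  return problems;
}

const complete = validateScenario({
  target: "provider",
  injected_fault: "provider-timeout",
  expected_status: "fail",
  recovery_check: "next request succeeds",
  cleanup: "restore provider config",
});

// Missing cleanup and an unknown status: two problems are reported.
const incomplete = validateScenario({
  target: "plugin runtime",
  injected_fault: "plugin-runtime-disconnect",
  expected_status: "unknown",
  recovery_check: "runtime reconnects",
});
```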
@@ -0,0 +1,42 @@
id: langbot-live-backend-latency
title: "LangBot live backend basic latency probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- live-backend
- latency
- metrics
skills:
- langbot-testing
env:
- LANGBOT_BACKEND_URL
automation: skills/langbot-testing/probes/langbot-live-backend-latency.mjs
metrics_thresholds_json: '{"backend_p95_ms":{"max":1000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":12,"concurrency":2,"endpoints":["/healthz"]}'
steps:
- "Confirm the selected LangBot backend is the intended test target."
- "Run `rtk bin/lbs test run langbot-live-backend-latency --dry-run` first; remove `--dry-run` after checking LANGBOT_BACKEND_URL and evidence directory."
- "Automation sends a small request batch to LANGBOT_BACKEND_URL/healthz and records latency, status counts, and network errors."
checks:
- "automation-result.json status is pass when the backend responds and p95/error-rate thresholds pass."
- "automation-result.json status is env_issue when the backend is not reachable."
- "metrics.json and network.log are written under LBS_EVIDENCE_DIR."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe measures only the reachability and latency of the backend /healthz endpoint; it does not cover model/provider, browser, Debug Chat, RAG, or plugin runtime latency."
success_patterns:
- "Live backend latency probe passed"
failure_patterns:
- "Backend did not respond"
- "breached latency or error-rate thresholds"
troubleshooting:
- socks-proxy-without-socksio
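To make the `backend_p95_ms` threshold concrete, here is a sketch of a nearest-rank percentile over a 12-request batch. The percentile method and the sample latencies are assumptions; the probe's actual calculation is not shown in this excerpt.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Made-up /healthz latencies in milliseconds, with one slow outlier.
const latenciesMs = [12, 15, 11, 14, 13, 250, 16, 12, 18, 13, 17, 14];
const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
const withinBudget = p95 <= 1000;
```

With only 12 samples, the nearest-rank p95 is simply the slowest request, so under this method a single outlier decides whether the budget holds.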
@@ -0,0 +1,45 @@
id: langbot-live-backend-log-health
title: "LangBot live backend log health probe"
mode: probe
area: reliability
type: reliability
priority: p1
risk: medium
ci_eligible: false
tags:
- reliability
- live-backend
- backend-log
- metrics
skills:
- langbot-testing
env:
- LANGBOT_BACKEND_URL
automation: skills/langbot-testing/probes/langbot-live-backend-log-health.mjs
metrics_thresholds_json: '{"fail_count":{"max":0}}'
load_profile_json: '{"lookback_seconds":300,"log_source":"LANGBOT_BACKEND_LOG or latest LANGBOT_REPO/data/logs/langbot-*.log"}'
steps:
- "Confirm the selected LangBot backend log belongs to the intended test target."
- "Run `rtk bin/lbs test run langbot-live-backend-log-health --dry-run` first; remove `--dry-run` after checking evidence directory and log source."
- "Automation scans the recent backend log window for fail-severity runtime findings such as Traceback, ImportError, ERROR, unclosed sessions, and unawaited coroutines."
checks:
- "automation-result.json status is pass only when fail_count is 0."
- "metrics_summary includes scanned_line_count, fail_count, warning_count, and finding_count."
- "findings.json and scanned-backend.log are written under LBS_EVIDENCE_DIR."
evidence_required:
- metrics
- backend_log
- filesystem
diagnostics:
- "Set LANGBOT_BACKEND_LOG to an explicit log path when the latest log file is not the run target."
- "Set LANGBOT_BACKEND_LOG_SINCE or LANGBOT_BACKEND_LOG_LOOKBACK_SECONDS to control the scan window."
- "This probe measures runtime log health; it does not prove user-facing Debug Chat, plugin, model, or RAG behavior."
success_patterns:
- "Live backend log health passed"
failure_patterns:
- "Traceback"
- "ImportError"
- "ERROR"
- "unclosed"
troubleshooting:
- socks-proxy-without-socksio
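A simplified sketch of the kind of scan the steps above describe: classify each log line against fail-severity patterns and count the findings. The pattern list, the finding shape, and the sample lines are illustrative assumptions rather than the probe's actual rules.

```javascript
// Assumed fail-severity patterns, based on the finding types named in this case.
const FAIL_PATTERNS = [
  { name: "traceback", regex: /Traceback \(most recent call last\)/ },
  { name: "import_error", regex: /ImportError/ },
  { name: "error_level", regex: /\bERROR\b/ },
  { name: "unclosed", regex: /unclosed/i },
  { name: "unawaited_coroutine", regex: /was never awaited/ },
];

function scanLog(lines) {
  const findings = [];
  lines.forEach((line, index) => {
    // Record at most one finding per line: the first pattern that matches.
    const match = FAIL_PATTERNS.find((pattern) => pattern.regex.test(line));
    if (match) findings.push({ line_number: index + 1, kind: match.name, text: line });
  });
  return { scanned_line_count: lines.length, fail_count: findings.length, findings };
}

// Made-up log lines for illustration.
const sampleLines = [
  "INFO Streaming completed",
  "ERROR Action invoke_llm_stream call timed out",
  "Traceback (most recent call last):",
  "RuntimeWarning: coroutine 'send' was never awaited",
];
const report = scanLog(sampleLines);
```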
@@ -0,0 +1,44 @@
id: langbot-live-control-plane-api
title: "LangBot live control-plane API probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- reliability
- live-backend
- control-plane
- metrics
skills:
- langbot-testing
env:
- LANGBOT_BACKEND_URL
automation: skills/langbot-testing/probes/langbot-live-control-plane-api.mjs
metrics_thresholds_json: '{"error_rate":{"max":0},"response_shape_failures":{"max":0},"healthz_p95_ms":{"max":500},"system_info_p95_ms":{"max":1000}}'
load_profile_json: '{"requests":20,"concurrency":4,"endpoints":["/healthz","/api/v1/system/info"],"auth_required":false}'
steps:
- "Confirm the selected LangBot backend is the intended test target."
- "Run `rtk bin/lbs test run langbot-live-control-plane-api --dry-run` first; remove `--dry-run` after checking LANGBOT_BACKEND_URL and evidence directory."
- "Automation sends a small request batch to /healthz and /api/v1/system/info, then validates status code, JSON shape, and latency budgets."
checks:
- "automation-result.json status is pass when every control-plane request returns HTTP 200, JSON code 0, and required response fields."
- "metrics_summary includes per-endpoint p50/p95 latency, error rate, status counts, and response_shape_failures."
- "thresholds_summary shows error_rate, response_shape_failures, healthz_p95_ms, and system_info_p95_ms all pass."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe measures unauthenticated backend control-plane readiness; it does not cover authenticated UI flows, Debug Chat, model calls, plugins, or RAG."
- "A system_info shape failure usually means the API contract or startup state changed and should be investigated before treating latency as healthy."
success_patterns:
- "Live control-plane API probe passed"
failure_patterns:
- "Backend did not respond"
- "breached shape, latency, or error-rate thresholds"
troubleshooting:
- socks-proxy-without-socksio
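The three-part check named above (HTTP 200, JSON `code` 0, required fields) could look roughly like this. The helper name and the `version` field are hypothetical; the fields the real probe requires from `/api/v1/system/info` are not listed in this excerpt.

```javascript
// Hypothetical shape check for one control-plane response.
function checkControlPlaneResponse(response, requiredFields) {
  if (response.status !== 200) return { ok: false, reason: `HTTP ${response.status}` };
  const body = response.json;
  if (!body || body.code !== 0) {
    return { ok: false, reason: `JSON code ${body ? body.code : "missing"}` };
  }
  const data = body.data || {};
  const missing = requiredFields.filter((field) => data[field] === undefined);
  return missing.length === 0
    ? { ok: true, reason: "" }
    : { ok: false, reason: `missing fields: ${missing.join(", ")}` };
}

// "version" is a placeholder field name for illustration.
const healthy = checkControlPlaneResponse({ status: 200, json: { code: 0, data: { version: "x" } } }, ["version"]);
const badCode = checkControlPlaneResponse({ status: 200, json: { code: -1, data: {} } }, ["version"]);
const badShape = checkControlPlaneResponse({ status: 200, json: { code: 0, data: {} } }, ["version"]);
```

Keeping shape failures separate from latency, as the thresholds do, avoids reporting a fast but malformed response as healthy.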
@@ -0,0 +1,37 @@
id: langbot-overhead-accounting-contract
title: "LangBot overhead accounting metrics contract"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: true
tags:
- performance
- metrics
- contract
- synthetic
skills:
- langbot-testing
automation: skills/langbot-testing/probes/langbot-overhead-accounting-contract.mjs
metrics_thresholds_json: '{"sample_count":{"min":50},"langbot_overhead_p95_ms":{"max":25},"accounting_gap_max_ms":{"max":0.001}}'
load_profile_json: '{"kind":"synthetic-overhead-accounting","samples":80,"external_latency_segments":["provider","external_tool","network"]}'
steps:
- "Run `rtk bin/lbs test run langbot-overhead-accounting-contract --dry-run` first; remove `--dry-run` after checking the evidence directory."
- "Automation generates deterministic message-path latency samples and separates LangBot overhead from provider/tool/network latency."
- "Review metrics.json, thresholds.json, resource-log.json, and automation-result.json under LBS_EVIDENCE_DIR."
checks:
- "automation-result.json status is pass."
- "metrics_summary includes sample_count, langbot_overhead_p95_ms, e2e_latency_p95_ms, external_latency_p95_ms, and accounting_gap_max_ms."
- "thresholds_summary shows sample_count, langbot_overhead_p95_ms, and accounting_gap_max_ms all pass."
evidence_required:
- metrics
- resource_log
- filesystem
diagnostics:
- "This is a synthetic contract probe for the QA harness; it does not measure live product performance."
- "Use it to verify that reports can carry overhead accounting metrics before running live backend or browser performance probes."
success_patterns:
- "Overhead accounting contract passed"
failure_patterns:
- "breached one or more thresholds"
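One way to read `accounting_gap_max_ms`: for every sample, end-to-end latency should equal LangBot overhead plus the external segments, and the gap is whatever is left unexplained. The sketch below assumes per-sample fields named `e2e_ms`, `langbot_overhead_ms`, and `<segment>_ms`; those names and the sample values are hypothetical.

```javascript
// External segments as listed in load_profile_json.
const EXTERNAL_SEGMENTS = ["provider", "external_tool", "network"];

function accountFor(sample) {
  const externalMs = EXTERNAL_SEGMENTS.reduce((sum, key) => sum + (sample[`${key}_ms`] || 0), 0);
  // Gap: latency not explained by overhead plus external segments.
  const gapMs = Math.abs(sample.e2e_ms - (sample.langbot_overhead_ms + externalMs));
  return { external_ms: externalMs, gap_ms: gapMs };
}

// Made-up samples; the second leaves 1 ms unaccounted for.
const samples = [
  { e2e_ms: 120, langbot_overhead_ms: 12, provider_ms: 90, external_tool_ms: 10, network_ms: 8 },
  { e2e_ms: 100, langbot_overhead_ms: 19, provider_ms: 70, external_tool_ms: 0, network_ms: 10 },
];
const accounted = samples.map(accountFor);
const accountingGapMaxMs = Math.max(...accounted.map((item) => item.gap_ms));
```

Here the maximum gap is 1 ms, which would breach a 0.001 ms limit and flag the second sample's accounting as incomplete.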
@@ -0,0 +1,84 @@
id: langbot-space-debug-chat-concurrency-smoke
title: "LangBot Debug Chat real Space-provider concurrency smoke"
mode: probe
area: performance
type: performance
priority: p1
risk: high
ci_eligible: false
tags:
- performance
- debug-chat
- websocket
- space
- live-provider
- smoke
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_LOCAL_AGENT_PIPELINE_URL
- LANGBOT_LOCAL_AGENT_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_LOCAL_AGENT_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_LOCAL_AGENT_PIPELINE_NAME
automation_debug_chat_load_requests: "3"
automation_debug_chat_load_concurrency: "2"
automation_debug_chat_load_timeout_ms: "120000"
automation_debug_chat_load_response_p95_ms: "120000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_expected_prefix: "SPACEQA"
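# Prompt (English): Reply only with "{expected}"; do not explain and do not add any other characters.
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'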
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
metrics_thresholds_json: '{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":3,"concurrency":2,"path":"Pipeline Debug Chat WebSocket","provider":"LangBot Space model route","metric":"send-to-final-assistant-response","classification":"smoke-not-benchmark"}'
setup_automation:
- "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_PIPELINE_URL
- LANGBOT_PIPELINE_NAME
- LANGBOT_LOCAL_AGENT_PIPELINE_URL
- LANGBOT_LOCAL_AGENT_PIPELINE_NAME
- LANGBOT_LOCAL_AGENT_MODEL_UUID
- LANGBOT_E2E_MODEL_UUID
preconditions:
- "The selected local LangBot instance is safe for a low-volume real Space model smoke run."
- "Treat Space/provider/network failures as environment or dependency findings until fake-provider baseline evidence separates LangBot overhead."
steps:
- "Prepare a local-agent pipeline with a tested Space model and fallback models."
- "Reset the target Debug Chat session."
- "Open a small number of concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the live Space provider path."
checks:
- "automation-result.json status is pass when every request receives its own expected assistant response."
- "metrics_summary includes request count, concurrency, p95 response latency, throughput, and error rate."
- "The report classifies the result as a live-provider smoke, not a stable LangBot overhead benchmark."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe measures real user-path latency through Space and includes provider latency, model behavior, and network effects."
- "Compare with langbot-fake-provider-debug-chat-load before attributing slow or failed runs to LangBot itself."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "invalid api key"
- "WebSocket connection error"
- "Timed out after"
- "Final assistant response did not include"
- "All models failed during streaming setup"
troubleshooting:
- local-agent-model-route-unavailable
- marketplace-network-flaky
- proxy-env-mismatch
- telemetry-proxy-noise
@@ -0,0 +1,80 @@
id: pipeline-debug-chat-performance
title: "Pipeline Debug Chat user-path performance probe"
mode: agent-browser
area: pipeline
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- pipeline
- debug-chat
- user-path
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BACKEND_URL
env_any:
- LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME
automation: scripts/e2e/pipeline-debug-chat.mjs
automation_env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BACKEND_URL
- LANGBOT_BROWSER_PROFILE
- LANGBOT_CHROMIUM_EXECUTABLE
- LANGBOT_E2E_PROMPT
- LANGBOT_E2E_EXPECTED_TEXT
- LANGBOT_E2E_RESPONSE_TIMEOUT_MS
automation_env_any:
- LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME
# Prompt (English): Reply only with OK; this is for a performance test.
automation_prompt: "请只回复 OK,用于性能测试。"
automation_expected_text: "OK"
automation_response_timeout_ms: "120000"
automation_reset_debug_chat: "true"
automation_debug_chat_response_p95_ms: "120000"
automation_debug_chat_max_error_rate: "0"
metrics_thresholds_json: '{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}'
load_profile_json: '{"prompts":1,"browser":true,"path":"Pipeline Debug Chat","metric":"send-to-visible-completion"}'
setup_automation:
- "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_PIPELINE_URL
- LANGBOT_PIPELINE_NAME
preconditions:
- "LANGBOT_PIPELINE_URL or LANGBOT_PIPELINE_NAME points to the pipeline intended for this Debug Chat performance run."
- "The target pipeline is safe to reset Debug Chat history for this run."
- "The target pipeline has a known-good runner/model; provider latency should be interpreted separately from LangBot overhead."
steps:
- "Open LANGBOT_FRONTEND_URL with the prepared browser profile."
- "Open the target pipeline and select Debug Chat."
- "Reset Debug Chat history through the backend API when configured."
- "Send the deterministic prompt and wait for the expected assistant response."
checks:
- "automation-result.json status is pass when the expected assistant response appears."
- "metrics_summary includes response_p50_ms, response_p95_ms, error_rate, and total_duration_ms."
- "thresholds_summary shows response_p95_ms and error_rate pass."
evidence_required:
- ui
- screenshot
- console
- network
- metrics
diagnostics:
- "This case measures browser-visible send-to-completion latency; it does not split provider latency from LangBot overhead."
- "Use backend logs and provider diagnostics to explain slow runs before calling them LangBot regressions."
success_patterns:
- "Processing request from person_websocket"
- "Streaming completed"
failure_patterns:
- "Action invoke_llm_stream call timed out"
- "Task exception was never retrieved"
- "All models failed during streaming setup"
troubleshooting:
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
- plugin-runtime-timeout
- proxy-env-mismatch
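The `env_any` entries above use an `A|B` form. Assuming that means "at least one of these variables must be set" (the key name suggests this, but the harness code is not shown here), a check could be sketched as follows; `missingEnvAny` is a hypothetical helper name.

```javascript
// Returns the env_any entries for which none of the alternatives is set.
function missingEnvAny(entries, environment) {
  return entries.filter((entry) => {
    const names = entry.split("|").map((name) => name.trim()).filter(Boolean);
    return !names.some((name) => Boolean(environment[name]));
  });
}

const entries = ["LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME"];
const satisfied = missingEnvAny(entries, { LANGBOT_PIPELINE_NAME: "qa-pipeline" });
const unsatisfied = missingEnvAny(entries, {});
```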
@@ -1 +1,3 @@
dist/
dist/*
!dist/
!dist/qa-plugin-smoke-0.1.0.lbpkg
@@ -0,0 +1,837 @@
#!/usr/bin/env node
import crypto from "node:crypto";
import net from "node:net";
import tls from "node:tls";
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
import {
apiJson,
appendLine,
ensureEvidence,
evidencePaths,
loadEnvFiles,
localIsoWithOffset,
redact,
resetAndAuthLocalUser,
writeResult,
} from "../../../scripts/e2e/lib/langbot-e2e.mjs";
import {
buildProviderTimingMetrics,
summarizeFakeProviderState,
} from "./lib/fake-provider-timing.mjs";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
await loadEnvFiles();
const caseId = env.LBS_CASE_ID || "langbot-debug-chat-concurrency";
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const startedAt = new Date();
const metricsPath = resolve(paths.evidenceDir, "metrics.json");
const samplesPath = resolve(paths.evidenceDir, "samples.json");
const fakeProviderStatePath = resolve(paths.evidenceDir, "fake-provider-state.json");
const resetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const fakeProviderUrl = env.LANGBOT_FAKE_PROVIDER_URL || "";
const pipelineUrl = env.LANGBOT_E2E_PIPELINE_URL || env.LANGBOT_PIPELINE_URL || "";
const pipelineName = env.LANGBOT_E2E_PIPELINE_NAME || env.LANGBOT_PIPELINE_NAME || "";
const sessionType = env.LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE || env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person";
const totalRequests = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_REQUESTS, defaultRequests(caseId));
const concurrency = Math.min(totalRequests, positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY, defaultConcurrency(caseId)));
const timeoutMs = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS, defaultTimeout(caseId));
const expectedPrefix = env.LANGBOT_DEBUG_CHAT_LOAD_EXPECTED_PREFIX || "LBQA";
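// Default prompt (English): Reply only with "{expected}"; do not explain and do not add any other characters.
const promptTemplate = env.LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE
|| "请只回复 \"{expected}\",不要解释,不要添加其他字符。";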
const stream = bool(env.LANGBOT_DEBUG_CHAT_LOAD_STREAM, true);
const resetBeforeRun = bool(env.LANGBOT_DEBUG_CHAT_LOAD_RESET, true);
const responseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS, defaultP95Budget(caseId));
const firstResponseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_FIRST_RESPONSE_P95_MS, 0);
const maxErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE, 0);
const minErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_RATE, 0);
const minErrorCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_COUNT, 0);
const minOkCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_OK_COUNT, 0);
const minProviderFaultCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_PROVIDER_FAULT_COUNT, 0);
const failOnFinalMismatch = bool(env.LANGBOT_DEBUG_CHAT_LOAD_FAIL_ON_FINAL_MISMATCH, false);
const failureSignals = textList(env.LANGBOT_E2E_FAILURE_SIGNALS || env.LANGBOT_DEBUG_CHAT_LOAD_FAILURE_SIGNALS || "");
const result = {
source: "automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: "",
finished_at_local: "",
duration_ms: 0,
backend_url: backendUrl,
pipeline_url: pipelineUrl,
pipeline_name: pipelineName,
pipeline_id: "",
session_type: sessionType,
load_profile: {
requests: totalRequests,
concurrency,
timeout_ms: timeoutMs,
stream,
reset_before_run: resetBeforeRun,
fail_on_final_mismatch: failOnFinalMismatch,
},
evidence: {
network_log: paths.networkLog,
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderStatePath,
debug_chat_reset_diagnostic_json: resetDiagnosticPath,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
try {
if (!backendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_BACKEND_URL is not configured.");
}
if (!["person", "group"].includes(sessionType)) {
throw new Error(`LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE must be person or group, got ${sessionType}.`);
}
const backendReady = await backendReachable(backendUrl);
if (!backendReady) {
result.status = "env_issue";
throw new Error(`Backend did not respond at ${backendUrl}.`);
}
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this probe can resolve/reset the Debug Chat session.");
}
const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
const pipeline = await resolvePipeline({ backendUrl, token: auth.token, pipelineUrl, pipelineName });
result.pipeline_id = pipeline.id;
result.pipeline_name = pipeline.name || pipelineName;
if (!result.pipeline_url && env.LANGBOT_FRONTEND_URL) {
result.pipeline_url = `${env.LANGBOT_FRONTEND_URL.replace(/\/$/, "")}/home/pipelines?id=${encodeURIComponent(pipeline.id)}`;
}
if (resetBeforeRun) {
const reset = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.id)}/ws/reset/${encodeURIComponent(sessionType)}`, {
method: "POST",
token: auth.token,
});
const resetDiagnostic = {
status: isApiFailure(reset) ? "fail" : "ready",
http_status: reset.status,
code: reset.json.code ?? null,
reason: isApiFailure(reset) ? reset.json.msg || "Debug Chat reset failed." : "Debug Chat session reset.",
};
await writeFile(resetDiagnosticPath, `${JSON.stringify(resetDiagnostic, null, 2)}\n`, "utf8");
if (resetDiagnostic.status === "fail") {
throw new Error(resetDiagnostic.reason);
}
}
const wsUrl = websocketUrl(backendUrl, pipeline.id, sessionType);
const loadStartedAt = performance.now();
const samples = await runLoad({
wsUrl,
totalRequests,
concurrency,
timeoutMs,
promptTemplate,
expectedPrefix,
stream,
failOnFinalMismatch,
failureSignals,
});
const loadDurationMs = performance.now() - loadStartedAt;
const fakeProviderState = await readFakeProviderState(fakeProviderUrl);
if (fakeProviderState) {
await writeFile(fakeProviderStatePath, `${JSON.stringify(fakeProviderState, null, 2)}\n`, "utf8");
}
const metrics = buildMetrics({
samples,
totalRequests,
concurrency,
timeoutMs,
loadDurationMs,
backendUrl,
pipelineId: pipeline.id,
sessionType,
fakeProviderState,
});
const thresholds = buildThresholds(metrics);
const passed = Object.values(thresholds).every((item) => item.pass);
result.status = passed ? "pass" : "fail";
result.reason = passed
? "Debug Chat WebSocket concurrency probe passed all thresholds."
: "Debug Chat WebSocket concurrency probe breached latency or error-rate thresholds.";
result.metrics_summary = {
requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_count: metrics.error_count,
timeout_count: metrics.timeout_count,
error_rate: metrics.error_rate,
response_p50_ms: metrics.response_duration_ms.p50,
response_p95_ms: metrics.response_duration_ms.p95,
first_assistant_event_p95_ms: metrics.first_assistant_event_ms.p95,
first_assistant_content_p95_ms: metrics.first_assistant_content_ms.p95,
first_response_p95_ms: metrics.first_response_ms.p95,
throughput_rps: metrics.throughput_rps,
status_counts: metrics.status_counts,
fake_provider_request_count: metrics.fake_provider?.request_count ?? null,
fake_provider_fault_count: metrics.fake_provider?.fault_count ?? null,
fake_provider_duration_p95_ms: metrics.provider_timing?.provider_duration_ms.p95 ?? null,
langbot_overhead_estimate_p95_ms: metrics.provider_timing?.langbot_overhead_estimate_ms.p95 ?? null,
send_to_provider_start_p95_ms: metrics.provider_timing?.send_to_provider_start_ms.p95 ?? null,
provider_finish_to_ws_final_p95_ms: metrics.provider_timing?.provider_finish_to_ws_final_ms.p95 ?? null,
provider_timing_matched_request_count: metrics.provider_timing?.matched_request_count ?? null,
};
result.thresholds_summary = thresholds;
result.artifacts = {
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderState ? fakeProviderStatePath : "",
network_log: paths.networkLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, thresholds }, null, 2)}\n`, "utf8");
await writeFile(samplesPath, `${JSON.stringify(samples, null, 2)}\n`, "utf8");
} catch (error) {
if (!["env_issue", "blocked"].includes(result.status)) {
result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
}
// Guard against non-Error throwables, which have no message property.
result.reason = result.reason || safeReason(error?.message ?? String(error));
} finally {
const finishedAt = new Date();
result.finished_at = finishedAt.toISOString();
result.finished_at_local = localIsoWithOffset(finishedAt);
result.duration_ms = finishedAt.getTime() - startedAt.getTime();
await mkdir(paths.evidenceDir, { recursive: true });
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
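// Exit codes: 0 = pass, 2 = env_issue or blocked (environment problem, not a product failure), 1 = any other failure.
const exitCode = result.status === "pass" ? 0 : ["env_issue", "blocked"].includes(result.status) ? 2 : 1;
exit(exitCode);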
function defaultRequests(id) {
return id.includes("space") ? 3 : 12;
}
function defaultConcurrency(id) {
return id.includes("space") ? 1 : 4;
}
function defaultTimeout(id) {
return id.includes("space") ? 120_000 : 30_000;
}
function defaultP95Budget(id) {
return id.includes("space") ? 120_000 : 5_000;
}
function positiveInteger(value, fallback) {
const parsed = Number.parseInt(String(value || ""), 10);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
function nonNegativeInteger(value, fallback) {
const parsed = Number.parseInt(String(value ?? ""), 10);
return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}
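function positiveNumber(value, fallback) {
// Despite the name, zero is accepted. Unset or blank values use the fallback;
// without this guard Number("") would parse to 0 and silently override it.
if (value === undefined || value === null || String(value).trim() === "") return fallback;
const parsed = Number(value);
return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}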
function bool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function textList(value) {
return String(value || "")
.split(/\r?\n|,/)
.map((item) => item.trim())
.filter(Boolean);
}
async function backendReachable(baseUrl) {
try {
const response = await fetch(`${baseUrl.replace(/\/$/, "")}/healthz`, {
signal: AbortSignal.timeout(3000),
});
return response.status < 500;
} catch {
return false;
}
}
async function readFakeProviderState(rootUrl) {
if (!rootUrl) return null;
try {
const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
signal: AbortSignal.timeout(3000),
});
const json = await response.json().catch(() => ({}));
return {
status: response.ok && json.ok === true ? "loaded" : "unavailable",
url: normalizeProviderRootUrl(rootUrl),
http_status: response.status,
model: json.model || "",
config: json.config || {},
request_count: Number.isFinite(json.request_count) ? json.request_count : null,
recent_requests: Array.isArray(json.recent_requests) ? json.recent_requests : [],
};
} catch (error) {
return {
status: "unavailable",
url: normalizeProviderRootUrl(rootUrl),
reason: safeReason(error.message),
request_count: null,
recent_requests: [],
};
}
}
function normalizeProviderRootUrl(value) {
const trimmed = String(value || "").trim().replace(/\/$/, "");
return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
}
function pipelineIdFromUrl(url) {
if (!url) return "";
try {
const parsed = new URL(url);
return parsed.searchParams.get("id") || "";
} catch {
return "";
}
}
async function resolvePipeline({ backendUrl, token, pipelineUrl, pipelineName }) {
const idFromUrl = pipelineIdFromUrl(pipelineUrl);
if (idFromUrl) {
const response = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(idFromUrl)}`, { token });
const pipeline = response.json.data?.pipeline;
if (isApiFailure(response) || !pipeline?.uuid) {
throw new Error(response.json.msg || `Could not load pipeline ${idFromUrl}.`);
}
return { id: pipeline.uuid, name: pipeline.name || "" };
}
if (!pipelineName) {
throw new Error("Set LANGBOT_E2E_PIPELINE_URL or LANGBOT_E2E_PIPELINE_NAME before running this probe.");
}
const response = await apiJson(backendUrl, "/api/v1/pipelines", { token });
if (isApiFailure(response)) {
throw new Error(response.json.msg || "Failed to list pipelines.");
}
const pipeline = (response.json.data?.pipelines || []).find((item) => item.name === pipelineName);
if (!pipeline?.uuid) {
throw new Error(`Could not find pipeline named ${pipelineName}.`);
}
return { id: pipeline.uuid, name: pipeline.name || pipelineName };
}
function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
function websocketUrl(baseUrl, pipelineId, sessionType) {
const parsed = new URL(baseUrl);
parsed.protocol = parsed.protocol === "https:" ? "wss:" : "ws:";
parsed.pathname = `/api/v1/pipelines/${encodeURIComponent(pipelineId)}/ws/connect`;
parsed.search = `?session_type=${encodeURIComponent(sessionType)}`;
return parsed.toString();
}
async function runLoad(options) {
const samples = [];
let nextIndex = 0;
const workers = Array.from({ length: options.concurrency }, async () => {
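while (nextIndex < options.totalRequests) {
// Claim the next index synchronously (no await in between), so two workers never run the same request.
const index = nextIndex;
nextIndex += 1;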
const sample = await runSingleRequest({ ...options, index });
samples.push(sample);
}
});
await Promise.all(workers);
return samples.sort((left, right) => left.index - right.index);
}
function expectedForIndex(prefix, index) {
return `${prefix}-${String(index + 1).padStart(4, "0")}`;
}
function promptForIndex(template, expected) {
return template.replaceAll("{expected}", expected);
}
function runSingleRequest({
wsUrl,
index,
timeoutMs,
promptTemplate,
expectedPrefix,
stream,
failOnFinalMismatch,
failureSignals,
}) {
return new Promise((resolve) => {
const expected = expectedForIndex(expectedPrefix, index);
const prompt = promptForIndex(promptTemplate, expected);
const sample = {
index,
status: "running",
ok: false,
expected_text: expected,
prompt,
response_text: "",
started_at: new Date().toISOString(),
started_epoch_ms: Date.now(),
connected_at: null,
connected_epoch_ms: null,
sent_at: null,
sent_epoch_ms: null,
first_assistant_event_at: null,
first_assistant_event_epoch_ms: null,
first_assistant_event_ms: null,
first_assistant_content_at: null,
first_assistant_content_epoch_ms: null,
first_assistant_content_ms: null,
first_response_at: null,
first_response_epoch_ms: null,
connected_ms: null,
first_response_ms: null,
response_duration_ms: null,
finished_at: null,
finished_epoch_ms: null,
event_count: 0,
foreign_response_count: 0,
last_foreign_response_text: "",
error: "",
close_code: null,
close_reason: "",
};
let closed = false;
let connectedAt = 0;
let sentAt = 0;
const startedAt = performance.now();
let client = null;
const timer = setTimeout(() => {
finish("timeout", `Timed out after ${timeoutMs} ms.`);
}, timeoutMs);
client = openRawWebSocket(wsUrl, {
onOpen() {
connectedAt = performance.now();
const now = Date.now();
sample.connected_at = new Date(now).toISOString();
sample.connected_epoch_ms = now;
sample.connected_ms = rounded(connectedAt - startedAt);
},
onMessage(text) {
sample.event_count += 1;
let data;
try {
data = JSON.parse(String(text || ""));
} catch (error) {
finish("error", `Invalid WebSocket JSON: ${error.message}`);
return;
}
appendLine(paths.networkLog, JSON.stringify({
request_index: index,
type: data.type,
session_type: data.session_type || "",
role: data.data?.role || "",
is_final: data.data?.is_final ?? null,
content_preview: redact(String(data.data?.content || data.message || "").slice(0, 200)),
})).catch(() => {});
if (data.type === "connected") {
sentAt = performance.now();
const now = Date.now();
sample.sent_at = new Date(now).toISOString();
sample.sent_epoch_ms = now;
client.send(JSON.stringify({
type: "message",
message: [{ type: "Plain", text: prompt }],
stream,
}));
return;
}
if (data.type === "error") {
finish("error", data.message || "WebSocket error message.");
return;
}
if (data.type !== "response" || data.data?.role !== "assistant") return;
const content = String(data.data.content || "");
markFirstAssistantEvent(sample, sentAt);
if (content) sample.response_text = content;
if (content) markFirstAssistantContent(sample, sentAt);
if (content.includes(expected) && sample.first_response_ms === null && sentAt > 0) {
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}
if (data.data.is_final === true) {
const ok = sample.response_text.includes(expected);
if (ok) {
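if (sample.first_response_ms === null && sentAt > 0) {
// Record the wall-clock fields as well, so they stay consistent with first_response_ms.
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}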
finish("pass", "");
} else if (matchesFailureSignal(sample.response_text, failureSignals)) {
finish("app_error", `Assistant final response matched a failure signal: ${sample.response_text}`);
} else if (failOnFinalMismatch && !containsLoadToken(sample.response_text, expectedPrefix)) {
finish("mismatch", `Final assistant response did not include ${expected}: ${sample.response_text}`);
} else {
sample.foreign_response_count += 1;
sample.last_foreign_response_text = sample.response_text;
}
}
},
onError(error) {
finish("connection_error", `WebSocket connection error: ${error.message}`);
},
onClose(event) {
sample.close_code = event.code;
sample.close_reason = event.reason || "";
if (!closed) finish("closed", `WebSocket closed before final assistant response: ${event.code}`);
},
});
function finish(status, reason) {
if (closed) return;
closed = true;
clearTimeout(timer);
sample.status = status;
sample.ok = status === "pass";
sample.error = status === "timeout" && sample.foreign_response_count > 0
? `${reason || ""} Saw ${sample.foreign_response_count} foreign assistant response(s); last=${sample.last_foreign_response_text}`
: reason || "";
if (sentAt > 0) sample.response_duration_ms = rounded(performance.now() - sentAt);
else sample.response_duration_ms = rounded(performance.now() - startedAt);
const now = Date.now();
sample.finished_at = new Date(now).toISOString();
sample.finished_epoch_ms = now;
try {
client?.close();
} catch {
// Closing a failed socket should not hide the sample result.
}
resolve(sample);
}
});
}
function markFirstAssistantEvent(sample, sentAt) {
if (sample.first_assistant_event_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_event_at = new Date(now).toISOString();
sample.first_assistant_event_epoch_ms = now;
sample.first_assistant_event_ms = rounded(performance.now() - sentAt);
}
function markFirstAssistantContent(sample, sentAt) {
if (sample.first_assistant_content_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_content_at = new Date(now).toISOString();
sample.first_assistant_content_epoch_ms = now;
sample.first_assistant_content_ms = rounded(performance.now() - sentAt);
}
function containsLoadToken(text, prefix) {
const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
}
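// Illustrative sketch, not part of the original change: why the prefix is escaped
// before it is placed in a RegExp. Without escaping, a prefix such as "A.B" would
// also match "AxB-0007". `sketchHasLoadToken` is a hypothetical name that mirrors
// `containsLoadToken` above so that the example runs on its own.
function sketchHasLoadToken(text, prefix) {
  // Escape every character that has a special meaning inside a regular expression.
  const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // A load token is the literal prefix, a dash, and exactly four consecutive digits.
  return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
}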
function matchesFailureSignal(text, signals) {
const lower = String(text || "").toLowerCase();
return signals.some((signal) => lower.includes(signal.toLowerCase()));
}
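// Minimal WebSocket client over net/tls: performs the HTTP upgrade handshake, then reads and
// writes frames directly. It checks only for a 101 status and does not verify Sec-WebSocket-Accept.
function openRawWebSocket(wsUrl, handlers) {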
const parsed = new URL(wsUrl);
const secure = parsed.protocol === "wss:";
const port = Number(parsed.port || (secure ? 443 : 80));
const host = parsed.hostname;
const path = `${parsed.pathname}${parsed.search}`;
const key = crypto.randomBytes(16).toString("base64");
const socket = secure
? tls.connect({ host, port, servername: host })
: net.connect({ host, port });
let opened = false;
let closed = false;
let buffer = Buffer.alloc(0);
socket.setNoDelay(true);
socket.on("connect", () => {
const originProtocol = secure ? "https" : "http";
const request = [
`GET ${path} HTTP/1.1`,
`Host: ${parsed.host}`,
"Upgrade: websocket",
"Connection: Upgrade",
`Sec-WebSocket-Key: ${key}`,
"Sec-WebSocket-Version: 13",
`Origin: ${originProtocol}://${parsed.host}`,
"",
"",
].join("\r\n");
socket.write(request);
});
socket.on("data", (chunk) => {
buffer = Buffer.concat([buffer, chunk]);
if (!opened) {
const headerEnd = buffer.indexOf("\r\n\r\n");
if (headerEnd === -1) return;
const headerText = buffer.slice(0, headerEnd).toString("utf8");
buffer = buffer.slice(headerEnd + 4);
if (!/^HTTP\/1\.1 101\b/i.test(headerText)) {
handlers.onError(new Error(`Handshake failed: ${headerText.split("\r\n")[0] || "missing status"}`));
socket.destroy();
return;
}
opened = true;
handlers.onOpen();
}
processFrames();
});
socket.on("error", (error) => {
if (!closed) handlers.onError(error);
});
socket.on("close", () => {
if (closed) return;
closed = true;
handlers.onClose({ code: null, reason: "" });
});
function processFrames() {
while (true) {
const frame = readFrame(buffer);
if (!frame) return;
buffer = buffer.slice(frame.consumed);
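// Only text (0x1), close (0x8), and ping (0x9) frames are handled. The FIN bit is not inspected,
// so fragmented messages (continuation opcode 0x0) are not reassembled.
if (frame.opcode === 0x1) {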
handlers.onMessage(frame.payload.toString("utf8"));
} else if (frame.opcode === 0x8) {
const code = frame.payload.length >= 2 ? frame.payload.readUInt16BE(0) : null;
const reason = frame.payload.length > 2 ? frame.payload.slice(2).toString("utf8") : "";
closed = true;
handlers.onClose({ code, reason });
socket.end();
return;
} else if (frame.opcode === 0x9) {
writeFrame(socket, 0xA, frame.payload);
}
}
}
return {
send(text) {
if (closed || !opened) return;
writeFrame(socket, 0x1, Buffer.from(text, "utf8"));
},
close() {
if (closed) return;
closed = true;
if (!socket.destroyed) {
if (opened) writeFrame(socket, 0x8, Buffer.alloc(0));
setTimeout(() => socket.end(), 50).unref();
}
},
};
}
function readFrame(buffer) {
if (buffer.length < 2) return null;
const first = buffer[0];
const second = buffer[1];
const opcode = first & 0x0f;
const masked = Boolean(second & 0x80);
let length = second & 0x7f;
let offset = 2;
if (length === 126) {
if (buffer.length < offset + 2) return null;
length = buffer.readUInt16BE(offset);
offset += 2;
} else if (length === 127) {
if (buffer.length < offset + 8) return null;
const high = buffer.readUInt32BE(offset);
const low = buffer.readUInt32BE(offset + 4);
length = high * 2 ** 32 + low;
offset += 8;
}
let mask = null;
if (masked) {
if (buffer.length < offset + 4) return null;
mask = buffer.slice(offset, offset + 4);
offset += 4;
}
if (buffer.length < offset + length) return null;
let payload = buffer.slice(offset, offset + length);
if (mask) {
payload = Buffer.from(payload);
for (let index = 0; index < payload.length; index += 1) {
payload[index] ^= mask[index % 4];
}
}
return {
opcode,
payload,
consumed: offset + length,
};
}
function writeFrame(socket, opcode, payload) {
const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload || "");
const mask = crypto.randomBytes(4);
const headerLength = body.length < 126 ? 2 : body.length <= 0xffff ? 4 : 10;
const header = Buffer.alloc(headerLength);
header[0] = 0x80 | opcode;
if (body.length < 126) {
header[1] = 0x80 | body.length;
} else if (body.length <= 0xffff) {
header[1] = 0x80 | 126;
header.writeUInt16BE(body.length, 2);
} else {
header[1] = 0x80 | 127;
header.writeUInt32BE(Math.floor(body.length / 2 ** 32), 2);
header.writeUInt32BE(body.length >>> 0, 6);
}
const masked = Buffer.from(body);
for (let index = 0; index < masked.length; index += 1) {
masked[index] ^= mask[index % 4];
}
socket.write(Buffer.concat([header, mask, masked]));
}
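// Illustrative sketch, not part of the original change: a round trip through the
// client-side framing that `writeFrame` and `readFrame` above implement, limited to
// short text payloads (under 126 bytes) for brevity. The function names and the fixed
// mask key are assumptions made for the example; real clients use a random mask key.
function sketchEncodeShortTextFrame(text, maskKey) {
  const body = Buffer.from(text, "utf8");
  if (body.length >= 126) throw new RangeError("sketch only covers payloads under 126 bytes");
  // Byte 0: FIN bit plus text opcode. Byte 1: MASK bit plus 7-bit payload length.
  const header = Buffer.from([0x80 | 0x1, 0x80 | body.length]);
  const mask = Buffer.from(maskKey);
  const masked = Buffer.from(body);
  for (let index = 0; index < masked.length; index += 1) {
    masked[index] ^= mask[index % 4];
  }
  return Buffer.concat([header, mask, masked]);
}
function sketchDecodeShortFrame(frame) {
  const opcode = frame[0] & 0x0f;
  const length = frame[1] & 0x7f;
  const mask = frame.subarray(2, 6);
  // Copy before unmasking so the input frame is left untouched.
  const payload = Buffer.from(frame.subarray(6, 6 + length));
  for (let index = 0; index < payload.length; index += 1) {
    payload[index] ^= mask[index % 4];
  }
  return { opcode, text: payload.toString("utf8") };
}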
function rounded(value) {
return Number(value.toFixed(3));
}
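// Nearest-rank percentile: the value at 1-based rank ceil(p / 100 * n) of the sorted input.
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const rank = Math.ceil((percentileValue / 100) * sorted.length) - 1;
// Clamp both ends so a percentile of 0 does not index before the first element.
const index = Math.max(0, Math.min(sorted.length - 1, rank));
return rounded(sorted[index]);
}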
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: rounded(Math.min(...values)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: rounded(Math.max(...values)),
};
}
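// Illustrative sketch, not part of the original change: the nearest-rank rule that the
// percentile helper above relies on, shown without rounding. `sketchNearestRank` is a
// hypothetical name. For [10, 20, 30, 40], p50 has rank ceil(0.5 * 4) = 2 and p95 has
// rank ceil(0.95 * 4) = 4, so small samples report their maximum as p95 and p99.
function sketchNearestRank(values, percentileValue) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((percentileValue / 100) * sorted.length); // 1-based rank
  const clamped = Math.min(sorted.length, Math.max(1, rank));
  return sorted[clamped - 1];
}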
function buildMetrics({ samples, totalRequests, concurrency, timeoutMs, loadDurationMs, backendUrl, pipelineId, sessionType, fakeProviderState }) {
const okSamples = samples.filter((sample) => sample.ok);
const statusCounts = {};
for (const sample of samples) {
statusCounts[sample.status] = (statusCounts[sample.status] || 0) + 1;
}
const errorCount = samples.length - okSamples.length;
return {
probe: caseId,
backend_url: backendUrl,
pipeline_id: pipelineId,
session_type: sessionType,
total_requests: totalRequests,
completed_requests: samples.length,
concurrency,
timeout_ms: timeoutMs,
ok_count: okSamples.length,
error_count: errorCount,
timeout_count: samples.filter((sample) => sample.status === "timeout").length,
error_rate: samples.length === 0 ? 1 : rounded(errorCount / samples.length),
load_duration_ms: rounded(loadDurationMs),
throughput_rps: loadDurationMs <= 0 ? 0 : rounded(okSamples.length / (loadDurationMs / 1000)),
status_counts: statusCounts,
connected_ms: stats(samples.map((sample) => sample.connected_ms).filter(Number.isFinite)),
first_assistant_event_ms: stats(samples.map((sample) => sample.first_assistant_event_ms).filter(Number.isFinite)),
first_assistant_content_ms: stats(samples.map((sample) => sample.first_assistant_content_ms).filter(Number.isFinite)),
first_response_ms: stats(okSamples.map((sample) => sample.first_response_ms).filter(Number.isFinite)),
response_duration_ms: stats(okSamples.map((sample) => sample.response_duration_ms).filter(Number.isFinite)),
fake_provider: summarizeFakeProviderState(fakeProviderState),
provider_timing: buildProviderTimingMetrics(samples, fakeProviderState),
samples,
};
}
function buildThresholds(metrics) {
const thresholds = {
error_rate: { actual: metrics.error_rate, max: maxErrorRate, pass: metrics.error_rate <= maxErrorRate },
response_p95_ms: {
actual: metrics.response_duration_ms.p95,
max: responseP95BudgetMs,
pass: metrics.ok_count > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
},
};
if (minErrorRate > 0) {
thresholds.error_rate_min = {
actual: metrics.error_rate,
min: minErrorRate,
pass: metrics.error_rate >= minErrorRate,
};
}
if (minErrorCount > 0) {
thresholds.error_count_min = {
actual: metrics.error_count,
min: minErrorCount,
pass: metrics.error_count >= minErrorCount,
};
}
if (minOkCount > 0) {
thresholds.ok_count_min = {
actual: metrics.ok_count,
min: minOkCount,
pass: metrics.ok_count >= minOkCount,
};
}
if (minProviderFaultCount > 0) {
const actual = metrics.fake_provider?.fault_count ?? 0;
thresholds.fake_provider_fault_count_min = {
actual,
min: minProviderFaultCount,
pass: actual >= minProviderFaultCount,
};
}
if (firstResponseP95BudgetMs > 0) {
thresholds.first_response_p95_ms = {
actual: metrics.first_response_ms.p95,
max: firstResponseP95BudgetMs,
pass: metrics.ok_count > 0 && metrics.first_response_ms.p95 <= firstResponseP95BudgetMs,
};
}
return thresholds;
}
function looksLikeEnvIssue(error) {
const message = String(error?.message || error || "");
return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
@@ -0,0 +1,861 @@
#!/usr/bin/env node
import crypto from "node:crypto";
import net from "node:net";
import tls from "node:tls";
import { mkdir, writeFile } from "node:fs/promises";
import { resolve } from "node:path";
import { env, exit } from "node:process";
import {
apiJson,
appendLine,
ensureEvidence,
evidencePaths,
loadEnvFiles,
localIsoWithOffset,
redact,
resetAndAuthLocalUser,
writeResult,
} from "../../../scripts/e2e/lib/langbot-e2e.mjs";
import {
buildProviderTimingMetrics,
summarizeFakeProviderState,
} from "./lib/fake-provider-timing.mjs";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
await loadEnvFiles();
const caseId = env.LBS_CASE_ID || "langbot-debug-chat-cross-pipeline-isolation";
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const startedAt = new Date();
const metricsPath = resolve(paths.evidenceDir, "metrics.json");
const samplesPath = resolve(paths.evidenceDir, "samples.json");
const fakeProviderStatePath = resolve(paths.evidenceDir, "fake-provider-state.json");
const resetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const fakeProviderUrl = env.LANGBOT_FAKE_PROVIDER_URL || "";
const sessionType = env.LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE || env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person";
const requestsPerPipeline = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_REQUESTS, 6);
const concurrency = Math.min(requestsPerPipeline * 2, positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY, 4));
const timeoutMs = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS, 30_000);
const stream = bool(env.LANGBOT_DEBUG_CHAT_LOAD_STREAM, true);
const resetBeforeRun = bool(env.LANGBOT_DEBUG_CHAT_LOAD_RESET, true);
const responseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS, 5_000);
const maxErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE, 0);
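// Default prompt, in Chinese: Reply with only "{expected}"; do not explain and do not add any other characters.
const promptTemplate = env.LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE
|| "请只回复 \"{expected}\",不要解释,不要添加其他字符。";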
const failureSignals = textList(env.LANGBOT_E2E_FAILURE_SIGNALS || env.LANGBOT_DEBUG_CHAT_LOAD_FAILURE_SIGNALS || "");
const pipelineTargets = [
{
label: "A",
expectedPrefix: "PIPEA",
otherPrefix: "PIPEB",
url: env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL || "",
name: env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME || "",
},
{
label: "B",
expectedPrefix: "PIPEB",
otherPrefix: "PIPEA",
url: env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL || "",
name: env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME || "",
},
];
const result = {
source: "automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: "",
finished_at_local: "",
duration_ms: 0,
backend_url: backendUrl,
session_type: sessionType,
pipelines: [],
load_profile: {
requests_per_pipeline: requestsPerPipeline,
total_requests: requestsPerPipeline * 2,
concurrency,
timeout_ms: timeoutMs,
stream,
reset_before_run: resetBeforeRun,
},
evidence: {
network_log: paths.networkLog,
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderStatePath,
debug_chat_reset_diagnostic_json: resetDiagnosticPath,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
try {
if (!backendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_BACKEND_URL is not configured.");
}
if (!["person", "group"].includes(sessionType)) {
throw new Error(`LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE must be person or group, got ${sessionType}.`);
}
for (const target of pipelineTargets) {
if (!target.url && !target.name) {
result.status = "env_issue";
throw new Error(`Set LANGBOT_FAKE_PROVIDER_PIPELINE_${target.label}_URL or LANGBOT_FAKE_PROVIDER_PIPELINE_${target.label}_NAME.`);
}
}
const backendReady = await backendReachable(backendUrl);
if (!backendReady) {
result.status = "env_issue";
throw new Error(`Backend did not respond at ${backendUrl}.`);
}
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this probe can resolve/reset Debug Chat sessions.");
}
const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
const pipelines = [];
for (const target of pipelineTargets) {
const pipeline = await resolvePipeline({
backendUrl,
token: auth.token,
pipelineUrl: target.url,
pipelineName: target.name,
});
pipelines.push({
...target,
id: pipeline.id,
name: pipeline.name || target.name,
wsUrl: websocketUrl(backendUrl, pipeline.id, sessionType),
});
}
result.pipelines = pipelines.map((pipeline) => ({
label: pipeline.label,
id: pipeline.id,
name: pipeline.name,
url: pipeline.url,
}));
if (resetBeforeRun) {
const resetDiagnostics = [];
for (const pipeline of pipelines) {
const reset = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.id)}/ws/reset/${encodeURIComponent(sessionType)}`, {
method: "POST",
token: auth.token,
});
resetDiagnostics.push({
pipeline_label: pipeline.label,
pipeline_id: pipeline.id,
status: isApiFailure(reset) ? "fail" : "ready",
http_status: reset.status,
code: reset.json.code ?? null,
reason: isApiFailure(reset) ? reset.json.msg || "Debug Chat reset failed." : "Debug Chat session reset.",
});
}
await writeFile(resetDiagnosticPath, `${JSON.stringify(resetDiagnostics, null, 2)}\n`, "utf8");
const failedReset = resetDiagnostics.find((item) => item.status === "fail");
if (failedReset) throw new Error(failedReset.reason);
}
await resetFakeProvider(fakeProviderUrl);
const jobs = [];
for (let index = 0; index < requestsPerPipeline; index += 1) {
for (const pipeline of pipelines) {
jobs.push({ ...pipeline, index });
}
}
const loadStartedAt = performance.now();
const samples = await runLoad({
jobs,
concurrency,
timeoutMs,
promptTemplate,
stream,
failureSignals,
});
const loadDurationMs = performance.now() - loadStartedAt;
const fakeProviderState = await readFakeProviderState(fakeProviderUrl);
if (fakeProviderState) {
await writeFile(fakeProviderStatePath, `${JSON.stringify(fakeProviderState, null, 2)}\n`, "utf8");
}
const metrics = buildMetrics({
samples,
requestsPerPipeline,
concurrency,
timeoutMs,
loadDurationMs,
backendUrl,
sessionType,
fakeProviderState,
});
const thresholds = buildThresholds(metrics);
const passed = Object.values(thresholds).every((item) => item.pass);
result.status = passed ? "pass" : "fail";
result.reason = passed
? "Debug Chat cross-pipeline isolation probe passed all thresholds."
: "Debug Chat cross-pipeline isolation probe found leaks, errors, or latency threshold breaches.";
result.metrics_summary = {
requests_per_pipeline: metrics.requests_per_pipeline,
total_requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_count: metrics.error_count,
cross_pipeline_leak_count: metrics.cross_pipeline_leak_count,
timeout_count: metrics.timeout_count,
error_rate: metrics.error_rate,
response_p95_ms: metrics.response_duration_ms.p95,
first_response_p95_ms: metrics.first_response_ms.p95,
throughput_rps: metrics.throughput_rps,
status_counts: metrics.status_counts,
by_pipeline: metrics.by_pipeline,
fake_provider_request_count: metrics.fake_provider?.request_count ?? null,
fake_provider_duration_p95_ms: metrics.provider_timing?.provider_duration_ms.p95 ?? null,
langbot_overhead_estimate_p95_ms: metrics.provider_timing?.langbot_overhead_estimate_ms.p95 ?? null,
send_to_provider_start_p95_ms: metrics.provider_timing?.send_to_provider_start_ms.p95 ?? null,
provider_finish_to_ws_final_p95_ms: metrics.provider_timing?.provider_finish_to_ws_final_ms.p95 ?? null,
};
result.thresholds_summary = thresholds;
result.artifacts = {
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderState ? fakeProviderStatePath : "",
network_log: paths.networkLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, thresholds }, null, 2)}\n`, "utf8");
await writeFile(samplesPath, `${JSON.stringify(samples, null, 2)}\n`, "utf8");
} catch (error) {
if (!["env_issue", "blocked"].includes(result.status)) {
result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
}
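// Always report the thrown error: a reason set before a late failure (such as a failed
// metrics write) could otherwise still say that the probe passed.
result.reason = safeReason(error?.message || error);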
} finally {
const finishedAt = new Date();
result.finished_at = finishedAt.toISOString();
result.finished_at_local = localIsoWithOffset(finishedAt);
result.duration_ms = finishedAt.getTime() - startedAt.getTime();
await mkdir(paths.evidenceDir, { recursive: true });
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
exit(result.status === "pass" ? 0 : result.status === "env_issue" || result.status === "blocked" ? 2 : 1);
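// Illustrative sketch, not part of the original change: the exit-code convention of the
// `exit(...)` call above, written as a lookup. `sketchExitCodeForStatus` is a hypothetical
// name. A pass exits 0, environment or blocked states exit 2, and anything else exits 1.
function sketchExitCodeForStatus(status) {
  if (status === "pass") return 0;
  if (status === "env_issue" || status === "blocked") return 2;
  return 1;
}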
async function backendReachable(baseUrl) {
try {
const response = await fetch(`${baseUrl.replace(/\/$/, "")}/healthz`, {
signal: AbortSignal.timeout(3000),
});
return response.status < 500;
} catch {
return false;
}
}
async function resetFakeProvider(rootUrl) {
if (!rootUrl) return;
try {
await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/reset`, {
method: "POST",
signal: AbortSignal.timeout(3000),
});
} catch {
// Missing fake-provider diagnostics should not hide the isolation result.
}
}
async function readFakeProviderState(rootUrl) {
if (!rootUrl) return null;
try {
const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
signal: AbortSignal.timeout(3000),
});
const json = await response.json().catch(() => ({}));
return {
status: response.ok && json.ok === true ? "loaded" : "unavailable",
url: normalizeProviderRootUrl(rootUrl),
http_status: response.status,
model: json.model || "",
config: json.config || {},
request_count: Number.isFinite(json.request_count) ? json.request_count : null,
recent_requests: Array.isArray(json.recent_requests) ? json.recent_requests : [],
};
} catch (error) {
return {
status: "unavailable",
url: normalizeProviderRootUrl(rootUrl),
reason: safeReason(error.message),
request_count: null,
recent_requests: [],
};
}
}
function normalizeProviderRootUrl(value) {
const trimmed = String(value || "").trim().replace(/\/$/, "");
return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
}
function pipelineIdFromUrl(url) {
if (!url) return "";
try {
const parsed = new URL(url);
return parsed.searchParams.get("id") || "";
} catch {
return "";
}
}
async function resolvePipeline({ backendUrl, token, pipelineUrl, pipelineName }) {
const idFromUrl = pipelineIdFromUrl(pipelineUrl);
if (idFromUrl) {
const response = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(idFromUrl)}`, { token });
const pipeline = response.json.data?.pipeline;
if (isApiFailure(response) || !pipeline?.uuid) {
throw new Error(response.json.msg || `Could not load pipeline ${idFromUrl}.`);
}
return { id: pipeline.uuid, name: pipeline.name || "" };
}
if (!pipelineName) {
throw new Error("Set pipeline URL or name before running this probe.");
}
const response = await apiJson(backendUrl, "/api/v1/pipelines", { token });
if (isApiFailure(response)) {
throw new Error(response.json.msg || "Failed to list pipelines.");
}
const pipeline = (response.json.data?.pipelines || []).find((item) => item.name === pipelineName);
if (!pipeline?.uuid) {
throw new Error(`Could not find pipeline named ${pipelineName}.`);
}
return { id: pipeline.uuid, name: pipeline.name || pipelineName };
}
function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
function websocketUrl(baseUrl, pipelineId, sessionTypeValue) {
const parsed = new URL(baseUrl);
parsed.protocol = parsed.protocol === "https:" ? "wss:" : "ws:";
parsed.pathname = `/api/v1/pipelines/${encodeURIComponent(pipelineId)}/ws/connect`;
parsed.search = `?session_type=${encodeURIComponent(sessionTypeValue)}`;
return parsed.toString();
}
async function runLoad(options) {
const samples = [];
const queue = [...options.jobs];
const workers = Array.from({ length: options.concurrency }, async () => {
while (queue.length > 0) {
const job = queue.shift();
if (!job) continue;
const sample = await runSingleRequest({ ...options, job });
samples.push(sample);
}
});
await Promise.all(workers);
return samples.sort((left, right) => (
left.pipeline_label.localeCompare(right.pipeline_label) || left.index - right.index
));
}
function expectedForIndex(prefix, index) {
return `${prefix}-${String(index + 1).padStart(4, "0")}`;
}
function promptForIndex(template, expected) {
return template.replaceAll("{expected}", expected);
}
function runSingleRequest({
job,
timeoutMs,
promptTemplate,
stream,
failureSignals,
}) {
return new Promise((resolvePromise) => {
const expected = expectedForIndex(job.expectedPrefix, job.index);
const prompt = promptForIndex(promptTemplate, expected);
const sample = {
index: job.index,
pipeline_label: job.label,
pipeline_id: job.id,
pipeline_name: job.name,
status: "running",
ok: false,
expected_text: expected,
expected_prefix: job.expectedPrefix,
other_prefix: job.otherPrefix,
prompt,
response_text: "",
started_at: new Date().toISOString(),
started_epoch_ms: Date.now(),
connected_at: null,
connected_epoch_ms: null,
sent_at: null,
sent_epoch_ms: null,
first_assistant_event_at: null,
first_assistant_event_epoch_ms: null,
first_assistant_event_ms: null,
first_assistant_content_at: null,
first_assistant_content_epoch_ms: null,
first_assistant_content_ms: null,
first_response_at: null,
first_response_epoch_ms: null,
connected_ms: null,
first_response_ms: null,
response_duration_ms: null,
finished_at: null,
finished_epoch_ms: null,
event_count: 0,
same_pipeline_foreign_response_count: 0,
cross_pipeline_leak_count: 0,
last_foreign_response_text: "",
error: "",
close_code: null,
close_reason: "",
};
let closed = false;
let connectedAt = 0;
let sentAt = 0;
const startedPerf = performance.now();
let client = null;
const timer = setTimeout(() => {
finish("timeout", `Timed out after ${timeoutMs} ms.`);
}, timeoutMs);
client = openRawWebSocket(job.wsUrl, {
onOpen() {
connectedAt = performance.now();
const now = Date.now();
sample.connected_at = new Date(now).toISOString();
sample.connected_epoch_ms = now;
sample.connected_ms = rounded(connectedAt - startedPerf);
},
onMessage(text) {
sample.event_count += 1;
let data;
try {
data = JSON.parse(String(text || ""));
} catch (error) {
finish("error", `Invalid WebSocket JSON: ${error.message}`);
return;
}
appendLine(paths.networkLog, JSON.stringify({
pipeline_label: job.label,
request_index: job.index,
type: data.type,
session_type: data.session_type || "",
role: data.data?.role || "",
is_final: data.data?.is_final ?? null,
content_preview: redact(String(data.data?.content || data.message || "").slice(0, 200)),
})).catch(() => {});
if (data.type === "connected") {
sentAt = performance.now();
const now = Date.now();
sample.sent_at = new Date(now).toISOString();
sample.sent_epoch_ms = now;
client.send(JSON.stringify({
type: "message",
message: [{ type: "Plain", text: prompt }],
stream,
}));
return;
}
if (data.type === "error") {
finish("error", data.message || "WebSocket error message.");
return;
}
if (data.type !== "response" || data.data?.role !== "assistant") return;
const content = String(data.data.content || "");
markFirstAssistantEvent(sample, sentAt);
if (content) sample.response_text = content;
if (content) markFirstAssistantContent(sample, sentAt);
if (containsPipelineToken(content, job.otherPrefix)) {
sample.cross_pipeline_leak_count += 1;
finish("cross_pipeline_leak", `Pipeline ${job.label} received response from ${job.otherPrefix}: ${content}`);
return;
}
if (content.includes(expected) && sample.first_response_ms === null && sentAt > 0) {
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}
if (data.data.is_final === true) {
const ok = sample.response_text.includes(expected);
if (ok) {
if (sample.first_response_ms === null && sentAt > 0) {
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}
finish("pass", "");
} else if (matchesFailureSignal(sample.response_text, failureSignals)) {
finish("app_error", `Assistant final response matched a failure signal: ${sample.response_text}`);
} else if (containsPipelineToken(sample.response_text, job.expectedPrefix)) {
sample.same_pipeline_foreign_response_count += 1;
sample.last_foreign_response_text = sample.response_text;
} else {
finish("mismatch", `Final assistant response did not include ${expected}: ${sample.response_text}`);
}
}
},
onError(error) {
finish("connection_error", `WebSocket connection error: ${error.message}`);
},
onClose(event) {
sample.close_code = event.code;
sample.close_reason = event.reason || "";
if (!closed) finish("closed", `WebSocket closed before final assistant response: ${event.code}`);
},
});
function finish(status, reason) {
if (closed) return;
closed = true;
clearTimeout(timer);
sample.status = status;
sample.ok = status === "pass";
sample.error = status === "timeout" && sample.same_pipeline_foreign_response_count > 0
? `${reason || ""} Saw ${sample.same_pipeline_foreign_response_count} same-pipeline foreign assistant response(s); last=${sample.last_foreign_response_text}`
: reason || "";
if (sentAt > 0) sample.response_duration_ms = rounded(performance.now() - sentAt);
else sample.response_duration_ms = rounded(performance.now() - startedPerf);
const now = Date.now();
sample.finished_at = new Date(now).toISOString();
sample.finished_epoch_ms = now;
try {
client?.close();
} catch {
// Closing a failed socket should not hide the sample result.
}
resolvePromise(sample);
}
});
}
function markFirstAssistantEvent(sample, sentAt) {
if (sample.first_assistant_event_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_event_at = new Date(now).toISOString();
sample.first_assistant_event_epoch_ms = now;
sample.first_assistant_event_ms = rounded(performance.now() - sentAt);
}
function markFirstAssistantContent(sample, sentAt) {
if (sample.first_assistant_content_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_content_at = new Date(now).toISOString();
sample.first_assistant_content_epoch_ms = now;
sample.first_assistant_content_ms = rounded(performance.now() - sentAt);
}
function containsPipelineToken(text, prefix) {
const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
}
function matchesFailureSignal(text, signals) {
const lower = String(text || "").toLowerCase();
return signals.some((signal) => lower.includes(signal.toLowerCase()));
}
function openRawWebSocket(wsUrl, handlers) {
const parsed = new URL(wsUrl);
const secure = parsed.protocol === "wss:";
const port = Number(parsed.port || (secure ? 443 : 80));
const host = parsed.hostname;
const path = `${parsed.pathname}${parsed.search}`;
const key = crypto.randomBytes(16).toString("base64");
const socket = secure
? tls.connect({ host, port, servername: host })
: net.connect({ host, port });
let opened = false;
let closed = false;
let buffer = Buffer.alloc(0);
socket.setNoDelay(true);
socket.on("connect", () => {
const originProtocol = secure ? "https" : "http";
const request = [
`GET ${path} HTTP/1.1`,
`Host: ${parsed.host}`,
"Upgrade: websocket",
"Connection: Upgrade",
`Sec-WebSocket-Key: ${key}`,
"Sec-WebSocket-Version: 13",
`Origin: ${originProtocol}://${parsed.host}`,
"",
"",
].join("\r\n");
socket.write(request);
});
socket.on("data", (chunk) => {
buffer = Buffer.concat([buffer, chunk]);
if (!opened) {
const headerEnd = buffer.indexOf("\r\n\r\n");
if (headerEnd === -1) return;
const headerText = buffer.slice(0, headerEnd).toString("utf8");
buffer = buffer.slice(headerEnd + 4);
if (!/^HTTP\/1\.1 101\b/i.test(headerText)) {
handlers.onError(new Error(`Handshake failed: ${headerText.split("\r\n")[0] || "missing status"}`));
socket.destroy();
return;
}
opened = true;
handlers.onOpen();
}
processFrames();
});
socket.on("error", (error) => {
if (!closed) handlers.onError(error);
});
socket.on("close", () => {
if (closed) return;
closed = true;
handlers.onClose({ code: null, reason: "" });
});
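// Fragmented messages are not reassembled: a text frame is delivered as-is regardless of FIN,
// and continuation (0x0), binary (0x2), and pong (0xA) frames are silently dropped.
function processFrames() {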
while (true) {
const frame = readFrame(buffer);
if (!frame) return;
buffer = buffer.slice(frame.consumed);
if (frame.opcode === 0x1) {
handlers.onMessage(frame.payload.toString("utf8"));
} else if (frame.opcode === 0x8) {
const code = frame.payload.length >= 2 ? frame.payload.readUInt16BE(0) : null;
const reason = frame.payload.length > 2 ? frame.payload.slice(2).toString("utf8") : "";
closed = true;
handlers.onClose({ code, reason });
socket.end();
return;
} else if (frame.opcode === 0x9) {
writeFrame(socket, 0xA, frame.payload);
}
}
}
return {
send(text) {
if (closed || !opened) return;
writeFrame(socket, 0x1, Buffer.from(text, "utf8"));
},
close() {
if (closed) return;
closed = true;
if (!socket.destroyed) {
if (opened) writeFrame(socket, 0x8, Buffer.alloc(0));
setTimeout(() => socket.end(), 50).unref();
}
},
};
}
function readFrame(buffer) {
if (buffer.length < 2) return null;
const first = buffer[0];
const second = buffer[1];
const opcode = first & 0x0f;
const masked = Boolean(second & 0x80);
let length = second & 0x7f;
let offset = 2;
if (length === 126) {
if (buffer.length < offset + 2) return null;
length = buffer.readUInt16BE(offset);
offset += 2;
} else if (length === 127) {
if (buffer.length < offset + 8) return null;
const high = buffer.readUInt32BE(offset);
const low = buffer.readUInt32BE(offset + 4);
length = high * 2 ** 32 + low;
offset += 8;
}
let mask = null;
if (masked) {
if (buffer.length < offset + 4) return null;
mask = buffer.slice(offset, offset + 4);
offset += 4;
}
if (buffer.length < offset + length) return null;
let payload = buffer.slice(offset, offset + length);
if (mask) {
payload = Buffer.from(payload);
for (let index = 0; index < payload.length; index += 1) {
payload[index] ^= mask[index % 4];
}
}
return {
opcode,
payload,
consumed: offset + length,
};
}
function writeFrame(socket, opcode, payload) {
const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload || "");
const mask = crypto.randomBytes(4);
const headerLength = body.length < 126 ? 2 : body.length <= 0xffff ? 4 : 10;
const header = Buffer.alloc(headerLength);
header[0] = 0x80 | opcode;
if (body.length < 126) {
header[1] = 0x80 | body.length;
} else if (body.length <= 0xffff) {
header[1] = 0x80 | 126;
header.writeUInt16BE(body.length, 2);
} else {
header[1] = 0x80 | 127;
header.writeUInt32BE(Math.floor(body.length / 2 ** 32), 2);
header.writeUInt32BE(body.length >>> 0, 6);
}
const masked = Buffer.from(body);
for (let index = 0; index < masked.length; index += 1) {
masked[index] ^= mask[index % 4];
}
socket.write(Buffer.concat([header, mask, masked]));
}
function buildMetrics({ samples, requestsPerPipeline, concurrency, timeoutMs, loadDurationMs, backendUrl, sessionType, fakeProviderState }) {
const okSamples = samples.filter((sample) => sample.ok);
const statusCounts = {};
const byPipeline = {};
for (const sample of samples) {
statusCounts[sample.status] = (statusCounts[sample.status] || 0) + 1;
if (!byPipeline[sample.pipeline_label]) {
byPipeline[sample.pipeline_label] = {
ok_count: 0,
error_count: 0,
cross_pipeline_leak_count: 0,
timeout_count: 0,
};
}
if (sample.ok) byPipeline[sample.pipeline_label].ok_count += 1;
else byPipeline[sample.pipeline_label].error_count += 1;
byPipeline[sample.pipeline_label].cross_pipeline_leak_count += sample.cross_pipeline_leak_count || 0;
if (sample.status === "timeout") byPipeline[sample.pipeline_label].timeout_count += 1;
}
const errorCount = samples.length - okSamples.length;
return {
probe: caseId,
backend_url: backendUrl,
session_type: sessionType,
requests_per_pipeline: requestsPerPipeline,
total_requests: requestsPerPipeline * 2,
completed_requests: samples.length,
concurrency,
timeout_ms: timeoutMs,
ok_count: okSamples.length,
error_count: errorCount,
timeout_count: samples.filter((sample) => sample.status === "timeout").length,
cross_pipeline_leak_count: samples.reduce((count, sample) => count + (sample.cross_pipeline_leak_count || 0), 0),
error_rate: samples.length === 0 ? 1 : rounded(errorCount / samples.length),
load_duration_ms: rounded(loadDurationMs),
throughput_rps: loadDurationMs <= 0 ? 0 : rounded(okSamples.length / (loadDurationMs / 1000)),
status_counts: statusCounts,
by_pipeline: byPipeline,
connected_ms: stats(samples.map((sample) => sample.connected_ms).filter(Number.isFinite)),
first_assistant_event_ms: stats(samples.map((sample) => sample.first_assistant_event_ms).filter(Number.isFinite)),
first_assistant_content_ms: stats(samples.map((sample) => sample.first_assistant_content_ms).filter(Number.isFinite)),
first_response_ms: stats(okSamples.map((sample) => sample.first_response_ms).filter(Number.isFinite)),
response_duration_ms: stats(okSamples.map((sample) => sample.response_duration_ms).filter(Number.isFinite)),
fake_provider: summarizeFakeProviderState(fakeProviderState),
provider_timing: buildProviderTimingMetrics(samples, fakeProviderState),
samples,
};
}
function buildThresholds(metrics) {
return {
cross_pipeline_leak_count: {
actual: metrics.cross_pipeline_leak_count,
max: 0,
pass: metrics.cross_pipeline_leak_count === 0,
},
error_rate: {
actual: metrics.error_rate,
max: maxErrorRate,
pass: metrics.error_rate <= maxErrorRate,
},
response_p95_ms: {
actual: metrics.response_duration_ms.p95,
max: responseP95BudgetMs,
pass: metrics.ok_count > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
},
};
}
function positiveInteger(value, fallback) {
const parsed = Number.parseInt(String(value || ""), 10);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
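function positiveNumber(value, fallback) {
// Number("") is 0, so treat missing or blank input as "not provided" and use the fallback.
// Despite the name, zero is accepted (for example, a zero error-rate budget).
if (value === undefined || value === null || String(value).trim() === "") return fallback;
const parsed = Number(value);
return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}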
function bool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function textList(value) {
return String(value || "")
.split(/\r?\n|,/)
.map((item) => item.trim())
.filter(Boolean);
}
function rounded(value) {
return Number(value.toFixed(3));
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return rounded(sorted[index]);
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: rounded(Math.min(...values)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: rounded(Math.max(...values)),
};
}
function looksLikeEnvIssue(error) {
const message = String(error?.message || error || "");
return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
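Review note on the raw WebSocket client above: the sketch below illustrates the masking and partial-read behavior that `writeFrame` and `readFrame` rely on. It is a simplified stand-in, not the probe's code. The names `encodeTextFrame` and `decodeFrame` are hypothetical, only payloads shorter than 126 bytes are covered, and a fixed mask replaces the random one purely for reproducibility.

```javascript
// Sketch: build a masked client text frame (RFC 6455 layout) and parse it back.
// Assumption: payloads under 126 bytes, so the 7-bit length field is enough.
function encodeTextFrame(text, mask) {
  const body = Buffer.from(text, "utf8");
  if (body.length >= 126) throw new Error("sketch only covers short payloads");
  // Byte 0: FIN bit + text opcode. Byte 1: MASK bit + payload length.
  const header = Buffer.from([0x80 | 0x1, 0x80 | body.length]);
  const masked = Buffer.from(body);
  for (let i = 0; i < masked.length; i += 1) masked[i] ^= mask[i % 4];
  return Buffer.concat([header, mask, masked]);
}

function decodeFrame(buffer) {
  if (buffer.length < 2) return null; // header not complete yet
  const fin = Boolean(buffer[0] & 0x80);
  const opcode = buffer[0] & 0x0f;
  const masked = Boolean(buffer[1] & 0x80);
  const length = buffer[1] & 0x7f;
  if (length >= 126) throw new Error("sketch only covers short payloads");
  let offset = 2;
  let mask = null;
  if (masked) {
    if (buffer.length < offset + 4) return null; // mask key not complete yet
    mask = buffer.subarray(offset, offset + 4);
    offset += 4;
  }
  if (buffer.length < offset + length) return null; // payload not complete yet
  // Copy before unmasking so the caller's buffer is left untouched.
  const payload = Buffer.from(buffer.subarray(offset, offset + length));
  if (mask) {
    for (let i = 0; i < payload.length; i += 1) payload[i] ^= mask[i % 4];
  }
  return { fin, opcode, payload, consumed: offset + length };
}

const frame = encodeTextFrame("ping", Buffer.from([1, 2, 3, 4]));
const partial = decodeFrame(frame.subarray(0, 5)); // truncated: returns null
const full = decodeFrame(frame);
```

Returning `null` for an incomplete frame is what lets the caller keep appending socket chunks to a buffer and retry, which is the same contract `processFrames` depends on. Exposing `fin` in the result would be one way to start handling fragmented messages if the backend ever sends them.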
@@ -0,0 +1,159 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
const scenarios = [
{
id: "provider-timeout",
target: "provider",
injected_fault: "fake provider request exceeds the configured timeout",
expected_status: "env_issue",
recovery_check: "provider route is reachable or the case remains outside product pass/fail",
cleanup: "stop fake provider or reset proxy route",
},
{
id: "plugin-runtime-disconnect",
target: "plugin-runtime",
injected_fault: "runtime control channel disconnects during an action",
expected_status: "fail",
recovery_check: "runtime reconnects and a deterministic plugin action succeeds",
cleanup: "restart the local plugin runtime process",
},
{
id: "mcp-stdio-server-exit",
target: "mcp",
injected_fault: "stdio server exits mid-call",
expected_status: "fail",
recovery_check: "server can be registered again and exposes the expected tool",
cleanup: "remove temporary MCP server registration",
},
{
id: "operator-missing-login",
target: "webui",
injected_fault: "browser profile is not authenticated",
expected_status: "blocked",
recovery_check: "authenticated profile can open the same WebUI origin",
cleanup: "no product cleanup; refresh local login state",
},
{
id: "transient-marketplace-timeout",
target: "marketplace",
injected_fault: "marketplace request times out once and then succeeds",
expected_status: "flaky",
recovery_check: "rerun passes with the same product revision and no code change",
cleanup: "clear retry-only evidence and keep the run classified as flaky",
},
];
function validateScenario(scenario) {
const missing = ["id", "target", "injected_fault", "expected_status", "recovery_check", "cleanup"]
.filter((key) => !scenario[key]);
const allowedStatuses = new Set(["pass", "fail", "blocked", "env_issue", "flaky"]);
return {
id: scenario.id,
pass: missing.length === 0 && allowedStatuses.has(scenario.expected_status),
missing,
expected_status: scenario.expected_status,
};
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-fault-taxonomy-contract";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const validations = scenarios.map(validateScenario);
const statusCounts = {};
for (const scenario of scenarios) {
statusCounts[scenario.expected_status] = (statusCounts[scenario.expected_status] || 0) + 1;
}
const metrics = {
probe: caseId,
scenario_count: scenarios.length,
status_counts: statusCounts,
scenarios,
validations,
};
const thresholds = {
scenario_count: { actual: scenarios.length, min: 5, pass: scenarios.length >= 5 },
invalid_scenario_count: {
actual: validations.filter((item) => !item.pass).length,
max: 0,
pass: validations.every((item) => item.pass),
},
cleanup_declared_count: {
actual: scenarios.filter((item) => item.cleanup).length,
min: scenarios.length,
pass: scenarios.every((item) => item.cleanup),
},
};
const status = Object.values(thresholds).every((item) => item.pass) ? "pass" : "fail";
const metricsPath = join(evidenceDir, "metrics.json");
const faultModelPath = join(evidenceDir, "fault-model.json");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
await writeFile(faultModelPath, `${JSON.stringify({ scenarios }, null, 2)}\n`, "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason: status === "pass"
? "Fault taxonomy contract declares status, recovery, and cleanup for every scenario."
: "Fault taxonomy contract is missing required scenario fields.",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
metrics_summary: {
scenario_count: metrics.scenario_count,
status_counts: metrics.status_counts,
invalid_scenario_count: thresholds.invalid_scenario_count.actual,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
fault_model_json: faultModelPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : 1);
}
await main();
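Review note on the fault taxonomy contract above: a short sketch of how the validation reacts to a malformed entry may help when adding scenarios. The scenario `db-lock-timeout` and the status `degraded` are hypothetical and are not part of the contract; the validation logic mirrors `validateScenario` as written in this file.

```javascript
// Sketch: the same required-field and allowed-status checks as validateScenario.
const REQUIRED_FIELDS = ["id", "target", "injected_fault", "expected_status", "recovery_check", "cleanup"];
const ALLOWED_STATUSES = new Set(["pass", "fail", "blocked", "env_issue", "flaky"]);

function validateScenario(scenario) {
  const missing = REQUIRED_FIELDS.filter((key) => !scenario[key]);
  return {
    id: scenario.id,
    pass: missing.length === 0 && ALLOWED_STATUSES.has(scenario.expected_status),
    missing,
    expected_status: scenario.expected_status,
  };
}

// Hypothetical scenario: no cleanup step and a status outside the allowed set.
const incomplete = validateScenario({
  id: "db-lock-timeout",
  target: "database",
  injected_fault: "a write lock is held past the request timeout",
  expected_status: "degraded",
  recovery_check: "a follow-up write succeeds after the lock is released",
});
```

Here `incomplete.pass` is false for two independent reasons, but only the missing field is listed in `missing`. An unknown status has to be spotted by reading `expected_status` in the validation output.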
@@ -0,0 +1,212 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function parseJsonList(value, fallback) {
if (!value) return fallback;
try {
const parsed = JSON.parse(value);
return Array.isArray(parsed) && parsed.every((item) => typeof item === "string") ? parsed : fallback;
} catch {
return fallback;
}
}
function joinUrl(baseUrl, path) {
const base = baseUrl.replace(/\/+$/, "");
const suffix = path.startsWith("/") ? path : `/${path}`;
return `${base}${suffix}`;
}
async function fetchOnce(url, timeoutMs) {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
const started = performance.now();
try {
const response = await fetch(url, { method: "GET", signal: controller.signal });
await response.arrayBuffer();
const latencyMs = performance.now() - started;
return {
url,
// 4xx responses still count as a reachable backend; only 5xx and network errors are failures.
ok: response.status < 500,
status: response.status,
latency_ms: Number(latencyMs.toFixed(3)),
error: "",
};
} catch (error) {
const latencyMs = performance.now() - started;
return {
url,
ok: false,
status: 0,
latency_ms: Number(latencyMs.toFixed(3)),
error: error instanceof Error ? error.message : String(error),
};
} finally {
clearTimeout(timeout);
}
}
async function runBatches(urls, totalRequests, concurrency, timeoutMs) {
const queue = Array.from({ length: totalRequests }, (_, index) => urls[index % urls.length]);
// Clamp the batch size: splice(0, 0) and splice(0, NaN) remove nothing, so the loop would never end.
const batchSize = Math.max(1, Math.floor(concurrency) || 1);
const results = [];
while (queue.length > 0) {
const batch = queue.splice(0, batchSize);
results.push(...await Promise.all(batch.map((url) => fetchOnce(url, timeoutMs))));
}
return results;
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-live-backend-latency";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const endpoints = parseJsonList(env.LANGBOT_PERF_ENDPOINTS_JSON, ["/healthz"]);
const totalRequests = Number(env.LANGBOT_PERF_REQUESTS || "12");
const concurrency = Number(env.LANGBOT_PERF_CONCURRENCY || "2");
const timeoutMs = Number(env.LANGBOT_PERF_TIMEOUT_MS || "5000");
const p95BudgetMs = Number(env.LANGBOT_PERF_BACKEND_P95_MS || "1000");
const maxErrorRate = Number(env.LANGBOT_PERF_MAX_ERROR_RATE || "0");
const metricsPath = join(evidenceDir, "metrics.json");
const networkLogPath = join(evidenceDir, "network.log");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
let status = "fail";
let reason = "";
let results = [];
if (!backendUrl) {
status = "env_issue";
reason = "LANGBOT_BACKEND_URL is not configured.";
} else {
const urls = endpoints.map((path) => joinUrl(backendUrl, path));
results = await runBatches(urls, totalRequests, concurrency, timeoutMs);
const okCount = results.filter((item) => item.ok).length;
const errorCount = results.length - okCount;
const errorRate = results.length === 0 ? 1 : errorCount / results.length;
const latencies = results.filter((item) => item.ok).map((item) => item.latency_ms);
const latencyStats = stats(latencies);
const allConnectionFailures = results.length > 0 && results.every((item) => item.status === 0);
if (allConnectionFailures) {
status = "env_issue";
reason = `Backend did not respond at ${backendUrl}.`;
} else if (okCount > 0 && latencyStats.p95 <= p95BudgetMs && errorRate <= maxErrorRate) {
status = "pass";
reason = "Live backend latency probe passed all thresholds.";
} else {
status = "fail";
reason = "Live backend latency probe breached latency or error-rate thresholds.";
}
}
const statusCounts = {};
for (const item of results) {
const key = item.status === 0 ? "network_error" : String(item.status);
statusCounts[key] = (statusCounts[key] || 0) + 1;
}
const okResults = results.filter((item) => item.ok);
const metrics = {
probe: caseId,
backend_url: backendUrl,
endpoints,
total_requests: totalRequests,
concurrency,
timeout_ms: timeoutMs,
ok_count: okResults.length,
error_count: results.length - okResults.length,
error_rate: results.length === 0 ? 1 : Number(((results.length - okResults.length) / results.length).toFixed(4)),
latency_ms: stats(okResults.map((item) => item.latency_ms)),
status_counts: statusCounts,
};
const thresholds = {
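// With no successful samples stats() reports p95 as 0, so require at least one before passing.
backend_p95_ms: { actual: metrics.latency_ms.p95, max: p95BudgetMs, pass: metrics.ok_count > 0 && metrics.latency_ms.p95 <= p95BudgetMs },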
error_rate: { actual: metrics.error_rate, max: maxErrorRate, pass: metrics.error_rate <= maxErrorRate },
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, samples: results }, null, 2)}\n`, "utf8");
await writeFile(networkLogPath, results.map((item) => JSON.stringify(item)).join("\n") + (results.length > 0 ? "\n" : ""), "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason,
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
url: backendUrl,
metrics_summary: {
requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_rate: metrics.error_rate,
latency_p50_ms: metrics.latency_ms.p50,
latency_p95_ms: metrics.latency_ms.p95,
status_counts: metrics.status_counts,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
network_log: networkLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
}
await main();
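Review note on the latency probe above: with the default of 12 requests, the nearest-rank `percentile` helper makes p95 identical to the slowest successful sample, because `Math.ceil(0.95 * n)` equals `n` for any `n` below 20. The sketch below reproduces that with made-up latencies; the numbers are illustrative, not measurements.

```javascript
// Nearest-rank percentile, copied from the probe for illustration.
function percentile(values, percentileValue) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
  return Number(sorted[index].toFixed(3));
}

// Twelve hypothetical latencies in milliseconds with one slow outlier.
const latencies = [12, 14, 11, 13, 15, 12, 16, 14, 13, 12, 11, 480];
const p50 = percentile(latencies, 50);
const p95 = percentile(latencies, 95);
const slowest = Math.max(...latencies);
```

A single outlier therefore decides the `backend_p95_ms` gate at the default sample size. Raising `LANGBOT_PERF_REQUESTS` to 20 or more would let p95 exclude at least the single slowest request, if that is the intended meaning of the budget.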
@@ -0,0 +1,205 @@
#!/usr/bin/env node
import { existsSync, readdirSync, statSync } from "node:fs";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function repoRootFromEnv(root) {
return env.LANGBOT_REPO ? resolve(env.LANGBOT_REPO) : resolve(root, "..");
}
function latestBackendLog(root) {
const explicit = env.LANGBOT_BACKEND_LOG;
if (explicit) return resolve(explicit);
const logsDir = join(repoRootFromEnv(root), "data", "logs");
if (!existsSync(logsDir)) return "";
const candidates = readdirSync(logsDir)
.filter((name) => /^langbot-.*\.log$/.test(name))
.map((name) => join(logsDir, name))
.filter((path) => {
try {
return statSync(path).isFile();
} catch {
return false;
}
})
.sort((left, right) => statSync(right).mtimeMs - statSync(left).mtimeMs);
return candidates[0] || "";
}
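function parseSince(startedAt) {
if (env.LANGBOT_BACKEND_LOG_SINCE) {
const explicit = new Date(env.LANGBOT_BACKEND_LOG_SINCE);
// Ignore an unparseable value; an Invalid Date would make since.toISOString() throw later.
if (!Number.isNaN(explicit.getTime())) return explicit;
}
const lookbackSeconds = Number(env.LANGBOT_BACKEND_LOG_LOOKBACK_SECONDS || "300");
return new Date(startedAt.getTime() - (Number.isFinite(lookbackSeconds) ? lookbackSeconds : 300) * 1000);
}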
function parseTimestamp(line, year) {
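// Assumes the backend writes year-less "[MM-DD HH:mm:ss.SSS]" timestamps in UTC+8; the caller supplies the year.
const localMatch = line.match(/^\[(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]/);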
if (localMatch) {
const [, month, day, hour, minute, second, millisecond] = localMatch;
return new Date(`${year}-${month}-${day}T${hour}:${minute}:${second}.${millisecond}+08:00`);
}
const accessMatch = line.match(/^\[(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]/);
if (accessMatch) {
const [, fullYear, month, day, hour, minute, second, offset] = accessMatch;
const normalizedOffset = `${offset.slice(0, 3)}:${offset.slice(3)}`;
return new Date(`${fullYear}-${month}-${day}T${hour}:${minute}:${second}${normalizedOffset}`);
}
return null;
}
function findingForLine(line, number) {
const rules = [
{ severity: "fail", kind: "python_traceback", pattern: /\bTraceback(?: \(most recent call last\))?/i },
{ severity: "fail", kind: "unretrieved_task_exception", pattern: /Task exception was never retrieved/i },
{ severity: "fail", kind: "unawaited_coroutine", pattern: /RuntimeWarning:\s+coroutine .* was never awaited/i },
{ severity: "fail", kind: "unclosed_client_session", pattern: /Unclosed client session/i },
{ severity: "fail", kind: "unclosed_connector", pattern: /Unclosed connector/i },
{ severity: "fail", kind: "import_error", pattern: /\bImportError\b/i },
{ severity: "fail", kind: "error_log", pattern: /\b(?:ERROR|CRITICAL)\b/ },
{ severity: "warning", kind: "warning_log", pattern: /\bWARNING\b/ },
];
for (const rule of rules) {
if (rule.pattern.test(line)) {
return {
severity: rule.severity,
kind: rule.kind,
line: number,
excerpt: line,
};
}
}
return null;
}
function scanLines(text, since, year) {
const findings = [];
const scanned = [];
let includeContinuation = false;
const lines = text.split(/\r?\n/);
for (const [index, line] of lines.entries()) {
const number = index + 1;
const timestamp = parseTimestamp(line, year);
if (timestamp) includeContinuation = timestamp >= since;
if (!includeContinuation) continue;
scanned.push({ number, text: line });
const finding = findingForLine(line, number);
if (finding) findings.push(finding);
}
return { findings, scanned, total_lines: lines.length };
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-live-backend-log-health";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const since = parseSince(startedAt);
const logPath = latestBackendLog(root);
const metricsPath = join(evidenceDir, "metrics.json");
const findingsPath = join(evidenceDir, "findings.json");
const scannedLogPath = join(evidenceDir, "scanned-backend.log");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
let status = "fail";
let reason = "";
let scan = { findings: [], scanned: [], total_lines: 0 };
if (!logPath || !existsSync(logPath)) {
status = "env_issue";
reason = "No LangBot backend log file was found. Set LANGBOT_BACKEND_LOG or LANGBOT_REPO.";
} else {
const text = await readFile(logPath, "utf8");
scan = scanLines(text, since, startedAt.getFullYear());
const failCount = scan.findings.filter((item) => item.severity === "fail").length;
status = failCount === 0 ? "pass" : "fail";
reason = status === "pass"
? "Live backend log health passed; no fail-severity findings in the scanned window."
: "Live backend log health found fail-severity backend log findings.";
}
const warningCount = scan.findings.filter((item) => item.severity === "warning").length;
const failCount = scan.findings.filter((item) => item.severity === "fail").length;
const metrics = {
probe: caseId,
backend_log: logPath,
since: since.toISOString(),
scanned_line_count: scan.scanned.length,
total_line_count: scan.total_lines,
fail_count: failCount,
warning_count: warningCount,
finding_count: scan.findings.length,
};
const thresholds = {
fail_count: { actual: failCount, max: 0, pass: failCount === 0 },
};
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
await writeFile(findingsPath, `${JSON.stringify(scan.findings, null, 2)}\n`, "utf8");
await writeFile(scannedLogPath, scan.scanned.map((item) => `${item.number}: ${item.text}`).join("\n") + (scan.scanned.length > 0 ? "\n" : ""), "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason,
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
url: logPath,
metrics_summary: {
scanned_line_count: metrics.scanned_line_count,
fail_count: metrics.fail_count,
warning_count: metrics.warning_count,
finding_count: metrics.finding_count,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
findings_json: findingsPath,
scanned_backend_log: scannedLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "backend_log", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
}
await main();
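Review note on the log health probe above: `scanLines` lets lines without a timestamp inherit the in-window decision of the last stamped line, which is what keeps a traceback attached to the log entry that produced it. The sketch below isolates that rule. The `selectWindow` and `parseIso` names and the bracketed ISO timestamp format are assumptions made for the example, not the backend's real log format.

```javascript
// Sketch: keep lines at or after `since`; unstamped lines follow the last stamped line.
function selectWindow(lines, since, parse) {
  const kept = [];
  let include = false;
  for (const [index, line] of lines.entries()) {
    const timestamp = parse(line);
    if (timestamp) include = timestamp >= since;
    if (include) kept.push({ number: index + 1, text: line });
  }
  return kept;
}

// Hypothetical format for the sketch: "[<ISO timestamp>] message".
function parseIso(line) {
  const match = line.match(/^\[([^\]]+)\]/);
  if (!match) return null;
  const date = new Date(match[1]);
  return Number.isNaN(date.getTime()) ? null : date;
}

const lines = [
  "[2026-06-25T10:00:00Z] ERROR old failure",
  "Traceback (most recent call last):",
  "[2026-06-25T12:00:00Z] ERROR new failure",
  "Traceback (most recent call last):",
  '  File "app.py", line 1',
];
const kept = selectWindow(lines, new Date("2026-06-25T11:00:00Z"), parseIso);
```

The old traceback on line 2 is skipped along with its parent entry, while lines 3 to 5 are kept together. One consequence of this rule is that any unstamped lines at the very top of a log file are always skipped, because no stamped line precedes them.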
@@ -0,0 +1,311 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function joinUrl(baseUrl, path) {
const base = baseUrl.replace(/\/+$/, "");
const suffix = path.startsWith("/") ? path : `/${path}`;
return `${base}${suffix}`;
}
function parseJsonObject(value, fallback) {
if (!value) return fallback;
try {
const parsed = JSON.parse(value);
return parsed && typeof parsed === "object" && !Array.isArray(parsed) ? parsed : fallback;
} catch {
return fallback;
}
}
function controlPlaneEndpoints() {
return [
{
id: "healthz",
path: "/healthz",
expected_status: 200,
expected_code: 0,
p95_budget_ms: Number(env.LANGBOT_PERF_HEALTHZ_P95_MS || "500"),
required_data_fields: [],
},
{
id: "system_info",
path: "/api/v1/system/info",
expected_status: 200,
expected_code: 0,
p95_budget_ms: Number(env.LANGBOT_PERF_SYSTEM_INFO_P95_MS || "1000"),
required_data_fields: ["version", "edition", "enable_marketplace"],
},
];
}
async function fetchEndpoint(backendUrl, endpoint, timeoutMs) {
const url = joinUrl(backendUrl, endpoint.path);
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
const started = performance.now();
let bodyText = "";
let json = null;
let jsonValid = false;
let error = "";
try {
const response = await fetch(url, {
method: "GET",
headers: { "accept": "application/json" },
signal: controller.signal,
});
bodyText = await response.text();
try {
json = bodyText ? JSON.parse(bodyText) : null;
jsonValid = json !== null;
} catch (parseError) {
error = parseError instanceof Error ? parseError.message : String(parseError);
}
const data = json && typeof json === "object" && json.data && typeof json.data === "object" ? json.data : {};
const missingFields = endpoint.required_data_fields.filter((field) => !(field in data));
const statusOk = response.status === endpoint.expected_status;
const codeOk = !json || typeof json !== "object" ? false : json.code === endpoint.expected_code;
const shapeOk = jsonValid && missingFields.length === 0;
const latencyMs = performance.now() - started;
return {
endpoint_id: endpoint.id,
path: endpoint.path,
url,
status: response.status,
ok: statusOk && codeOk && shapeOk,
status_ok: statusOk,
code_ok: codeOk,
json_valid: jsonValid,
missing_fields: missingFields,
response_code: json && typeof json === "object" ? json.code : null,
latency_ms: Number(latencyMs.toFixed(3)),
error,
};
} catch (fetchError) {
const latencyMs = performance.now() - started;
return {
endpoint_id: endpoint.id,
path: endpoint.path,
url,
status: 0,
ok: false,
status_ok: false,
code_ok: false,
json_valid: false,
missing_fields: endpoint.required_data_fields,
response_code: null,
latency_ms: Number(latencyMs.toFixed(3)),
error: fetchError instanceof Error ? fetchError.message : String(fetchError),
};
} finally {
clearTimeout(timeout);
}
}
async function runBatches(backendUrl, endpoints, totalRequests, concurrency, timeoutMs) {
const queue = Array.from({ length: totalRequests }, (_, index) => endpoints[index % endpoints.length]);
// Clamp the batch size to at least 1: a zero, negative, or NaN concurrency would make splice() remove nothing and loop forever.
const batchSize = Math.max(1, Math.floor(concurrency) || 1);
const results = [];
while (queue.length > 0) {
const batch = queue.splice(0, batchSize);
results.push(...await Promise.all(batch.map((endpoint) => fetchEndpoint(backendUrl, endpoint, timeoutMs))));
}
return results;
}
function endpointMetrics(endpoints, results) {
return Object.fromEntries(endpoints.map((endpoint) => {
const samples = results.filter((item) => item.endpoint_id === endpoint.id);
const okSamples = samples.filter((item) => item.ok);
return [
endpoint.id,
{
path: endpoint.path,
requests: samples.length,
ok_count: okSamples.length,
error_rate: samples.length === 0 ? 1 : Number(((samples.length - okSamples.length) / samples.length).toFixed(4)),
latency_ms: stats(okSamples.map((item) => item.latency_ms)),
p95_budget_ms: endpoint.p95_budget_ms,
},
];
}));
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-live-control-plane-api";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const endpoints = controlPlaneEndpoints();
const configuredBudgets = parseJsonObject(env.LANGBOT_CONTROL_PLANE_P95_BUDGETS_JSON, {});
for (const endpoint of endpoints) {
const budget = configuredBudgets[endpoint.id];
if (typeof budget === "number" && Number.isFinite(budget)) endpoint.p95_budget_ms = budget;
}
const totalRequests = Number(env.LANGBOT_CONTROL_PLANE_REQUESTS || "20");
const concurrency = Number(env.LANGBOT_CONTROL_PLANE_CONCURRENCY || "4");
const timeoutMs = Number(env.LANGBOT_CONTROL_PLANE_TIMEOUT_MS || "5000");
const maxErrorRate = Number(env.LANGBOT_CONTROL_PLANE_MAX_ERROR_RATE || "0");
const metricsPath = join(evidenceDir, "metrics.json");
const endpointsPath = join(evidenceDir, "endpoints.json");
const networkLogPath = join(evidenceDir, "network.log");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
let status = "fail";
let reason = "";
let results = [];
if (!backendUrl) {
status = "env_issue";
reason = "LANGBOT_BACKEND_URL is not configured.";
} else {
results = await runBatches(backendUrl, endpoints, totalRequests, concurrency, timeoutMs);
const allConnectionFailures = results.length > 0 && results.every((item) => item.status === 0);
if (allConnectionFailures) {
status = "env_issue";
reason = `Backend did not respond at ${backendUrl}.`;
}
}
const okResults = results.filter((item) => item.ok);
const statusCounts = {};
for (const item of results) {
const key = item.status === 0 ? "network_error" : String(item.status);
statusCounts[key] = (statusCounts[key] || 0) + 1;
}
const perEndpoint = endpointMetrics(endpoints, results);
const responseShapeFailures = results.filter((item) => !item.json_valid || item.missing_fields.length > 0 || !item.code_ok).length;
const errorRate = results.length === 0 ? 1 : Number(((results.length - okResults.length) / results.length).toFixed(4));
const thresholds = {
error_rate: { actual: errorRate, max: maxErrorRate, pass: errorRate <= maxErrorRate },
response_shape_failures: { actual: responseShapeFailures, max: 0, pass: responseShapeFailures === 0 },
};
for (const endpoint of endpoints) {
const actual = perEndpoint[endpoint.id].latency_ms.p95;
thresholds[`${endpoint.id}_p95_ms`] = {
actual,
max: endpoint.p95_budget_ms,
pass: actual <= endpoint.p95_budget_ms,
};
}
if (status !== "env_issue") {
const passed = Object.values(thresholds).every((item) => item.pass);
status = passed ? "pass" : "fail";
reason = passed
? "Live control-plane API probe passed all thresholds."
: "Live control-plane API probe breached shape, latency, or error-rate thresholds.";
}
const metrics = {
probe: caseId,
backend_url: backendUrl,
total_requests: totalRequests,
concurrency,
timeout_ms: timeoutMs,
ok_count: okResults.length,
error_count: results.length - okResults.length,
error_rate: errorRate,
status_counts: statusCounts,
response_shape_failures: responseShapeFailures,
endpoints: perEndpoint,
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, samples: results }, null, 2)}\n`, "utf8");
await writeFile(endpointsPath, `${JSON.stringify(endpoints, null, 2)}\n`, "utf8");
await writeFile(networkLogPath, results.map((item) => JSON.stringify(item)).join("\n") + (results.length > 0 ? "\n" : ""), "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason,
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
url: backendUrl,
metrics_summary: {
requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_rate: metrics.error_rate,
response_shape_failures: metrics.response_shape_failures,
endpoints: Object.fromEntries(Object.entries(metrics.endpoints).map(([id, value]) => [
id,
{
path: value.path,
ok_count: value.ok_count,
error_rate: value.error_rate,
latency_p50_ms: value.latency_ms.p50,
latency_p95_ms: value.latency_ms.p95,
},
])),
status_counts: metrics.status_counts,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
endpoints_json: endpointsPath,
network_log: networkLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
}
await main();
@@ -0,0 +1,162 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
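function stats(values) {
// Guard the empty case so Math.min/Math.max never return Infinity, which JSON.stringify writes as null.
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}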
function threshold(actual, limit, operator) {
const pass = operator === "<=" ? actual <= limit : actual >= limit;
return { actual, [operator === "<=" ? "max" : "min"]: limit, pass };
}
function makeSample(index) {
const ingress = 1 + (index % 5) * 0.22;
const pipeline = 2.8 + (index % 7) * 0.31;
const persistence = 1.1 + (index % 4) * 0.2;
const pluginIpc = 1.9 + (index % 6) * 0.27;
const rag = index % 3 === 0 ? 4.4 : 0.8 + (index % 5) * 0.18;
const streaming = 1.5 + (index % 8) * 0.24;
const provider = 80 + (index % 13) * 11;
const externalTool = index % 4 === 0 ? 25 + (index % 9) * 3 : 0;
const network = 8 + (index % 10) * 1.7;
const overhead = ingress + pipeline + persistence + pluginIpc + rag + streaming;
const external = provider + externalTool + network;
const total = overhead + external;
return {
index,
segments_ms: {
ingress,
pipeline,
persistence,
plugin_ipc: pluginIpc,
rag,
streaming,
provider,
external_tool: externalTool,
network,
},
langbot_overhead_ms: Number(overhead.toFixed(3)),
external_latency_ms: Number(external.toFixed(3)),
e2e_latency_ms: Number(total.toFixed(3)),
accounting_gap_ms: Number((total - external - overhead).toFixed(6)),
};
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-overhead-accounting-contract";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const sampleCount = Number(env.LANGBOT_PERF_CONTRACT_SAMPLES || "80");
const overheadP95BudgetMs = Number(env.LANGBOT_PERF_OVERHEAD_P95_MS || "25");
const samples = Array.from({ length: sampleCount }, (_, index) => makeSample(index));
const overheads = samples.map((sample) => sample.langbot_overhead_ms);
const e2e = samples.map((sample) => sample.e2e_latency_ms);
const external = samples.map((sample) => sample.external_latency_ms);
const gaps = samples.map((sample) => Math.abs(sample.accounting_gap_ms));
const memory = process.memoryUsage();
const metrics = {
probe: caseId,
sample_count: sampleCount,
langbot_overhead_ms: stats(overheads),
e2e_latency_ms: stats(e2e),
external_latency_ms: stats(external),
accounting_gap_max_ms: gaps.length > 0 ? Number(Math.max(...gaps).toFixed(6)) : 0,
samples,
};
const thresholds = {
sample_count: threshold(sampleCount, 50, ">="),
langbot_overhead_p95_ms: threshold(metrics.langbot_overhead_ms.p95, overheadP95BudgetMs, "<="),
accounting_gap_max_ms: threshold(metrics.accounting_gap_max_ms, 0.001, "<="),
};
const status = Object.values(thresholds).every((item) => item.pass) ? "pass" : "fail";
const metricsPath = join(evidenceDir, "metrics.json");
const thresholdsPath = join(evidenceDir, "thresholds.json");
const resourceLogPath = join(evidenceDir, "resource-log.json");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
await writeFile(thresholdsPath, `${JSON.stringify(thresholds, null, 2)}\n`, "utf8");
await writeFile(resourceLogPath, `${JSON.stringify({ memory, pid: process.pid }, null, 2)}\n`, "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason: status === "pass"
? "Overhead accounting contract passed all thresholds."
: "Overhead accounting contract breached one or more thresholds.",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
metrics_summary: {
sample_count: metrics.sample_count,
langbot_overhead_p95_ms: metrics.langbot_overhead_ms.p95,
e2e_latency_p95_ms: metrics.e2e_latency_ms.p95,
external_latency_p95_ms: metrics.external_latency_ms.p95,
accounting_gap_max_ms: metrics.accounting_gap_max_ms,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
thresholds_json: thresholdsPath,
resource_log_json: resourceLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "resource_log", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : 1);
}
await main();
@@ -0,0 +1,134 @@
export function summarizeFakeProviderState(state) {
if (!state) return null;
const recentRequests = Array.isArray(state.recent_requests) ? state.recent_requests : [];
const chatRequests = recentRequests.filter((request) => String(request?.path || "").includes("/chat/completions"));
const successfulRequests = chatRequests.filter((request) => request?.status === "ok");
const faultRequests = chatRequests.filter((request) => (
request?.should_fail === true
|| request?.status === "http_fault"
|| (Number.isFinite(request?.http_status) && request.http_status >= 400)
));
return {
status: state.status || "unknown",
url: state.url || "",
request_count: Number.isFinite(state.request_count) ? state.request_count : recentRequests.length,
recent_request_count: recentRequests.length,
chat_request_count: chatRequests.length,
fault_count: faultRequests.length,
streamed_request_count: chatRequests.filter((request) => request?.stream === true).length,
duration_ms: stats(chatRequests.map((request) => numberOrNull(request?.duration_ms)).filter(Number.isFinite)),
successful_duration_ms: stats(successfulRequests.map((request) => numberOrNull(request?.duration_ms)).filter(Number.isFinite)),
first_chunk_ms: stats(successfulRequests.map((request) => numberOrNull(request?.first_chunk_ms)).filter(Number.isFinite)),
first_content_chunk_ms: stats(successfulRequests.map((request) => numberOrNull(request?.first_content_chunk_ms)).filter(Number.isFinite)),
content_chunk_count: stats(successfulRequests.map((request) => numberOrNull(request?.content_chunk_count)).filter(Number.isFinite)),
config: state.config || {},
};
}
export function buildProviderTimingMetrics(samples, state) {
const recentRequests = Array.isArray(state?.recent_requests) ? state.recent_requests : [];
const byExpectedText = new Map();
for (const request of recentRequests) {
const expected = String(request?.expected_text || "");
if (!expected) continue;
if (!byExpectedText.has(expected)) byExpectedText.set(expected, []);
byExpectedText.get(expected).push(request);
}
const segments = [];
const missingExpectedText = [];
for (const sample of samples) {
const expected = String(sample?.expected_text || "");
if (!expected) continue;
const request = (byExpectedText.get(expected) || []).shift();
if (!request) {
missingExpectedText.push(expected);
continue;
}
const segment = buildTimingSegment(sample, request);
if (segment) segments.push(segment);
}
const values = (key) => segments.map((segment) => numberOrNull(segment[key])).filter(Number.isFinite);
return {
matched_request_count: segments.length,
missing_provider_match_count: missingExpectedText.length,
missing_expected_text: missingExpectedText.slice(0, 20),
send_to_provider_start_ms: stats(values("send_to_provider_start_ms")),
provider_duration_ms: stats(values("provider_duration_ms")),
provider_finish_to_ws_final_ms: stats(values("provider_finish_to_ws_final_ms")),
langbot_overhead_estimate_ms: stats(values("langbot_overhead_estimate_ms")),
e2e_minus_provider_ms: stats(values("e2e_minus_provider_ms")),
provider_first_content_to_ws_first_content_ms: stats(values("provider_first_content_to_ws_first_content_ms")),
segments,
};
}
function buildTimingSegment(sample, request) {
const sentEpochMs = numberOrNull(sample.sent_epoch_ms);
const finishedEpochMs = numberOrNull(sample.finished_epoch_ms);
const providerStartedEpochMs = numberOrNull(request.started_epoch_ms);
const providerFinishedEpochMs = numberOrNull(request.finished_epoch_ms);
const providerFirstContentEpochMs = numberOrNull(request.first_content_chunk_epoch_ms);
const wsFirstContentEpochMs = numberOrNull(sample.first_assistant_content_epoch_ms);
const responseDurationMs = numberOrNull(sample.response_duration_ms);
const providerDurationMs = numberOrNull(request.duration_ms);
const sendToProviderStartMs = finiteDelta(providerStartedEpochMs, sentEpochMs);
const providerFinishToWsFinalMs = finiteDelta(finishedEpochMs, providerFinishedEpochMs);
const e2eMinusProviderMs = Number.isFinite(responseDurationMs) && Number.isFinite(providerDurationMs)
? rounded(responseDurationMs - providerDurationMs)
: null;
const overheadEstimateMs = Number.isFinite(sendToProviderStartMs) && Number.isFinite(providerFinishToWsFinalMs)
? rounded(sendToProviderStartMs + providerFinishToWsFinalMs)
: e2eMinusProviderMs;
return {
sample_index: sample.index,
pipeline_label: sample.pipeline_label || "",
expected_text: sample.expected_text || "",
provider_request_id: request.id || "",
provider_request_number: request.request_number ?? null,
response_duration_ms: responseDurationMs,
provider_duration_ms: providerDurationMs,
send_to_provider_start_ms: sendToProviderStartMs,
provider_finish_to_ws_final_ms: providerFinishToWsFinalMs,
langbot_overhead_estimate_ms: overheadEstimateMs,
e2e_minus_provider_ms: e2eMinusProviderMs,
provider_first_content_to_ws_first_content_ms: finiteDelta(wsFirstContentEpochMs, providerFirstContentEpochMs),
provider_status: request.status || "",
provider_http_status: request.http_status ?? null,
};
}
function finiteDelta(left, right) {
return Number.isFinite(left) && Number.isFinite(right) ? rounded(left - right) : null;
}
export function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: rounded(Math.min(...values)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: rounded(Math.max(...values)),
};
}
export function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return rounded(sorted[index]);
}
export function rounded(value) {
return Number(value.toFixed(3));
}
function numberOrNull(value) {
const number = Number(value);
return Number.isFinite(number) ? number : null;
}
@@ -0,0 +1,285 @@
# Performance And Reliability Testing
Use this reference when a QA request asks whether LangBot is fast enough,
stable under load, or resilient to controlled faults.
These probes are manual/non-required QA gates unless a case or suite explicitly
states otherwise. They depend on a live local backend, mutable QA fixtures, and
operator-selected environment variables, so do not promote them to required CI
checks until fake-provider isolation, ownership markers, and cleanup are in
place.
## Scope
Treat `skills/` as the QA control plane:
- Cases define intent, readiness, thresholds, and required evidence.
- Probe scripts collect metrics, traces, resource logs, and artifacts.
- Reports classify the same run as `pass`, `fail`, `blocked`,
`env_issue`, or `flaky`.
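The log-health and control-plane probes in this change map the final status to a process exit code. A minimal sketch of that mapping follows. The helper name `exitCodeForStatus` is illustrative, and treating `blocked` and `flaky` as the generic non-zero case is an assumption, since the scripts shown here only emit `pass`, `fail`, and `env_issue`:

```javascript
// Illustrative helper (the name is not from the scripts); it mirrors the
// ternary passed to exit() at the end of a probe's main().
function exitCodeForStatus(status) {
  if (status === "pass") return 0;
  if (status === "env_issue") return 2; // environment problem, not a product failure
  return 1; // "fail", and by assumption "blocked" or "flaky"
}

const exitCodes = ["pass", "env_issue", "fail"].map(exitCodeForStatus);
```

Keeping `env_issue` on its own exit code lets a wrapper tell "the environment was not ready" apart from "the product breached a threshold" without parsing the JSON result.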
Do not turn `skills/` into a load generator or chaos engine. Call a focused
tool from a `mode: probe` case when the test needs one, for example k6,
Locust, pytest-benchmark, Playwright trace collection, Toxiproxy, Docker, or a
Kubernetes disruption tool.
## LangBot Performance Model
For LangBot, performance is the cost LangBot adds around external systems:
```text
LangBot overhead = end-to-end latency - provider latency - external tool latency - network/fault injection latency
```
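As a worked instance of the formula above, the following sketch subtracts the external segments from one request's end-to-end time. The function and parameter names are hypothetical, and the numbers are made up for illustration:

```javascript
// Hypothetical names; every value is in milliseconds for a single request.
function langbotOverheadMs({ e2eMs, providerMs, externalToolMs = 0, networkMs = 0 }) {
  const overhead = e2eMs - providerMs - externalToolMs - networkMs;
  return Number(overhead.toFixed(3));
}

// 250 ms end to end, of which 235 ms was spent outside LangBot.
const overheadMs = langbotOverheadMs({ e2eMs: 250, providerMs: 200, externalToolMs: 25, networkMs: 10 });
```

Here only 15 ms of the 250 ms round trip is attributable to LangBot, which is why a raw round-trip number on its own says little about LangBot itself.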
Measure user experience and internal composition separately:
- WebUI load and interaction latency.
- Debug Chat send-to-first-visible-token and send-to-completion latency.
- Pipeline, RAG, plugin runtime, MCP, AgentRunner, and persistence segment
latency.
- Queue wait time, concurrency, throughput, timeout rate, and p95/p99 latency.
- Startup, plugin install, knowledge-base ingestion, migration, and recovery
time.
Do not report a single message round-trip time as "LangBot performance" unless
the report also explains external provider/tool/network time.
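The p95/p99 figures in these probes use a nearest-rank percentile. The sketch below repeats that rule on ten illustrative samples to show a consequence worth keeping in mind when reading small runs: with ten samples, p95 is simply the slowest sample.

```javascript
// Nearest-rank percentile, the same rule the probe scripts apply.
function percentile(values, percentileValue) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
  return Number(sorted[index].toFixed(3));
}

// Ten made-up latency samples in milliseconds, with one outlier.
const latencies = [12, 15, 11, 14, 95, 13, 16, 12, 18, 17];
const p50 = percentile(latencies, 50);
const p95 = percentile(latencies, 95);
```

The control-plane probe defaults to 20 requests spread over two endpoints, so each endpoint contributes at most ten samples and a single slow response decides its p95. Raising `LANGBOT_CONTROL_PLANE_REQUESTS` gives the percentile more to work with.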
## Evidence Contract
Performance and reliability cases should declare the evidence they need:
- `metrics`: machine-readable latency, throughput, error-rate, or recovery
metrics, usually `metrics.json`.
- `resource_log`: CPU, memory, process, connection, queue, or file descriptor
samples.
- `trace`: browser, HTTP, database, or runtime trace artifacts.
- `profile`: CPU, memory, or flamegraph profile artifacts.
- `backend_log`, `network`, `api_diagnostic`, and `filesystem` as supporting
evidence when relevant.
Automation should write `automation-result.json` with these fields when
available:
```json
{
"status": "pass",
"reason": "Probe passed all thresholds.",
"metrics_summary": {
"langbot_overhead_p95_ms": 12.4,
"error_rate": 0
},
"thresholds_summary": {
"langbot_overhead_p95_ms": { "actual": 12.4, "max": 50, "pass": true }
},
"artifacts": {
"metrics_json": "/path/to/metrics.json"
},
"evidence_collected": ["metrics", "filesystem"]
}
```
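One way to produce the `thresholds_summary` shape shown above is sketched here. `threshold` follows the helper in the overhead accounting contract script; `summarize` is a hypothetical wrapper added for illustration:

```javascript
// Builds one threshold entry; "<=" records a max, anything else records a min.
function threshold(actual, limit, operator) {
  const pass = operator === "<=" ? actual <= limit : actual >= limit;
  return { actual, [operator === "<=" ? "max" : "min"]: limit, pass };
}

// Hypothetical wrapper: the run passes only if every threshold passes.
function summarize(thresholds) {
  const passed = Object.values(thresholds).every((item) => item.pass);
  return { status: passed ? "pass" : "fail", thresholds_summary: thresholds };
}

const summary = summarize({
  langbot_overhead_p95_ms: threshold(12.4, 50, "<="),
  sample_count: threshold(80, 50, ">="),
});
```

Recording `actual` next to the limit keeps the report self-explanatory: a reader can see how close a passing run came to the budget without opening `metrics.json`.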
Synthetic contract probes are useful for checking the QA harness, but they are
not live product performance results. Label them as contract probes in the case
title, checks, and report.
## Chaos And Reliability Rules
Chaos tests must be narrow and reversible:
- Declare the fault model in `fault_model_json`.
- Record blast radius, target component, injection method, duration, and abort
conditions.
- Capture recovery checks and cleanup steps in the case.
- Classify unavailable dependencies as `env_issue` unless the target behavior
is LangBot's handling of that dependency failure.
- Do not run destructive fault injection against a shared or production-like
instance without explicit operator approval.
Recommended first fault models:
- Provider timeout or HTTP 429 from a fake provider endpoint.
- Plugin runtime disconnect/reconnect in a local instance.
- MCP stdio server exits mid-call.
- RAG parser fixture fails once and recovers on retry.
- Backend API endpoint returns 5xx from a controlled local proxy.
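A fault declaration can be checked for completeness before anything is injected. The sketch below is only an illustration: every key name is hypothetical, because this reference lists what must be recorded but not the `fault_model_json` schema, and folding recovery and cleanup into the same object is a simplification.

```javascript
// All key names are hypothetical; the reference only lists what to record.
const REQUIRED_FAULT_FIELDS = [
  "target_component",
  "injection_method",
  "blast_radius",
  "duration_ms",
  "abort_conditions",
  "recovery_checks",
  "cleanup_steps",
];

function missingFaultFields(faultModel) {
  return REQUIRED_FAULT_FIELDS.filter((field) => !(field in faultModel));
}

// Example declaration for the "HTTP 429 from a fake provider" fault model.
const providerRateLimit = {
  target_component: "fake provider endpoint",
  injection_method: "return HTTP 429",
  blast_radius: "one local QA pipeline",
  duration_ms: 30000,
  abort_conditions: ["backend stops answering /healthz"],
  recovery_checks: ["a later request succeeds"],
};
const missing = missingFaultFields(providerRateLimit);
```

In this example the declaration omits its cleanup steps, so the check reports `cleanup_steps` and the case should not run until that is filled in.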
## Starter Live Probes
The starter gate separates QA-harness contracts from live product checks:
- `langbot-overhead-accounting-contract` verifies that reports can carry
overhead accounting metrics. It uses deterministic synthetic samples and is
not live product performance.
- `langbot-fault-taxonomy-contract` verifies that fault scenarios declare
expected status, recovery, and cleanup before destructive chaos tests are
added.
- `langbot-live-backend-latency` checks the unauthenticated `/healthz`
endpoint for basic backend responsiveness.
- `langbot-live-control-plane-api` checks `/healthz` and
`/api/v1/system/info` for HTTP 200, JSON `code: 0`, response shape, and
per-endpoint p95 latency.
- `langbot-live-backend-log-health` scans the recent backend log window for
fail-severity runtime findings. It is the reliability guard that should fail
the gate when HTTP probes pass but backend logs contain Traceback, ImportError,
ERROR, unclosed sessions, or unawaited coroutine signals.
Do not treat these starter live probes as Debug Chat or model-provider
performance. They are control-plane readiness checks; user-facing performance
needs browser/WebSocket/message-path measurements.
## Debug Chat Load And Fake Provider Baseline
Use `langbot-fake-provider-debug-chat-load` before real-provider load checks.
The setup automation starts a local OpenAI-compatible fake provider, registers
it as a normal LangBot provider/model, configures a local-agent pipeline, resets
Debug Chat, and then drives concurrent WebSocket messages through the live
backend.
This is not a mocked backend test. It still exercises:
- provider/model persistence and runtime reload;
- LiteLLM OpenAI-compatible requester path;
- local-agent runner selection and pipeline execution;
- Debug Chat WebSocket adapter and broadcast behavior;
- backend concurrency, timeout, and error-rate accounting.
The fake provider is deterministic and can inject controlled latency or faults
with `LANGBOT_FAKE_PROVIDER_*` variables, so it is the baseline for LangBot
message-path overhead. A fake-provider process keeps process-global config,
request counters, and recent request history; run fake-provider probes serially
or give each run its own provider instance. Concurrent probes against the same
fake-provider URL can reset or reconfigure each other's metrics.
The probe uses unique expected response tokens per request because Debug Chat
broadcasts messages to every connection in the same session; unique tokens
prevent one connection from counting another connection's response as its own.
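The token idea can be sketched as follows. The token format and the `{ text }` message shape are assumptions made for illustration, not the probe's actual wire format:

```javascript
// Hypothetical token format and message shape ({ text }); the real probe may differ.
function makeToken(runId, index) {
  // Zero-padding keeps one token from being a prefix of another (msg-1 vs msg-10).
  return `${runId}-msg-${String(index).padStart(4, "0")}`;
}

// Keep only the broadcast messages that carry this connection's token.
function ownResponses(token, broadcastMessages) {
  return broadcastMessages.filter((message) => String(message?.text || "").includes(token));
}

const tokenA = makeToken("run1", 1);
const tokenB = makeToken("run1", 10);
const broadcast = [{ text: `echo ${tokenA}` }, { text: `echo ${tokenB}` }];
const mine = ownResponses(tokenA, broadcast);
```

Both messages reach every connection in the session, but only the first one is attributed to the sender of `tokenA`. The padding width of four digits is arbitrary and would need to grow for runs above 9999 messages.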
When the fake provider is used, reports also include provider-side timing in
`metrics.json`:
- `fake_provider.duration_ms` and `fake_provider.first_content_chunk_ms`
measure the controlled provider itself.
- `provider_timing.send_to_provider_start_ms` estimates WebSocket ingress,
pipeline dispatch, runner setup, and requester time before the provider
receives the request.
- `provider_timing.provider_finish_to_ws_final_ms` estimates the path from
provider completion back to the final Debug Chat WebSocket response.
- `provider_timing.langbot_overhead_estimate_ms` is the sum of those two
LangBot-side segments when wall-clock timestamps can be matched by the
unique expected response token.
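A worked example of the two LangBot-side segments, using the field names from `buildTimingSegment` in this change. The timestamps are made up, and the helper name is illustrative:

```javascript
// Field names follow buildTimingSegment; the helper name and timestamps are illustrative.
function overheadEstimate(sample, request) {
  // Time from the WebSocket send until the provider saw the request.
  const sendToProviderStartMs = request.started_epoch_ms - sample.sent_epoch_ms;
  // Time from provider completion until the final WebSocket response.
  const providerFinishToWsFinalMs = sample.finished_epoch_ms - request.finished_epoch_ms;
  return {
    sendToProviderStartMs,
    providerFinishToWsFinalMs,
    overheadMs: sendToProviderStartMs + providerFinishToWsFinalMs,
  };
}

const estimate = overheadEstimate(
  { sent_epoch_ms: 1000, finished_epoch_ms: 1220 },
  { started_epoch_ms: 1012, finished_epoch_ms: 1212 },
);
```

Of the 220 ms round trip, 200 ms is provider time and 20 ms (12 ms before, 8 ms after) is the LangBot-side estimate. The subtraction mixes timestamps taken by the probe and by the fake provider, so it is only meaningful when both read the same clock, as they do when the fake provider runs on the same host.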
After the baseline passes, run `langbot-fake-provider-debug-chat-slow-load` to
keep the same live backend path while injecting deterministic streaming latency.
Run `langbot-fake-provider-debug-chat-fault-recovery` to inject bounded HTTP
provider failures and require both observed failures and later successful
requests. The fault-recovery case is deliberately sequential because failed
Debug Chat responses do not carry a unique success token that can be attributed
to one concurrent connection.
Run `langbot-fake-provider-debug-chat-cross-pipeline-isolation` separately via
`langbot-debug-chat-isolation-gate`. Current LangBot releases may fail it because
of product bug [#2286](https://github.com/langbot-app/LangBot/issues/2286), where
Debug Chat replies can read singleton WebSocket proxy pipeline state after a
later message overwrites it. Treat that failure as regression evidence for the
product fix rather than as a fake-provider latency finding.
Use `langbot-space-debug-chat-concurrency-smoke` after the fake-provider
baseline. It runs a deliberately small real Space-provider batch and reports
user-visible latency, not pure LangBot overhead. Space/model/network failures
are dependency findings until the fake baseline shows the same symptom.
If a Space smoke passes but the log guard finds Tracebacks from telemetry
posting, classify that separately as `telemetry-proxy-noise` instead of clearing
the proxy or treating the Debug Chat path as failed.
Useful commands:
```bash
rtk bin/lbs test run langbot-fake-provider-debug-chat-load --run-id langbot-fake-load-local
rtk bin/lbs test run langbot-fake-provider-debug-chat-slow-load --run-id langbot-fake-slow-local
rtk bin/lbs test run langbot-fake-provider-debug-chat-fault-recovery --run-id langbot-fake-fault-local
rtk bin/lbs suite run langbot-debug-chat-isolation-gate --run-id langbot-debug-chat-isolation-local --include-manual-check
rtk bin/lbs test run langbot-space-debug-chat-concurrency-smoke --run-id langbot-space-smoke-local
rtk bin/lbs suite run langbot-debug-chat-load-gate --run-id langbot-debug-chat-load-local --include-manual-check
```
## Gate Layers
Use the smallest gate that answers the quality question:
- `langbot-performance-contract-gate`: fast synthetic checks for report shape,
threshold accounting, and fault taxonomy. Good for PR feedback when no live
service is running.
- `langbot-live-backend-gate`: live backend `/healthz`,
`/api/v1/system/info`, and backend log health. Good after starting a local
LangBot backend.
- `langbot-user-path-performance-gate`: browser-visible user path performance,
starting with Pipeline Debug Chat send-to-visible-completion latency. Run it
only when the browser profile and target pipeline are ready.
- `langbot-debug-chat-load-gate`: manual WebSocket Debug Chat load checks,
starting with controlled fake-provider baseline, slow-provider, and
fault-recovery profiles, plus an optional low-volume real Space-provider
smoke. Run fake-provider cases serially when they share a provider URL.
- `langbot-debug-chat-isolation-gate`: manual cross-pipeline Debug Chat
isolation regression gate. Current releases may fail because of #2286; keep it
separate from the normal load gate until that product fix lands.
- `langbot-performance-reliability-gate`: combined starter gate for synthetic
contracts plus live backend checks.
Keep environment diagnostics separate from product regressions. For example, a
SOCKS proxy without Python `socksio` support should either be fixed or be
flagged by `bin/lbs env doctor` and classified as an environment issue; do not
hide the resulting backend Traceback in reports.
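The runner changes later in this diff encode that separation in exit codes: `pass` exits 0, the environment-side or inconclusive statuses (`blocked`, `env_issue`, `flaky`) exit 2, and everything else exits 1. The sketch below condenses those two helpers for reference; the `ExecutionStatus` type alias and the demo loop are illustrative additions, not part of the change.

```typescript
// Condensed from the helpers this diff adds to the suite and test runners.
// The ExecutionStatus alias is an illustrative addition.
type ExecutionStatus = "ok" | "classified" | "nonzero";

function exitStatusFromResultStatus(status: string): number {
  if (status === "pass") return 0;
  // Environment-side or inconclusive outcomes get their own exit code.
  if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
  return 1; // "fail" and any unrecognized status
}

function executionStatusFromExitStatus(exitStatus: number): ExecutionStatus {
  if (exitStatus === 0) return "ok";
  if (exitStatus === 2) return "classified";
  return "nonzero";
}

// Demo loop: show how each result status is surfaced to the caller.
for (const status of ["pass", "env_issue", "flaky", "fail"]) {
  const exitStatus = exitStatusFromResultStatus(status);
  console.log(`${status} -> exit ${exitStatus} (${executionStatusFromExitStatus(exitStatus)})`);
}
```

A wrapper script can therefore treat exit code 2 as "look at the environment first" without parsing the report.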
## Debug Chat Performance
`pipeline-debug-chat-performance` reuses the browser Debug Chat automation and
adds `metrics.json`, `metrics_summary`, and `thresholds_summary` to
`automation-result.json`.
Current metric:
```text
response_duration_ms = prompt send -> expected assistant response visible and stable
```
This is a user-path metric, not pure LangBot overhead. If it regresses, inspect
provider latency, model route health, plugin/runtime logs, WebSocket behavior,
and browser console/network evidence before attributing the whole duration to
LangBot.
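As a rough illustration of why the whole duration should not be attributed to LangBot, the sketch below subtracts a separately measured provider latency from the user-path duration. It is a minimal sketch under stated assumptions: `DebugChatSample`, `providerDurationMs`, and `estimateNonProviderMs` are hypothetical names, and the remainder still includes browser rendering, WebSocket transport, and plugin/runtime time.

```typescript
// Hypothetical helper: neither the names nor the inputs come from the LangBot codebase.
interface DebugChatSample {
  responseDurationMs: number; // prompt send -> expected response visible and stable
  providerDurationMs: number; // assumed to be measured separately, e.g. from provider logs
}

function estimateNonProviderMs(sample: DebugChatSample): number {
  // Clamp at zero: the two measurements come from different clocks,
  // so the raw difference can be slightly negative.
  return Math.max(0, sample.responseDurationMs - sample.providerDurationMs);
}

const sample: DebugChatSample = { responseDurationMs: 4200, providerDurationMs: 3900 };
console.log(`non-provider time: ${estimateNonProviderMs(sample)} ms`);
```

A large remainder is a reason to look at the other evidence sources listed above, not proof of a LangBot regression by itself.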
### User-Path Gate Runbook
1. Start the backend and frontend. The frontend must be launched with
`VITE_API_BASE_URL="$LANGBOT_BACKEND_URL"` so browser API calls reach the
backend.
2. Run `node scripts/e2e/ensure-local-agent-pipeline.mjs --write-env`. The
setup refreshes the local QA login, skips the wizard, prepares a Debug Chat
pipeline, scans Space models, tests candidates, writes tested fallback
models, and writes the selected pipeline/model env values to
`skills/.env.local`.
3. If setup returns `env_issue`, read `model_tests` and provider errors first.
A missing Space key, failed Space scan, or unavailable model route is not a
LangBot performance regression.
4. Run
`bin/lbs suite run langbot-user-path-performance-gate --include-manual-check`.
5. Interpret `response_p95_ms` as browser-visible send-to-completion time. It
includes provider latency; use backend logs and model test evidence to
separate LangBot overhead from the external model route.
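To make the `response_p95_ms` reading in step 5 concrete, here is a minimal sketch of a nearest-rank p95 over per-request durations. The nearest-rank method, the sample values, and `thresholdMs` are assumptions for illustration; the gate's own aggregation may use a different percentile definition.

```typescript
// Nearest-rank percentile. This is an assumed method; the gate may aggregate differently.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Smallest rank such that at least p percent of samples are at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Illustrative durations in milliseconds, with one slow provider response.
const durationsMs = [820, 910, 1040, 1100, 1180, 1250, 1330, 1490, 1760, 5200];
const p95 = percentile(durationsMs, 95);
const thresholdMs = 6000; // hypothetical threshold, not a value from the suite
console.log({ p95, withinThreshold: p95 <= thresholdMs });
```

With nearest-rank, p95 is always an observed sample, so in a small run a single slow provider response becomes the reported p95. That is one reason to check model test evidence before reading the number as LangBot overhead.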
The setup keeps a `max-round` value in the generated pipeline config only
because the current backend truncator still reads that field directly. Do not
use it as a quality requirement for future local-agent behavior.
## Running The First Gate
Start with the reusable suite:
```bash
rtk bin/lbs suite plan langbot-performance-reliability-gate
rtk bin/lbs suite start langbot-performance-reliability-gate --run-id langbot-perf-rel-local
```
Run synthetic contract probes first. Run live probes only after the selected
backend/frontend instance is reachable and the run owner accepts any fault
scope.
@@ -0,0 +1,13 @@
id: langbot-debug-chat-isolation-gate
title: "LangBot Debug Chat isolation gate"
description: "Manual/non-required cross-pipeline Debug Chat isolation gate. Current releases may fail this gate because of product bug #2286; use it as regression evidence after the routing fix lands."
type: reliability
priority: p1
tags:
- reliability
- debug-chat
- websocket
- isolation
- concurrency
cases:
- langbot-fake-provider-debug-chat-cross-pipeline-isolation
@@ -0,0 +1,15 @@
id: langbot-debug-chat-load-gate
title: "LangBot Debug Chat load gate"
description: "Manual/non-required message-path load checks for Pipeline Debug Chat: controlled fake-provider baseline, slow-provider and fault-recovery profiles, plus optional real Space-provider smoke. Cross-pipeline isolation is split into langbot-debug-chat-isolation-gate because current releases may fail it due to product bug #2286."
type: performance
priority: p1
tags:
- performance
- debug-chat
- websocket
- load
cases:
- langbot-fake-provider-debug-chat-load
- langbot-fake-provider-debug-chat-slow-load
- langbot-fake-provider-debug-chat-fault-recovery
- langbot-space-debug-chat-concurrency-smoke
@@ -0,0 +1,14 @@
id: langbot-live-backend-gate
title: "LangBot live backend reliability gate"
description: "Live backend control-plane responsiveness and runtime log health checks for a locally running LangBot instance."
type: reliability
priority: p1
tags:
- performance
- reliability
- live-backend
- metrics
cases:
- langbot-live-backend-latency
- langbot-live-control-plane-api
- langbot-live-backend-log-health
@@ -0,0 +1,13 @@
id: langbot-performance-contract-gate
title: "LangBot performance contract gate"
description: "Fast synthetic contract checks for performance metric accounting and non-destructive reliability fault taxonomy."
type: contract
priority: p1
tags:
- performance
- reliability
- contract
- metrics
cases:
- langbot-overhead-accounting-contract
- langbot-fault-taxonomy-contract
@@ -0,0 +1,16 @@
id: langbot-performance-reliability-gate
title: "LangBot performance and reliability starter gate"
description: "Starter gate for LangBot performance accounting, live backend control-plane latency, and non-destructive fault taxonomy checks."
type: reliability
priority: p1
tags:
- performance
- reliability
- metrics
- chaos
cases:
- langbot-overhead-accounting-contract
- langbot-fault-taxonomy-contract
- langbot-live-backend-latency
- langbot-live-control-plane-api
- langbot-live-backend-log-health
@@ -0,0 +1,12 @@
id: langbot-user-path-performance-gate
title: "LangBot user-path performance gate"
description: "Browser-visible performance checks for user-facing LangBot paths such as Pipeline Debug Chat."
type: performance
priority: p1
tags:
- performance
- browser
- debug-chat
- user-path
cases:
- pipeline-debug-chat-performance
@@ -0,0 +1,23 @@
id: telemetry-proxy-noise
title: "Telemetry posting fails through the proxy while the target flow succeeds"
date: 2026-06-25
category: env_issue
symptoms:
- "The target Debug Chat or provider smoke request completes successfully."
- "The same log window contains a Traceback for telemetry posting."
- "The traceback references the Space telemetry endpoint."
patterns:
- "Failed to post telemetry"
- "https://space.langbot.app/api/v1/telemetry"
- "httpx.ConnectError"
likely_causes:
- "The backend process inherited proxy settings that are required for model/provider access but unreliable for telemetry posting."
- "The telemetry endpoint is temporarily unreachable through the local proxy route."
- "TLS or proxy negotiation failed for the non-critical telemetry request."
fix_steps:
- "Keep the proxy configuration needed for model/provider access; do not clear it only to hide telemetry noise."
- "Check that uppercase and lowercase proxy variables are consistent before rerunning a live Space smoke."
- "Classify the target flow and log-health result separately: a successful Debug Chat run can still have an environment log-health finding."
verification: "A rerun shows the target case success patterns and no telemetry Traceback in the scanned log window, or the report explicitly records the telemetry issue as environment noise."
related_cases:
- langbot-space-debug-chat-concurrency-smoke
+35
@@ -1,5 +1,7 @@
import { existsSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { Socket } from "node:net";
import { join } from "node:path";
import type { CommandContext } from "../types.ts";
import { parseOptions } from "../cli.ts";
import { loadEnv } from "../fs.ts";
@@ -88,6 +90,37 @@ function compareProxyPair(env: Record<string, string>, upper: string, lower: str
return null;
}
function envValue(env: Record<string, string>, key: string): string {
return process.env[key] ?? env[key] ?? "";
}
function activeSocksProxy(env: Record<string, string>): { key: string; value: string } | null {
for (const key of ["ALL_PROXY", "all_proxy", "HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"]) {
const value = envValue(env, key);
if (/^socks/i.test(value)) return { key, value };
}
return null;
}
function checkSocksio(env: Record<string, string>): string | null {
const proxy = activeSocksProxy(env);
if (!proxy) return null;
const repo = env.LANGBOT_REPO;
const python = repo ? join(repo, ".venv", "bin", "python") : "";
if (!python || !existsSync(python)) {
return `SOCKS proxy ${proxy.key} is configured (${redactEnvValue(proxy.key, proxy.value)}), but LangBot venv python was not found; after creating the venv, verify it can import socksio.`;
}
const result = spawnSync(python, ["-c", "import socksio"], {
encoding: "utf8",
timeout: 5000,
});
if (result.status === 0) return null;
return `SOCKS proxy ${proxy.key} is configured (${redactEnvValue(proxy.key, proxy.value)}), but ${python} cannot import socksio; run \`${python} -m pip install socksio\` or start LangBot without SOCKS proxy env.`;
}
export async function commandEnvDoctor(ctx: CommandContext): Promise<number> {
const env = loadEnv(ctx.root);
const failures: string[] = [];
@@ -117,6 +150,8 @@ export async function commandEnvDoctor(ctx: CommandContext): Promise<number> {
]) {
if (mismatch) failures.push(mismatch);
}
const socksioFailure = checkSocksio(env);
if (socksioFailure) failures.push(socksioFailure);
for (const [label, result] of await Promise.all([
checkUrl("LANGBOT_BACKEND_URL", env.LANGBOT_BACKEND_URL).then((result) => ["LANGBOT_BACKEND_URL", result] as const),
+44 -3
@@ -465,6 +465,41 @@ function outputTail(value: string | Buffer | null | undefined): string {
return String(value ?? "").trim().slice(-4000);
}
function exitStatusFromResultStatus(status: string): number {
if (status === "pass") return 0;
if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
return 1;
}
function executionStatusFromExitStatus(status: number): string {
if (status === 0) return "ok";
if (status === 2) return "classified";
return "nonzero";
}
function executionFromCaseResultFile(caseItem: Record<string, unknown>): Record<string, unknown> | null {
const resultPath = join(String(caseItem.evidence_dir), "result.json");
if (!existsSync(resultPath)) return null;
try {
const parsed = JSON.parse(readFileSync(resultPath, "utf8")) as Record<string, unknown>;
if (
parsed.case_id !== caseItem.id ||
parsed.run_id !== caseItem.run_id ||
typeof parsed.status !== "string"
) return null;
const exitStatus = exitStatusFromResultStatus(parsed.status);
return {
status: executionStatusFromExitStatus(exitStatus),
exit_status: exitStatus,
reason: typeof parsed.reason === "string" ? parsed.reason : "result.json completed",
result_status: parsed.status,
result_json: resultPath,
};
} catch {
return null;
}
}
function executionProblemStatus(executions: Array<Record<string, unknown>>): string {
const statuses = executions.map((item) => String(item.status));
if (statuses.includes("nonzero")) return "fail";
@@ -523,12 +558,18 @@ export function commandSuiteRun(ctx: CommandContext): number {
encoding: "utf8",
stdio: options.json === true ? "pipe" : "inherit",
});
const status = result.error ? 1 : result.status ?? 1;
const fileExecution = result.error ? executionFromCaseResultFile(caseItem) : null;
const status = typeof fileExecution?.exit_status === "number"
? fileExecution.exit_status
: result.error ? 1 : result.status ?? 1;
executions.push({
id: caseItem.id,
status: status === 0 ? "ok" : "nonzero",
status: fileExecution?.status ?? executionStatusFromExitStatus(status),
exit_status: status,
reason: result.error?.message || "",
reason: fileExecution?.reason ?? result.error?.message ?? "",
result_status: fileExecution?.result_status,
result_json: fileExecution?.result_json,
spawn_error: fileExecution && result.error ? result.error.message : undefined,
stdout: outputTail(result.stdout),
stderr: outputTail(result.stderr),
});
+95 -14
@@ -271,7 +271,7 @@ function reportTemplate(mode: string): Record<string, string> {
target_tested: "Probe target, endpoint, file, command, or service actually checked",
execution_path: "automation script | shell command | direct API | other",
probe_result: "What the probe observed",
logs_or_artifacts: "Log, filesystem, API, or other artifact paths collected",
metrics_or_artifacts: "Metrics, logs, filesystem artifacts, traces, or profiles collected",
diagnostics: "Extra diagnostics used, if any",
matched_troubleshooting: "Troubleshooting ids matched, if any",
assets_to_update: "New case/reference/troubleshooting entries to add",
@@ -320,7 +320,7 @@ function manualEvidenceTemplate(mode: string): ManualEvidenceTemplate {
target_tested: "TODO: probe target, endpoint, file, command, or service actually checked",
execution_path: "TODO: automation script | shell command | direct API | other",
probe_result: "TODO: observed probe result",
logs_or_artifacts: "TODO: evidence paths or skipped reason",
metrics_or_artifacts: "TODO: metrics, logs, filesystem artifacts, traces, or profiles collected",
diagnostics: "TODO: additional diagnostics used, if any",
matched_troubleshooting: "TODO: troubleshooting ids matched, if any",
assets_to_update: "TODO: case/reference/troubleshooting updates to make",
@@ -1099,6 +1099,41 @@ function executionTail(value: string | Buffer | null | undefined): string {
return String(value ?? "").trim().slice(-4000);
}
function exitStatusFromResultStatus(status: string): number {
if (status === "pass") return 0;
if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
return 1;
}
function executionStatusFromExitStatus(status: number): string {
if (status === 0) return "ok";
if (status === 2) return "classified";
return "nonzero";
}
function executionFromAutomationResultFile(
evidenceDir: string,
caseId: string,
runId: string,
): { status: string; exit_status: number; reason: string; result_status: string; path: string } | null {
const resultPath = join(evidenceDir, "automation-result.json");
if (!existsSync(resultPath)) return null;
try {
const parsed = JSON.parse(readFileSync(resultPath, "utf8")) as Record<string, unknown>;
if (parsed.case_id !== caseId || parsed.run_id !== runId || typeof parsed.status !== "string") return null;
const exitStatus = exitStatusFromResultStatus(parsed.status);
return {
status: executionStatusFromExitStatus(exitStatus),
exit_status: exitStatus,
reason: typeof parsed.reason === "string" ? parsed.reason : "automation-result.json completed",
result_status: parsed.status,
path: resultPath,
};
} catch {
return null;
}
}
function runSetupAutomation(
ctx: CommandContext,
item: StructuredItem,
@@ -1224,6 +1259,30 @@ export function commandTestRun(ctx: CommandContext): number {
});
if (result.error) {
const fileExecution = executionFromAutomationResultFile(
run.automation.evidence_dir,
String(run.case.id),
run.run_id,
);
if (fileExecution) {
if (options.json !== true) {
console.error(`WARN: automation spawn reported an error, but ${fileExecution.path} completed: ${result.error.message}`);
}
if (options.json === true) {
console.log(JSON.stringify({
run,
setup_executions: setupExecutions,
automation_execution: {
...fileExecution,
spawn_error: result.error.message,
stdout: executionTail(result.stdout),
stderr: executionTail(result.stderr),
},
exit_status: fileExecution.exit_status,
}, null, 2));
}
return fileExecution.exit_status;
}
if (options.json !== true) console.error(`ERROR: failed to run automation: ${result.error.message}`);
if (options.json === true) {
console.log(JSON.stringify({
@@ -1247,7 +1306,7 @@ export function commandTestRun(ctx: CommandContext): number {
run,
setup_executions: setupExecutions,
automation_execution: {
status: status === 0 ? "ok" : "nonzero",
status: executionStatusFromExitStatus(status),
exit_status: status,
stdout: executionTail(result.stdout),
stderr: executionTail(result.stderr),
@@ -1311,6 +1370,7 @@ function renderMarkdownReport(report: TestReport): string {
const environment = report.environment;
const logGuard = report.log_guard;
const troubleshooting = report.troubleshooting;
const automation = report.automation_result;
const lines: string[] = [];
lines.push(`# Test Report: ${reportCase.id}`);
@@ -1323,20 +1383,41 @@ function renderMarkdownReport(report: TestReport): string {
lines.push(`Type: ${reportCase.type}`);
lines.push("");
lines.push("## Result");
lines.push(`- result: ${evidence.result}`);
for (const [key, value] of Object.entries(evidence)) {
if (key !== "result") lines.push(`- ${key}: ${value}`);
if (automation.status === "loaded" && automation.result) {
lines.push(`- result: ${automation.result}`);
if (automation.reason) lines.push(`- reason: ${automation.reason}`);
if (automation.url) lines.push(`- target_tested: ${automation.url}`);
if (automation.path) lines.push(`- automation_result: ${automation.path}`);
if (automation.artifacts) lines.push(`- artifacts: ${JSON.stringify(automation.artifacts)}`);
} else {
lines.push(`- result: ${evidence.result}`);
for (const [key, value] of Object.entries(evidence)) {
if (key !== "result") lines.push(`- ${key}: ${value}`);
}
}
lines.push("");
lines.push("## Automation Result");
lines.push(`- status: ${report.automation_result.status}`);
if (report.automation_result.path) lines.push(`- path: ${report.automation_result.path}`);
if (report.automation_result.result) lines.push(`- result: ${report.automation_result.result}`);
if (report.automation_result.reason) lines.push(`- reason: ${report.automation_result.reason}`);
if (report.automation_result.started_at_local) lines.push(`- started_at_local: ${report.automation_result.started_at_local}`);
if (report.automation_result.finished_at_local) lines.push(`- finished_at_local: ${report.automation_result.finished_at_local}`);
if (report.automation_result.url) lines.push(`- url: ${report.automation_result.url}`);
if (report.automation_result.expected_text) lines.push(`- expected_text: ${report.automation_result.expected_text}`);
lines.push(`- status: ${automation.status}`);
if (automation.path) lines.push(`- path: ${automation.path}`);
if (automation.result) lines.push(`- result: ${automation.result}`);
if (automation.reason) lines.push(`- reason: ${automation.reason}`);
if (automation.duration_ms !== undefined) lines.push(`- duration_ms: ${automation.duration_ms}`);
if (automation.started_at_local) lines.push(`- started_at_local: ${automation.started_at_local}`);
if (automation.finished_at_local) lines.push(`- finished_at_local: ${automation.finished_at_local}`);
if (automation.url) lines.push(`- url: ${automation.url}`);
if (automation.expected_text) lines.push(`- expected_text: ${automation.expected_text}`);
if (automation.metrics_summary) {
lines.push("- metrics_summary:");
lines.push(` ${JSON.stringify(automation.metrics_summary)}`);
}
if (automation.thresholds_summary) {
lines.push("- thresholds_summary:");
lines.push(` ${JSON.stringify(automation.thresholds_summary)}`);
}
if (automation.artifacts) {
lines.push("- artifacts:");
lines.push(` ${JSON.stringify(automation.artifacts)}`);
}
lines.push("");
lines.push("## Environment");
for (const [key, value] of Object.entries(environment)) lines.push(`- ${key}=${value}`);
+55
@@ -126,6 +126,9 @@ function validateCaseItem(root: string, item: StructuredItem, skillNames: Set<st
...validateEnvKeyScalar(item, "automation_pipeline_url_env"),
...validateEnvKeyScalar(item, "automation_pipeline_name_env"),
...validateJsonScalar(item, "automation_filesystem_checks_json"),
...validateJsonScalar(item, "metrics_thresholds_json"),
...validateJsonScalar(item, "load_profile_json"),
...validateJsonScalar(item, "fault_model_json"),
...listValue(item.fields, "setup_automation").flatMap((entry) => (
validateSetupAutomationEntry(root, entry, caseIds).map((error) => `${item.path}: ${error}`)
)),
@@ -183,10 +186,62 @@ function validateCaseItem(root: string, item: StructuredItem, skillNames: Set<st
if (timeout && (!/^\d+$/.test(timeout) || Number.parseInt(timeout, 10) <= 0)) {
errors.push(`${item.path}: 'automation_response_timeout_ms' must be a positive integer string`);
}
for (const key of [
"automation_debug_chat_load_requests",
"automation_debug_chat_load_concurrency",
"automation_debug_chat_load_timeout_ms",
"automation_debug_chat_load_response_p95_ms",
"automation_debug_chat_load_first_response_p95_ms",
]) {
const value = scalar(item.fields, key);
if (value && (!/^\d+$/.test(value) || Number.parseInt(value, 10) <= 0)) {
errors.push(`${item.path}: '${key}' must be a positive integer string`);
}
}
for (const key of [
"automation_debug_chat_load_min_error_count",
"automation_debug_chat_load_min_ok_count",
"automation_debug_chat_load_min_provider_fault_count",
"automation_fake_provider_first_token_delay_ms",
"automation_fake_provider_chunk_delay_ms",
"automation_fake_provider_chunk_count",
"automation_fake_provider_fail_first_n",
"automation_fake_provider_fail_every_n",
]) {
const value = scalar(item.fields, key);
if (value && (!/^\d+$/.test(value) || Number.parseInt(value, 10) < 0)) {
errors.push(`${item.path}: '${key}' must be a non-negative integer string`);
}
}
for (const key of ["automation_debug_chat_load_max_error_rate", "automation_debug_chat_load_min_error_rate"]) {
const value = scalar(item.fields, key);
if (value && (!/^(?:0(?:\.\d+)?|1(?:\.0+)?)$/.test(value))) {
errors.push(`${item.path}: '${key}' must be a number string between 0 and 1`);
}
}
const fakeProviderFaultStatus = scalar(item.fields, "automation_fake_provider_fault_status");
if (fakeProviderFaultStatus) {
const parsed = Number.parseInt(fakeProviderFaultStatus, 10);
if (!/^\d+$/.test(fakeProviderFaultStatus) || parsed < 400 || parsed > 599) {
errors.push(`${item.path}: 'automation_fake_provider_fault_status' must be an HTTP 4xx or 5xx status string`);
}
}
const streamOutput = scalar(item.fields, "automation_stream_output");
if (streamOutput && !["0", "1", "false", "true"].includes(streamOutput)) {
errors.push(`${item.path}: 'automation_stream_output' must be one of 0, 1, false, or true`);
}
for (const key of [
"automation_debug_chat_load_stream",
"automation_debug_chat_load_reset",
"automation_debug_chat_load_fail_on_final_mismatch",
"automation_fake_provider_fail_after_first_chunk",
"automation_fake_provider_dynamic_response",
]) {
const value = scalar(item.fields, key);
if (value && !["0", "1", "false", "true"].includes(value)) {
errors.push(`${item.path}: '${key}' must be one of 0, 1, false, or true`);
}
}
const imageBase64Fixture = scalar(item.fields, "automation_image_base64_fixture");
if (imageBase64Fixture && !existsSync(join(root, imageBase64Fixture))) {
errors.push(`${item.path}: automation image fixture does not exist: ${imageBase64Fixture}`);
+27 -2
@@ -9,7 +9,18 @@ export const requiredEnvKeys = [
];
export const caseModeValues = ["agent-browser", "probe"];
export const caseTypeValues = ["smoke", "regression", "feature", "provider", "exploratory"];
export const caseTypeValues = [
"smoke",
"regression",
"feature",
"provider",
"exploratory",
"contract",
"performance",
"reliability",
"chaos",
"security",
];
export const casePriorityValues = ["p0", "p1", "p2"];
export const caseRiskValues = ["low", "medium", "high"];
export const caseEvidenceValues = [
@@ -21,10 +32,24 @@ export const caseEvidenceValues = [
"frontend_log",
"api_diagnostic",
"filesystem",
"metrics",
"trace",
"profile",
"resource_log",
];
export const testResultStatusValues = ["pass", "fail", "blocked", "env_issue", "flaky"];
export const troubleshootingCategoryValues = ["product", "env_issue", "external_dependency", "blocked", "flaky"];
export const suiteTypeValues = ["smoke", "regression", "release_gate", "exploratory"];
export const suiteTypeValues = [
"smoke",
"regression",
"release_gate",
"exploratory",
"contract",
"performance",
"reliability",
"chaos",
"security",
];
export const suiteRequiredStrings = ["id", "title", "description", "type", "priority"];
export const suiteRequiredLists = ["tags", "cases"];
+20
@@ -91,6 +91,7 @@ export type AutomationResultEvidence = {
path?: string;
result?: string;
reason?: string;
duration_ms?: number;
started_at?: string;
started_at_local?: string;
finished_at?: string;
@@ -98,6 +99,9 @@ export type AutomationResultEvidence = {
url?: string;
prompt?: string;
expected_text?: string;
metrics_summary?: Record<string, unknown>;
thresholds_summary?: Record<string, unknown>;
artifacts?: Record<string, unknown>;
};
type MutableScanState = {
@@ -594,6 +598,18 @@ function stringField(data: Record<string, unknown>, key: string): string | undef
return typeof value === "string" && value.trim() ? value : undefined;
}
function numberField(data: Record<string, unknown>, key: string): number | undefined {
const value = data[key];
return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}
function objectField(data: Record<string, unknown>, key: string): Record<string, unknown> | undefined {
const value = data[key];
return value && typeof value === "object" && !Array.isArray(value)
? value as Record<string, unknown>
: undefined;
}
function evidenceDirFromOptions(options: Record<string, string | boolean>): string | undefined {
const explicit = typeof options["evidence-dir"] === "string" ? options["evidence-dir"] : undefined;
if (explicit) return resolve(explicit);
@@ -628,6 +644,7 @@ export function readAutomationResultEvidence(options: Record<string, string | bo
path: resultPath,
result: stringField(result, "status"),
reason: stringField(result, "reason"),
duration_ms: numberField(result, "duration_ms"),
started_at: stringField(result, "started_at"),
started_at_local: stringField(result, "started_at_local"),
finished_at: stringField(result, "finished_at"),
@@ -635,6 +652,9 @@ export function readAutomationResultEvidence(options: Record<string, string | bo
url: stringField(result, "url"),
prompt: redactSecrets(stringField(result, "prompt") ?? ""),
expected_text: stringField(result, "expected_text"),
metrics_summary: objectField(result, "metrics_summary"),
thresholds_summary: objectField(result, "thresholds_summary"),
artifacts: objectField(result, "artifacts"),
};
} catch (error) {
return { status: "invalid", path: resultPath, reason: String(error) };
+26
@@ -114,6 +114,32 @@ export function automationEnvDefaults(item: StructuredItem, env: EnvSource = pro
["automation_expected_runner_id", "LANGBOT_E2E_EXPECTED_RUNNER_ID"],
["automation_reset_debug_chat", "LANGBOT_E2E_RESET_DEBUG_CHAT"],
["automation_debug_chat_session_type", "LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE"],
["automation_debug_chat_response_p95_ms", "LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS"],
["automation_debug_chat_max_error_rate", "LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE"],
["automation_debug_chat_load_requests", "LANGBOT_DEBUG_CHAT_LOAD_REQUESTS"],
["automation_debug_chat_load_concurrency", "LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY"],
["automation_debug_chat_load_timeout_ms", "LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS"],
["automation_debug_chat_load_response_p95_ms", "LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS"],
["automation_debug_chat_load_first_response_p95_ms", "LANGBOT_DEBUG_CHAT_LOAD_FIRST_RESPONSE_P95_MS"],
["automation_debug_chat_load_max_error_rate", "LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE"],
["automation_debug_chat_load_min_error_rate", "LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_RATE"],
["automation_debug_chat_load_min_error_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_COUNT"],
["automation_debug_chat_load_min_ok_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_OK_COUNT"],
["automation_debug_chat_load_min_provider_fault_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_PROVIDER_FAULT_COUNT"],
["automation_debug_chat_load_expected_prefix", "LANGBOT_DEBUG_CHAT_LOAD_EXPECTED_PREFIX"],
["automation_debug_chat_load_prompt_template", "LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE"],
["automation_debug_chat_load_stream", "LANGBOT_DEBUG_CHAT_LOAD_STREAM"],
["automation_debug_chat_load_reset", "LANGBOT_DEBUG_CHAT_LOAD_RESET"],
["automation_debug_chat_load_fail_on_final_mismatch", "LANGBOT_DEBUG_CHAT_LOAD_FAIL_ON_FINAL_MISMATCH"],
["automation_fake_provider_response_text", "LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT"],
["automation_fake_provider_first_token_delay_ms", "LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS"],
["automation_fake_provider_chunk_delay_ms", "LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS"],
["automation_fake_provider_chunk_count", "LANGBOT_FAKE_PROVIDER_CHUNK_COUNT"],
["automation_fake_provider_fail_first_n", "LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N"],
["automation_fake_provider_fail_every_n", "LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N"],
["automation_fake_provider_fault_status", "LANGBOT_FAKE_PROVIDER_FAULT_STATUS"],
["automation_fake_provider_fail_after_first_chunk", "LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK"],
["automation_fake_provider_dynamic_response", "LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE"],
["automation_filesystem_checks_json", "LANGBOT_E2E_FILESYSTEM_CHECKS_JSON"],
["automation_plugin_package", "LANGBOT_E2E_PLUGIN_PACKAGE"],
["automation_expected_plugin_id", "LANGBOT_E2E_EXPECTED_PLUGIN_ID"],
+159 -1
@@ -1,6 +1,6 @@
import assert from "node:assert/strict";
import { test } from "node:test";
import { appendFileSync, existsSync, mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { appendFileSync, chmodSync, existsSync, mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { tmpdir } from "node:os";
import { join } from "node:path";
@@ -676,6 +676,82 @@ test("suite run JSON captures failed case output", () => {
}
});
test("suite run preserves classified env_issue automation results", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-suite-run-env-issue-"));
try {
const skillDir = join(tmp, "skills", "langbot-testing");
const casesDir = join(skillDir, "cases");
const suitesDir = join(skillDir, "suites");
const scriptsDir = join(tmp, "scripts");
mkdirSync(casesDir, { recursive: true });
mkdirSync(suitesDir, { recursive: true });
mkdirSync(scriptsDir, { recursive: true });
writeFileSync(join(skillDir, "SKILL.md"), "---\nname: langbot-testing\ndescription: Testing.\n---\n\n# Testing\n");
writeFileSync(join(tmp, "skills", ".env"), "");
writeFileSync(
join(casesDir, "env-case.yaml"),
[
"id: env-case",
"title: Env Case",
"mode: probe",
"area: qa",
"type: smoke",
"priority: p2",
"risk: low",
"ci_eligible: true",
"automation: scripts/env-issue.mjs",
"evidence_required:",
" - filesystem",
].join("\n"),
);
writeFileSync(
join(suitesDir, "mini.yaml"),
[
"id: mini",
"title: Mini",
"description: Mini suite.",
"type: smoke",
"priority: p2",
"tags:",
" - qa",
"cases:",
" - env-case",
].join("\n"),
);
writeFileSync(
join(scriptsDir, "env-issue.mjs"),
[
"import { mkdirSync, writeFileSync } from 'node:fs';",
"import { join } from 'node:path';",
"mkdirSync(process.env.LBS_EVIDENCE_DIR, { recursive: true });",
"const result = {",
" case_id: process.env.LBS_CASE_ID,",
" run_id: process.env.LBS_RUN_ID,",
" status: 'env_issue',",
" reason: 'backend not reachable',",
" evidence_collected: ['filesystem']",
"};",
"writeFileSync(join(process.env.LBS_EVIDENCE_DIR, 'result.json'), JSON.stringify(result));",
"writeFileSync(join(process.env.LBS_EVIDENCE_DIR, 'automation-result.json'), JSON.stringify({ ...result, source: 'automation' }));",
"process.exit(2);",
].join("\n"),
);
const result = capture(() => commandSuiteRun({
root: tmp,
args: ["suite", "run", "mini", "--run-id", "mini-run", "--evidence-dir", join(tmp, "evidence"), "--json"],
}));
assert.equal(result.code, 2);
const payload = JSON.parse(result.output);
assert.equal(payload.executions[0].status, "classified");
assert.equal(payload.report.status, "env_issue");
assert.equal(payload.report.execution_status, "ok");
} finally {
rmSync(tmp, { recursive: true, force: true });
}
});
test("suite run failure cannot be masked by stale pass result", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-suite-run-stale-pass-"));
try {
@@ -1369,6 +1445,56 @@ test("env doctor does not require proxy variables", async () => {
}
});
test("env doctor reports missing socksio for active SOCKS proxy", async () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-env-doctor-socksio-"));
const originalAllProxy = process.env.ALL_PROXY;
const originalAllProxyLower = process.env.all_proxy;
try {
delete process.env.ALL_PROXY;
delete process.env.all_proxy;
const skillsDir = join(tmp, "skills");
const repoDir = join(tmp, "LangBot");
const webDir = join(repoDir, "web");
const venvBin = join(repoDir, ".venv", "bin");
const browserProfile = join(tmp, "browser-profile");
const chromium = join(tmp, "chromium");
mkdirSync(skillsDir, { recursive: true });
mkdirSync(webDir, { recursive: true });
mkdirSync(venvBin, { recursive: true });
mkdirSync(browserProfile, { recursive: true });
writeFileSync(chromium, "");
const python = join(venvBin, "python");
writeFileSync(python, "#!/bin/sh\nexit 1\n");
chmodSync(python, 0o755);
writeFileSync(
join(skillsDir, ".env"),
[
"LANGBOT_BACKEND_URL=http://127.0.0.1:59996",
"LANGBOT_FRONTEND_URL=http://127.0.0.1:59996",
"LANGBOT_DEV_FRONTEND_URL=http://127.0.0.1:59996",
`LANGBOT_REPO=${repoDir}`,
`LANGBOT_WEB_REPO=${webDir}`,
`LANGBOT_BROWSER_PROFILE=${browserProfile}`,
`LANGBOT_CHROMIUM_EXECUTABLE=${chromium}`,
"ALL_PROXY=socks5://127.0.0.1:7890",
].join("\n"),
);
const result = await captureAsync(() => commandEnvDoctor({ root: tmp, args: ["env", "doctor"] }));
assert.equal(result.code, 1);
assert.match(result.output, /FAIL: SOCKS proxy ALL_PROXY is configured/);
assert.match(result.output, /cannot import socksio/);
assert.match(result.output, /-m pip install socksio/);
} finally {
if (originalAllProxy === undefined) delete process.env.ALL_PROXY;
else process.env.ALL_PROXY = originalAllProxy;
if (originalAllProxyLower === undefined) delete process.env.all_proxy;
else process.env.all_proxy = originalAllProxyLower;
rmSync(tmp, { recursive: true, force: true });
}
});
test("env show redacts secret-like values by default", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-env-show-redact-"));
try {
@@ -2521,6 +2647,38 @@ test("test report renders a reusable evidence template", () => {
assert.match(result.output, /no log files provided/);
});
test("test report promotes loaded automation evidence into result section", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-report-automation-"));
try {
writeFileSync(
join(tmp, "automation-result.json"),
JSON.stringify({
status: "pass",
reason: "latency thresholds passed",
url: "http://127.0.0.1:5300",
artifacts: { metrics_json: join(tmp, "metrics.json") },
}),
);
const result = capture(() => commandTestReport(ctx([
"test",
"report",
"langbot-live-backend-latency",
"--evidence-dir",
tmp,
"--no-auto-log",
])));
assert.equal(result.code, 0);
assert.match(result.output, /## Result\n- result: pass\n- reason: latency thresholds passed/);
assert.match(result.output, /- target_tested: http:\/\/127\.0\.0\.1:5300/);
assert.doesNotMatch(result.output, /target_tested: TODO/);
assert.match(result.output, /## Automation Result/);
} finally {
rmSync(tmp, { recursive: true, force: true });
}
});
test("validate rejects dangling case references and missing automation scripts", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-validate-strict-"));
try {
@@ -1,6 +1,9 @@
from __future__ import annotations
from langbot.pkg.utils import constants
from .. import group
from .box_visibility import should_hide_box_runtime_status
@group.group_class('box', '/api/v1/box')
@@ -9,6 +12,7 @@ class BoxRouterGroup(group.RouterGroup):
@self.route('/status', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def _() -> str:
status = await self.ap.box_service.get_status()
status['hidden'] = should_hide_box_runtime_status(constants.edition, status.get('enabled'))
return self.success(data=status)
@self.route('/sessions', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
@@ -0,0 +1,5 @@
from __future__ import annotations
def should_hide_box_runtime_status(edition: str, box_enabled: bool | None) -> bool:
return edition == 'cloud' and box_enabled is False
@@ -1,3 +1,5 @@
import base64
import quart
from .. import group
@@ -30,6 +32,50 @@ class SurveyRouterGroup(group.RouterGroup):
return self.fail(2, 'Failed to submit response')
return self.fail(3, 'Survey not available')
@self.route('/feedback', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
async def _feedback(user_email: str) -> str:
"""Submit on-demand user feedback from the sidebar."""
json_data = await quart.request.get_json(silent=True) or {}
content = str(json_data.get('content', '')).strip()
attachments = json_data.get('attachments', [])
if not content:
return self.fail(1, 'content required')
if len(content) > 5000:
return self.fail(2, 'content too long')
if not isinstance(attachments, list):
return self.fail(3, 'attachments must be an array')
if len(attachments) > 3:
return self.fail(4, 'too many attachments')
normalized_attachments = []
for item in attachments:
if not isinstance(item, dict):
continue
data_url = str(item.get('data_url', ''))
mime_type = str(item.get('mime_type', ''))[:128]
name = str(item.get('name', ''))[:255]
if not data_url.startswith('data:image/'):
continue
try:
payload = data_url.split(',', 1)[1]
if len(base64.b64decode(payload, validate=True)) > 1024 * 1024:
return self.fail(5, 'attachment too large')
except Exception:
return self.fail(5, 'attachment too large')
normalized_attachments.append({'name': name, 'mime_type': mime_type, 'data_url': data_url})
if self.ap.survey:
ok = await self.ap.survey.submit_feedback(
content=content,
attachments=normalized_attachments,
user_email=user_email,
)
if ok:
return self.success()
return self.fail(6, 'Failed to submit feedback')
return self.fail(7, 'Survey not available')
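Review note: the handler above returns the same `attachment too large` failure for a data URL with no comma, for invalid base64, and for a payload that really is over the limit. The sketch below shows one way these cases could be told apart. `check_image_data_url` and its return labels are hypothetical names used only for illustration, not part of LangBot; only the 1 MiB cap and the `data:image/` prefix check come from the diff.

```python
import base64
import binascii

MAX_ATTACHMENT_BYTES = 1024 * 1024  # same 1 MiB cap as the handler in the diff


def check_image_data_url(data_url: str) -> str:
    """Classify a data URL as 'ok', 'not_image', 'malformed' or 'too_large'.

    Hypothetical helper for illustration; the labels are assumptions.
    """
    if not data_url.startswith('data:image/'):
        return 'not_image'
    # partition never raises, so a missing comma is detected explicitly
    _, sep, payload = data_url.partition(',')
    if not sep:
        return 'malformed'
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return 'malformed'
    if len(raw) > MAX_ATTACHMENT_BYTES:
        return 'too_large'
    return 'ok'
```

A caller could map `malformed` and `too_large` to different failure codes, which would make client-side error messages more accurate.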
@self.route('/dismiss', methods=['POST'], auth_type=group.AuthType.USER_TOKEN)
async def _dismiss() -> str:
"""Dismiss survey."""
+16 -16
@@ -20,6 +20,15 @@ class UserService:
def __init__(self, ap: app.Application) -> None:
self.ap = ap
self._create_user_lock = asyncio.Lock()
self._password_hash_lock = asyncio.Semaphore(1)
async def _hash_password(self, password: str) -> str:
async with self._password_hash_lock:
return await asyncio.to_thread(argon2.PasswordHasher().hash, password)
async def _verify_password(self, hashed_password: str, password: str) -> None:
async with self._password_hash_lock:
await asyncio.to_thread(argon2.PasswordHasher().verify, hashed_password, password)
async def is_initialized(self) -> bool:
result = await self.ap.persistence_mgr.execute_async(sqlalchemy.select(user.User).limit(1))
@@ -28,9 +37,7 @@ class UserService:
return result_list is not None and len(result_list) > 0
async def create_user(self, user_email: str, password: str) -> None:
ph = argon2.PasswordHasher()
hashed_password = ph.hash(password)
hashed_password = await self._hash_password(password)
await self.ap.persistence_mgr.execute_async(
sqlalchemy.insert(user.User).values(user=user_email, password=hashed_password, account_type='local')
@@ -69,9 +76,7 @@ class UserService:
if not user_obj.password:
raise ValueError('请使用 Space 账户登录')  # "Please log in with your Space account"
ph = argon2.PasswordHasher()
ph.verify(user_obj.password, password)
await self._verify_password(user_obj.password, password)
return await self.generate_jwt_token(user_email)
@@ -93,17 +98,13 @@ class UserService:
return jwt.decode(token, jwt_secret, algorithms=['HS256'])['user']
async def reset_password(self, user_email: str, new_password: str) -> None:
ph = argon2.PasswordHasher()
hashed_password = ph.hash(new_password)
hashed_password = await self._hash_password(new_password)
await self.ap.persistence_mgr.execute_async(
sqlalchemy.update(user.User).where(user.User.user == user_email).values(password=hashed_password)
)
async def change_password(self, user_email: str, current_password: str, new_password: str) -> None:
ph = argon2.PasswordHasher()
user_obj = await self.get_user_by_email(user_email)
if user_obj is None:
raise ValueError('User not found')
@@ -111,9 +112,9 @@ class UserService:
if not user_obj.password:
raise ValueError('No local password set, please set a password first')
ph.verify(user_obj.password, current_password)
await self._verify_password(user_obj.password, current_password)
hashed_password = ph.hash(new_password)
hashed_password = await self._hash_password(new_password)
await self.ap.persistence_mgr.execute_async(
sqlalchemy.update(user.User).where(user.User.user == user_email).values(password=hashed_password)
@@ -232,7 +233,6 @@ class UserService:
async def set_password(self, user_email: str, new_password: str, current_password: str | None = None) -> None:
"""Set or change password for a user"""
ph = argon2.PasswordHasher()
user_obj = await self.get_user_by_email(user_email)
if user_obj is None:
@@ -243,9 +243,9 @@ class UserService:
if has_password:
if not current_password:
raise ValueError('Current password is required')
ph.verify(user_obj.password, current_password)
await self._verify_password(user_obj.password, current_password)
hashed_password = ph.hash(new_password)
hashed_password = await self._hash_password(new_password)
await self.ap.persistence_mgr.execute_async(
sqlalchemy.update(user.User).where(user.User.user == user_email).values(password=hashed_password)
)
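Review note: the pattern in this file (a one-slot semaphore around `asyncio.to_thread`) can be tried in isolation. The sketch below is a minimal stand-in, not the LangBot code: it uses `hashlib.pbkdf2_hmac` from the standard library in place of argon2 so it runs without extra dependencies, and `Hasher`, `slow_hash` and `demo` are hypothetical names.

```python
import asyncio
import hashlib
import os


def slow_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for argon2 (assumption): a deliberately expensive, blocking KDF.
    return hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 200_000)


class Hasher:
    def __init__(self) -> None:
        # One slot: at most one hash runs in a worker thread at a time.
        self._gate = asyncio.Semaphore(1)

    async def hash(self, password: str, salt: bytes) -> bytes:
        async with self._gate:
            # The blocking call runs in a thread, so the event loop stays free.
            return await asyncio.to_thread(slow_hash, password, salt)


async def demo() -> tuple[int, int]:
    hasher = Hasher()
    ticks = 0

    async def ticker() -> None:
        # Counts how often the loop gets to run while hashes are in flight.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.001)

    task = asyncio.create_task(ticker())
    digests = await asyncio.gather(*(hasher.hash('pw', os.urandom(16)) for _ in range(3)))
    task.cancel()
    return len(digests), ticks
```

Running `asyncio.run(demo())` returns the number of digests and a tick count. The ticker keeps advancing while hashes are computed, which is the behaviour the change is after. The semaphore trades login throughput for bounded CPU and memory use.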
+5 -2
@@ -82,7 +82,6 @@ class BoxService:
return self._enabled
async def initialize(self):
self._ensure_default_workspace()
if not self._enabled:
# Disabled by config: do NOT connect to a remote runtime, do NOT
# fork a stdio subprocess. Every consumer of box_service should
@@ -99,6 +98,7 @@ class BoxService:
await self._runtime_connector.initialize()
else:
await self.client.initialize()
self._ensure_default_workspace()
self._available = True
self._connector_error = ''
self.ap.logger.info(
@@ -1152,6 +1152,9 @@ class BoxService:
if self.default_workspace is None:
return
if not self.shares_filesystem_with_box:
return
if os.path.isdir(self.default_workspace):
return
@@ -1176,7 +1179,7 @@ class BoxService:
return
host_path = os.path.realpath(spec.host_path)
if not os.path.isdir(host_path):
if self.shares_filesystem_with_box and not os.path.isdir(host_path):
raise BoxValidationError('host_path must point to an existing directory on the host')
if not self.allowed_mount_roots:
+43 -3
@@ -159,6 +159,21 @@ class SurveyManager:
"""Clear the pending survey (after user responds or dismisses)."""
self._pending_survey = None
async def _build_base_metadata(self, user_email: str | None = None) -> dict:
metadata = {
'version': constants.semantic_version,
'instance_id': constants.instance_id,
}
if user_email:
metadata['login_account'] = user_email
try:
user_obj = await self.ap.user_service.get_user_by_email(user_email)
metadata['account_type'] = getattr(user_obj, 'account_type', '') or 'local'
metadata['space_account_uuid'] = getattr(user_obj, 'space_account_uuid', '') or ''
except Exception:
pass
return metadata
async def submit_response(self, survey_id: str, answers: dict, completed: bool = True) -> bool:
"""Submit a survey response to Space."""
if not self._is_space_configured():
@@ -169,9 +184,7 @@ class SurveyManager:
'survey_id': survey_id,
'instance_id': constants.instance_id,
'answers': answers,
'metadata': {
'version': constants.semantic_version,
},
'metadata': await self._build_base_metadata(),
'completed': completed,
}
async with httpx.AsyncClient(timeout=httpx.Timeout(10)) as client:
@@ -183,6 +196,33 @@ class SurveyManager:
self.ap.logger.warning(f'Failed to submit survey response: {e}')
return False
async def submit_feedback(
self,
content: str,
attachments: list[dict],
user_email: str | None = None,
) -> bool:
"""Submit an on-demand user feedback item to Space."""
if not self._is_space_configured():
return False
try:
url = f'{self._space_url}/api/v1/survey/feedback'
metadata = await self._build_base_metadata(user_email)
payload = {
'instance_id': constants.instance_id,
'content': content,
'attachments': attachments,
'metadata': metadata,
}
async with httpx.AsyncClient(timeout=httpx.Timeout(30)) as client:
resp = await client.post(url, json=payload)
if resp.status_code == 200:
return True
self.ap.logger.warning(f'Failed to submit feedback: {resp.status_code} {resp.text[:200]}')
except Exception as e:
self.ap.logger.warning(f'Failed to submit feedback: {e}')
return False
async def dismiss_survey(self, survey_id: str) -> bool:
"""Dismiss a survey."""
if not self._is_space_configured():
+2
@@ -144,6 +144,8 @@ box:
- './data/box'
- '/tmp'
workspace_quota_mb: null # Optional disk quota override (>= 0). null = profile default.
docker:
cpu_limit_enabled: true # When false, Docker sandbox containers are started without --cpus. Memory and PID limits still apply.
e2b:
api_key: '' # Can also be set via E2B_API_KEY env var.
api_url: '' # Custom API URL for self-hosted deployments.
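Review note: the config comment says that with `cpu_limit_enabled: false` containers start without `--cpus` while memory and PID limits still apply. The sketch below shows how such a toggle could shape the `docker run` flags. `docker_limit_args` and its parameters are hypothetical; the diff does not show the real backend code. Only the `--cpus`, `--memory` and `--pids-limit` flags are standard Docker CLI options.

```python
def docker_limit_args(
    cpus: float,
    memory_mb: int,
    pids_limit: int,
    cpu_limit_enabled: bool = True,
) -> list[str]:
    """Build resource-limit flags for `docker run` (illustrative helper)."""
    # Memory and PID limits are always applied.
    args = [f'--memory={memory_mb}m', f'--pids-limit={pids_limit}']
    if cpu_limit_enabled:
        # The CPU cap is the only flag the toggle removes.
        args.insert(0, f'--cpus={cpus}')
    return args
```

With the toggle off, the returned list has no `--cpus` entry, so the container is not CPU-capped by Docker.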
@@ -0,0 +1,18 @@
from __future__ import annotations
import pytest
from langbot.pkg.api.http.controller.groups.box_visibility import should_hide_box_runtime_status
@pytest.mark.parametrize(
('edition', 'box_enabled', 'expected'),
[
('cloud', False, True),
('cloud', True, False),
('cloud', None, False),
('community', False, False),
],
)
def test_should_hide_box_runtime_status(edition, box_enabled, expected):
assert should_hide_box_runtime_status(edition, box_enabled) is expected
+27
@@ -256,6 +256,31 @@ class TestSharesFilesystemWithBox:
assert service.shares_filesystem_with_box is False
def test_separated_box_runtime_does_not_create_default_workspace_in_langbot(tmp_path):
logger = Mock()
runtime = BoxRuntime(logger=logger, backends=[FakeBackend(logger)], session_ttl_sec=300)
host_root = tmp_path / 'box'
service = BoxService(make_app(logger, host_root=str(host_root)), client=_InProcessBoxRuntimeClient(logger, runtime))
service._shares_filesystem_with_box_override = False
service._ensure_default_workspace()
assert not (host_root / 'default').exists()
def test_separated_box_runtime_allows_box_owned_missing_host_path(tmp_path):
logger = Mock()
runtime = BoxRuntime(logger=logger, backends=[FakeBackend(logger)], session_ttl_sec=300)
host_root = tmp_path / 'box'
service = BoxService(make_app(logger, host_root=str(host_root)), client=_InProcessBoxRuntimeClient(logger, runtime))
service._shares_filesystem_with_box_override = False
spec = service.build_spec({'cmd': 'echo hi', 'session_id': 'missing-host-path'})
assert spec.host_path == str(host_root / 'default')
assert not (host_root / 'default').exists()
@pytest.mark.asyncio
async def test_box_service_get_sessions_delegates_to_client():
client = Mock()
@@ -500,6 +525,7 @@ async def test_box_service_creates_default_workspace_on_initialize(tmp_path):
app = make_app(logger, [str(allowed_root)])
app.instance_config.data['box']['local']['default_workspace'] = str(default_workspace)
service = BoxService(app, client=_InProcessBoxRuntimeClient(logger, runtime))
service._shares_filesystem_with_box_override = True
await service.initialize()
@@ -514,6 +540,7 @@ async def test_box_service_derives_workspace_and_allowed_root_from_host_root(tmp
shared_root = tmp_path / 'shared-box-root'
app = make_app(logger, host_root=str(shared_root))
service = BoxService(app, client=_InProcessBoxRuntimeClient(logger, runtime))
service._shares_filesystem_with_box_override = True
await service.initialize()
@@ -0,0 +1,206 @@
import { useCallback, useEffect, useRef, useState } from 'react';
import { ImagePlus, Loader2, Paperclip, Send, X } from 'lucide-react';
import { useTranslation } from 'react-i18next';
import { toast } from 'sonner';
import { Button } from '@/components/ui/button';
import { Textarea } from '@/components/ui/textarea';
import { httpClient } from '@/app/infra/http/HttpClient';
const MAX_ATTACHMENTS = 3;
const MAX_IMAGE_BYTES = 1024 * 1024;
type FeedbackAttachment = {
name: string;
mime_type: string;
data_url: string;
};
function readImageFile(file: File): Promise<FeedbackAttachment> {
return new Promise((resolve, reject) => {
if (!file.type.startsWith('image/')) {
reject(new Error('not_image'));
return;
}
if (file.size > MAX_IMAGE_BYTES) {
reject(new Error('too_large'));
return;
}
const reader = new FileReader();
reader.onload = () => {
const dataUrl = String(reader.result || '');
if (!dataUrl.startsWith('data:image/')) {
reject(new Error('not_image'));
return;
}
resolve({
name: file.name || 'pasted-image.png',
mime_type: file.type || 'image/png',
data_url: dataUrl,
});
};
reader.onerror = () => reject(reader.error || new Error('read_failed'));
reader.readAsDataURL(file);
});
}
const FEEDBACK_I18N_PREFIX = 'monitoring.feedback';
export function FeedbackPopoverContent({
onSubmitted,
}: {
onSubmitted?: () => void;
}) {
const { t } = useTranslation();
const tf = useCallback(
(key: string) => t(`${FEEDBACK_I18N_PREFIX}.${key}`),
[t],
);
const [content, setContent] = useState('');
const [attachments, setAttachments] = useState<FeedbackAttachment[]>([]);
const [submitting, setSubmitting] = useState(false);
const fileInputRef = useRef<HTMLInputElement>(null);
const addFiles = useCallback(
async (files: File[]) => {
const slots = MAX_ATTACHMENTS - attachments.length;
if (slots <= 0) {
toast.error(tf('tooManyImages'));
return;
}
const picked = files.slice(0, slots);
const next: FeedbackAttachment[] = [];
for (const file of picked) {
try {
next.push(await readImageFile(file));
} catch (error) {
const msg = error instanceof Error ? error.message : '';
toast.error(
msg === 'too_large' ? tf('imageTooLarge') : tf('imageOnly'),
);
}
}
if (next.length > 0) {
setAttachments((prev) => [...prev, ...next].slice(0, MAX_ATTACHMENTS));
}
},
[attachments.length, tf],
);
useEffect(() => {
const onPaste = (event: ClipboardEvent) => {
const files = Array.from(event.clipboardData?.files || []).filter(
(file) => file.type.startsWith('image/'),
);
if (files.length > 0) {
event.preventDefault();
void addFiles(files);
}
};
window.addEventListener('paste', onPaste);
return () => window.removeEventListener('paste', onPaste);
}, [addFiles]);
const handleSubmit = async () => {
const trimmed = content.trim();
if (!trimmed) {
toast.error(tf('contentRequired'));
return;
}
try {
setSubmitting(true);
await httpClient.submitFeedback({
content: trimmed,
attachments,
});
toast.success(tf('submitSuccess'));
setContent('');
setAttachments([]);
onSubmitted?.();
} catch {
toast.error(tf('submitFailed'));
} finally {
setSubmitting(false);
}
};
return (
<div className="space-y-3" onClick={(e) => e.stopPropagation()}>
<div>
<div className="text-sm font-medium">{tf('title')}</div>
<p className="mt-1 text-xs text-muted-foreground">
{tf('description')}
</p>
</div>
<Textarea
value={content}
onChange={(e) => setContent(e.target.value)}
placeholder={tf('placeholder')}
maxLength={5000}
className="min-h-32 resize-none text-sm"
/>
<div className="flex flex-wrap gap-2">
{attachments.map((item, index) => (
<div
key={`${item.name}-${index}`}
className="relative size-16 overflow-hidden rounded-md border"
>
<img
src={item.data_url}
alt={item.name}
className="h-full w-full object-cover"
/>
<button
type="button"
onClick={() =>
setAttachments((prev) => prev.filter((_, i) => i !== index))
}
className="absolute right-1 top-1 rounded-full bg-black/60 p-0.5 text-white"
aria-label={tf('removeImage')}
>
<X className="size-3" />
</button>
</div>
))}
</div>
<div className="flex items-center justify-between gap-2">
<div className="flex gap-2">
<input
ref={fileInputRef}
type="file"
accept="image/*"
multiple
className="hidden"
onChange={(e) => {
void addFiles(Array.from(e.target.files || []));
e.target.value = '';
}}
/>
<Button
type="button"
size="sm"
variant="outline"
onClick={() => fileInputRef.current?.click()}
>
<ImagePlus className="mr-1 size-4" />
{tf('attachImage')}
</Button>
</div>
<span className="flex items-center gap-1 text-xs text-muted-foreground">
<Paperclip className="size-3" />
{attachments.length}/{MAX_ATTACHMENTS}
</span>
</div>
<Button className="w-full" onClick={handleSubmit} disabled={submitting}>
{submitting ? (
<Loader2 className="mr-2 size-4 animate-spin" />
) : (
<Send className="mr-2 size-4" />
)}
{tf('submit')}
</Button>
<p className="text-[11px] leading-relaxed text-muted-foreground">
{tf('privacyHint')}
</p>
</div>
);
}
@@ -100,6 +100,7 @@ import {
} from '@/components/ui/popover';
import { cn } from '@/lib/utils';
import { useSidebarData, SidebarEntityItem } from './SidebarDataContext';
import { FeedbackPopoverContent } from './FeedbackPopover';
// Compare two version strings, returns true if v1 > v2
function compareVersions(v1: string, v2: string): boolean {
@@ -1569,6 +1570,7 @@ export default function HomeSidebar({
);
const [hasNewVersion, setHasNewVersion] = useState(false);
const [versionDialogOpen, setVersionDialogOpen] = useState(false);
const [feedbackOpen, setFeedbackOpen] = useState(false);
const [userEmail, setUserEmail] = useState<string>('');
const [starCount, setStarCount] = useState<number | null>(null);
const [userMenuOpen, setUserMenuOpen] = useState(false);
@@ -2041,10 +2043,8 @@ export default function HomeSidebar({
</DropdownMenuItem>
<DropdownMenuItem
onClick={() => {
window.open(
'https://github.com/langbot-app/LangBot/issues',
'_blank',
);
setUserMenuOpen(false);
setFeedbackOpen(true);
}}
>
<Lightbulb />
@@ -2096,6 +2096,18 @@ export default function HomeSidebar({
</SidebarFooter>
</Sidebar>
<Dialog open={feedbackOpen} onOpenChange={setFeedbackOpen}>
<DialogContent className="w-[calc(100vw-2rem)] sm:max-w-[380px]">
<DialogHeader className="sr-only">
<DialogTitle>{t('monitoring.feedback.title')}</DialogTitle>
<DialogDescription>
{t('monitoring.feedback.description')}
</DialogDescription>
</DialogHeader>
<FeedbackPopoverContent onSubmitted={() => setFeedbackOpen(false)} />
</DialogContent>
</Dialog>
<SettingsDialog
open={settingsOpen}
onOpenChange={handleSettingsOpenChange}
@@ -164,7 +164,7 @@ export default function TokenMonitoring({
}, [fetchStats]);
const chartData = useMemo(() => {
if (!stats) return [];
if (!stats || !Array.isArray(stats.timeseries)) return [];
return stats.timeseries.map((p) => ({
bucket: p.bucket,
input: p.input_tokens,
@@ -198,7 +198,7 @@ export default function TokenMonitoring({
);
}
if (!stats || stats.summary.total_calls === 0) {
if (!stats || !stats.summary || stats.summary.total_calls === 0) {
return (
<div className="bg-card rounded-xl border p-6">
<div className="h-[260px] flex flex-col items-center justify-center text-muted-foreground gap-2">
@@ -209,7 +209,8 @@ export default function TokenMonitoring({
);
}
const { summary, by_model } = stats;
const summary = stats.summary;
const by_model = Array.isArray(stats.by_model) ? stats.by_model : [];
return (
<div className="space-y-6">
@@ -65,14 +65,16 @@ export default function SystemStatusCard({
const fetchStatus = useCallback(async () => {
try {
const [plugin, box, sessions] = await Promise.all([
const [plugin, box] = await Promise.all([
httpClient.getPluginSystemStatus().catch(() => null),
httpClient.getBoxStatus().catch(() => null),
httpClient.getBoxSessions().catch(() => [] as BoxSessionInfo[]),
]);
const sessions = box?.hidden
? []
: await httpClient.getBoxSessions().catch(() => [] as BoxSessionInfo[]);
setPluginStatus(plugin);
setBoxStatus(box);
setBoxSessions(sessions);
setBoxSessions(Array.isArray(sessions) ? sessions : []);
} finally {
setLoading(false);
}
@@ -95,6 +97,7 @@ export default function SystemStatusCard({
: 'failed'
: null;
const boxOk = boxStatus ? boxStatus.available : null;
const hideBoxRuntime = boxStatus?.hidden === true;
// Box has three observable states: connected (ok), disabled by config
// (enabled = false → distinct gray dot + "disabled" hint), and configured
// but failed (red dot + connector_error). The dashboard must distinguish
@@ -152,11 +155,13 @@ export default function SystemStatusCard({
<Plug className="w-3.5 h-3.5 text-muted-foreground" />
<span className="text-sm">{t('monitoring.pluginRuntime')}</span>
</div>
<div className="flex items-center gap-2">
<StatusDot state={boxState} />
<Box className="w-3.5 h-3.5 text-muted-foreground" />
<span className="text-sm">{t('monitoring.boxRuntime')}</span>
</div>
{!hideBoxRuntime && (
<div className="flex items-center gap-2">
<StatusDot state={boxState} />
<Box className="w-3.5 h-3.5 text-muted-foreground" />
<span className="text-sm">{t('monitoring.boxRuntime')}</span>
</div>
)}
</CardContent>
</Card>
@@ -214,181 +219,189 @@ export default function SystemStatusCard({
</div>
</div>
<div className="border-t" />
{!hideBoxRuntime && (
<>
<div className="border-t" />
{/* Box Runtime */}
<div className="space-y-2">
<div className="flex items-center gap-2">
<Box className="w-4 h-4 text-muted-foreground" />
<span className="text-sm font-semibold">
{t('monitoring.boxRuntime')}
</span>
</div>
<div className="ml-6 text-sm space-y-1">
<div className="flex items-center gap-1.5">
{boxState === 'ok' ? (
<CircleCheck className="w-4 h-4 text-green-600" />
) : (
<CircleX
className={
boxState === 'disabled'
? 'w-4 h-4 text-muted-foreground'
: 'w-4 h-4 text-red-500'
}
/>
)}
<span
className={
boxState === 'ok'
? 'text-green-600 font-medium'
: boxState === 'disabled'
? 'text-muted-foreground font-medium'
: 'text-red-500 font-medium'
}
>
{boxState === 'ok'
? t('monitoring.connected')
: boxState === 'disabled'
? t('monitoring.disabled')
: t('monitoring.disconnected')}
</span>
</div>
{boxState === 'disabled' && (
<p className="text-muted-foreground text-xs">
{t('monitoring.boxDisabled')}
</p>
)}
{boxState === 'failed' && boxStatus?.connector_error && (
<p className="text-red-400 text-xs break-all">
{boxStatus.connector_error}
</p>
)}
{boxStatus && (
<div className="text-muted-foreground text-xs space-y-0.5">
{boxStatus.backend && (
<p>
{t('monitoring.boxBackend')}:{' '}
<span className="text-foreground font-mono">
{boxStatus.backend.name}
</span>
</p>
)}
<p>
{t('monitoring.boxProfile')}:{' '}
<span className="text-foreground font-mono">
{boxStatus.profile}
</span>
</p>
{boxOk && boxStatus.active_sessions !== undefined && (
<p>
{t('monitoring.boxSandboxes')}:{' '}
<span className="text-foreground font-mono">
{boxStatus.active_sessions}
</span>
</p>
)}
{/* Box Runtime */}
<div className="space-y-2">
<div className="flex items-center gap-2">
<Box className="w-4 h-4 text-muted-foreground" />
<span className="text-sm font-semibold">
{t('monitoring.boxRuntime')}
</span>
</div>
)}
{/* Active Sandboxes */}
{boxSessions.length > 0 && (
<div className="mt-3 space-y-2">
{boxSessions.map((session) => (
<div
key={session.session_id}
className="rounded-lg border p-3 space-y-2"
<div className="ml-6 text-sm space-y-1">
<div className="flex items-center gap-1.5">
{boxState === 'ok' ? (
<CircleCheck className="w-4 h-4 text-green-600" />
) : (
<CircleX
className={
boxState === 'disabled'
? 'w-4 h-4 text-muted-foreground'
: 'w-4 h-4 text-red-500'
}
/>
)}
<span
className={
boxState === 'ok'
? 'text-green-600 font-medium'
: boxState === 'disabled'
? 'text-muted-foreground font-medium'
: 'text-red-500 font-medium'
}
>
<div className="flex items-center gap-1.5 min-w-0">
<Container className="w-4 h-4 text-muted-foreground flex-shrink-0" />
<Tooltip>
<TooltipTrigger asChild>
<span className="font-mono font-semibold text-foreground truncate text-sm">
{session.session_id}
</span>
</TooltipTrigger>
<TooltipContent>
{session.session_id}
</TooltipContent>
</Tooltip>
</div>
<div className="grid grid-cols-2 gap-x-4 gap-y-1.5 text-xs">
<div className="flex items-center gap-1.5 text-muted-foreground min-w-0">
<Image className="w-3 h-3 flex-shrink-0" />
<Tooltip>
<TooltipTrigger asChild>
<span className="text-foreground font-mono truncate">
{session.image}
</span>
</TooltipTrigger>
<TooltipContent>{session.image}</TooltipContent>
</Tooltip>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<HardDrive className="w-3 h-3 flex-shrink-0" />
<span className="text-foreground">
{session.backend_name}
{boxState === 'ok'
? t('monitoring.connected')
: boxState === 'disabled'
? t('monitoring.disabled')
: t('monitoring.disconnected')}
</span>
</div>
{boxState === 'disabled' && (
<p className="text-muted-foreground text-xs">
{t('monitoring.boxDisabled')}
</p>
)}
{boxState === 'failed' && boxStatus?.connector_error && (
<p className="text-red-400 text-xs break-all">
{boxStatus.connector_error}
</p>
)}
{boxStatus && (
<div className="text-muted-foreground text-xs space-y-0.5">
{boxStatus.backend && (
<p>
{t('monitoring.boxBackend')}:{' '}
<span className="text-foreground font-mono">
{boxStatus.backend.name}
</span>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<Cpu className="w-3 h-3 flex-shrink-0" />
<span className="text-foreground">
{session.cpus} CPU / {session.memory_mb} MB
</p>
)}
<p>
{t('monitoring.boxProfile')}:{' '}
<span className="text-foreground font-mono">
{boxStatus.profile}
</span>
</p>
{boxOk && boxStatus.active_sessions !== undefined && (
<p>
{t('monitoring.boxSandboxes')}:{' '}
<span className="text-foreground font-mono">
{boxStatus.active_sessions}
</span>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<Network className="w-3 h-3 flex-shrink-0" />
<span className="text-foreground">
{session.network}
</span>
</div>
{session.host_path && (
<div className="flex items-center gap-1.5 text-muted-foreground col-span-2 min-w-0">
<FolderOpen className="w-3 h-3 flex-shrink-0" />
</p>
)}
</div>
)}
{/* Active Sandboxes */}
{boxSessions.length > 0 && (
<div className="mt-3 space-y-2">
{boxSessions.map((session) => (
<div
key={session.session_id}
className="rounded-lg border p-3 space-y-2"
>
<div className="flex items-center gap-1.5 min-w-0">
<Container className="w-4 h-4 text-muted-foreground flex-shrink-0" />
<Tooltip>
<TooltipTrigger asChild>
<span className="text-foreground font-mono truncate">
{session.host_path} : {session.mount_path}{' '}
<span className="text-muted-foreground">
({session.host_path_mode})
</span>
<span className="font-mono font-semibold text-foreground truncate text-sm">
{session.session_id}
</span>
</TooltipTrigger>
<TooltipContent>
{session.host_path} : {session.mount_path} (
{session.host_path_mode})
{session.session_id}
</TooltipContent>
</Tooltip>
</div>
)}
<div className="flex items-center gap-1.5 text-muted-foreground">
<Clock className="w-3 h-3 flex-shrink-0" />
<span>
{t('monitoring.boxSessionCreated')}:{' '}
<span className="text-foreground">
{new Date(
session.created_at,
).toLocaleString()}
</span>
</span>
<div className="grid grid-cols-2 gap-x-4 gap-y-1.5 text-xs">
<div className="flex items-center gap-1.5 text-muted-foreground min-w-0">
<Image className="w-3 h-3 flex-shrink-0" />
<Tooltip>
<TooltipTrigger asChild>
<span className="text-foreground font-mono truncate">
{session.image}
</span>
</TooltipTrigger>
<TooltipContent>
{session.image}
</TooltipContent>
</Tooltip>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<HardDrive className="w-3 h-3 flex-shrink-0" />
<span className="text-foreground">
{session.backend_name}
</span>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<Cpu className="w-3 h-3 flex-shrink-0" />
<span className="text-foreground">
{session.cpus} CPU / {session.memory_mb} MB
</span>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<Network className="w-3 h-3 flex-shrink-0" />
<span className="text-foreground">
{session.network}
</span>
</div>
{session.host_path && (
<div className="flex items-center gap-1.5 text-muted-foreground col-span-2 min-w-0">
<FolderOpen className="w-3 h-3 flex-shrink-0" />
<Tooltip>
<TooltipTrigger asChild>
<span className="text-foreground font-mono truncate">
{session.host_path} :{' '}
{session.mount_path}{' '}
<span className="text-muted-foreground">
({session.host_path_mode})
</span>
</span>
</TooltipTrigger>
<TooltipContent>
{session.host_path} :{' '}
{session.mount_path} (
{session.host_path_mode})
</TooltipContent>
</Tooltip>
</div>
)}
<div className="flex items-center gap-1.5 text-muted-foreground">
<Clock className="w-3 h-3 flex-shrink-0" />
<span>
{t('monitoring.boxSessionCreated')}:{' '}
<span className="text-foreground">
{new Date(
session.created_at,
).toLocaleString()}
</span>
</span>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<Clock className="w-3 h-3 flex-shrink-0" />
<span>
{t('monitoring.boxSessionLastUsed')}:{' '}
<span className="text-foreground">
{new Date(
session.last_used_at,
).toLocaleString()}
</span>
</span>
</div>
</div>
</div>
<div className="flex items-center gap-1.5 text-muted-foreground">
<Clock className="w-3 h-3 flex-shrink-0" />
<span>
{t('monitoring.boxSessionLastUsed')}:{' '}
<span className="text-foreground">
{new Date(
session.last_used_at,
).toLocaleString()}
</span>
</span>
</div>
</div>
))}
</div>
))}
)}
</div>
)}
</div>
</div>
</div>
</>
)}
</div>
</TooltipProvider>
</DialogContent>
@@ -34,14 +34,16 @@ export default function TrafficChart({
const { t } = useTranslation();
const chartData = useMemo(() => {
- if (!messages.length && !llmCalls.length) {
+ const safeMessages = Array.isArray(messages) ? messages : [];
+ const safeLlmCalls = Array.isArray(llmCalls) ? llmCalls : [];
+ if (!safeMessages.length && !safeLlmCalls.length) {
return [];
}
// Combine all timestamps and find the range
const allTimestamps = [
- ...messages.map((m) => m.timestamp.getTime()),
- ...llmCalls.map((c) => c.timestamp.getTime()),
+ ...safeMessages.map((m) => m.timestamp.getTime()),
+ ...safeLlmCalls.map((c) => c.timestamp.getTime()),
];
if (allTimestamps.length === 0) return [];
@@ -99,7 +101,7 @@ export default function TrafficChart({
}
// Count messages per bucket
- messages.forEach((msg) => {
+ safeMessages.forEach((msg) => {
const bucket =
Math.floor(msg.timestamp.getTime() / bucketSize) * bucketSize;
const point = buckets.get(bucket);
@@ -109,7 +111,7 @@ export default function TrafficChart({
});
// Count LLM calls per bucket
- llmCalls.forEach((call) => {
+ safeLlmCalls.forEach((call) => {
const bucket =
Math.floor(call.timestamp.getTime() / bucketSize) * bucketSize;
const point = buckets.get(bucket);
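The bucketing step touched by this hunk can be read in isolation. What follows is a minimal sketch, not the component's actual code: `TimedEvent` and `countPerBucket` are hypothetical names, and the only behavior taken from the diff is the `Array.isArray` guard and the `Math.floor(t / bucketSize) * bucketSize` rounding.

```typescript
// Hypothetical event shape; the real TrafficChart props carry more fields.
interface TimedEvent {
  timestamp: Date;
}

// Count events per fixed-width time bucket, tolerating null/undefined input.
function countPerBucket(
  events: TimedEvent[] | null | undefined,
  bucketSize: number,
): Map<number, number> {
  // Same guard as the diff: anything that is not an array becomes [].
  const safeEvents = Array.isArray(events) ? events : [];
  const counts = new Map<number, number>();
  for (const event of safeEvents) {
    // Round the timestamp down to the start of its bucket.
    const bucket =
      Math.floor(event.timestamp.getTime() / bucketSize) * bucketSize;
    counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
  }
  return counts;
}
```

With a 60000 ms bucket, timestamps 0 and 59999 fall into bucket 0 and timestamp 60000 starts the next bucket; a `null` input yields an empty map instead of throwing.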
@@ -92,18 +92,46 @@ export function useMonitoringData(filterState: FilterState) {
limit: 50,
});
+ const overview = response?.overview ?? {
+ total_messages: 0,
+ llm_calls: 0,
+ embedding_calls: 0,
+ model_calls: 0,
+ success_rate: 100,
+ active_sessions: 0,
+ };
+ const messages = Array.isArray(response?.messages)
+ ? response.messages
+ : [];
+ const llmCalls = Array.isArray(response?.llmCalls)
+ ? response.llmCalls
+ : [];
+ const embeddingCalls = Array.isArray(response?.embeddingCalls)
+ ? response.embeddingCalls
+ : [];
+ const sessions = Array.isArray(response?.sessions)
+ ? response.sessions
+ : [];
+ const errors = Array.isArray(response?.errors) ? response.errors : [];
+ const totalCount = response?.totalCount ?? {
+ messages: messages.length,
+ llmCalls: llmCalls.length,
+ embeddingCalls: embeddingCalls.length,
+ sessions: sessions.length,
+ errors: errors.length,
+ };
// Transform the response to match MonitoringData interface
const transformedData: MonitoringData = {
overview: {
- totalMessages: response.overview.total_messages,
- llmCalls: response.overview.llm_calls,
- embeddingCalls: response.overview.embedding_calls || 0,
- modelCalls:
- response.overview.model_calls || response.overview.llm_calls,
- successRate: response.overview.success_rate,
- activeSessions: response.overview.active_sessions,
+ totalMessages: overview.total_messages,
+ llmCalls: overview.llm_calls,
+ embeddingCalls: overview.embedding_calls || 0,
+ modelCalls: overview.model_calls || overview.llm_calls,
+ successRate: overview.success_rate,
+ activeSessions: overview.active_sessions,
},
- messages: response.messages.map(
+ messages: messages.map(
(msg: {
id: string;
timestamp: string;
@@ -136,7 +164,7 @@ export function useMonitoringData(filterState: FilterState) {
variables: msg.variables,
}),
),
- llmCalls: response.llmCalls.map(
+ llmCalls: llmCalls.map(
(call: {
id: string;
timestamp: string;
@@ -173,7 +201,7 @@ export function useMonitoringData(filterState: FilterState) {
messageId: call.message_id,
}),
),
- embeddingCalls: (response.embeddingCalls || []).map(
+ embeddingCalls: embeddingCalls.map(
(call: {
id: string;
timestamp: string;
@@ -208,7 +236,7 @@ export function useMonitoringData(filterState: FilterState) {
),
// Create merged modelCalls array from llmCalls and embeddingCalls
modelCalls: [] as ModelCall[], // Will be populated after transform
- sessions: response.sessions.map(
+ sessions: sessions.map(
(session: {
session_id: string;
bot_id: string;
@@ -236,7 +264,7 @@ export function useMonitoringData(filterState: FilterState) {
userId: session.user_id,
}),
),
- errors: response.errors.map(
+ errors: errors.map(
(error: {
id: string;
timestamp: string;
@@ -264,11 +292,11 @@ export function useMonitoringData(filterState: FilterState) {
}),
),
totalCount: {
- messages: response.totalCount.messages,
- llmCalls: response.totalCount.llmCalls,
- embeddingCalls: response.totalCount.embeddingCalls || 0,
- sessions: response.totalCount.sessions,
- errors: response.totalCount.errors,
+ messages: totalCount.messages,
+ llmCalls: totalCount.llmCalls,
+ embeddingCalls: totalCount.embeddingCalls || 0,
+ sessions: totalCount.sessions,
+ errors: totalCount.errors,
},
};
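The normalization pattern in this hunk (fall back to safe defaults before any `.map` or property read) can be sketched on its own. This is a minimal sketch under assumptions: `RawMonitoringResponse`, `asArray`, and `normalizeMonitoringResponse` are hypothetical names, and the payload is trimmed to a few of the fields seen in the diff.

```typescript
// Hypothetical, trimmed-down payload shape; the real response has more fields.
interface RawOverview {
  total_messages: number;
  llm_calls: number;
  success_rate: number;
  active_sessions: number;
}

interface RawMonitoringResponse {
  overview?: RawOverview | null;
  messages?: unknown[] | null;
  errors?: unknown[] | null;
}

// Defaults mirror the fallback values used in the hunk above.
const EMPTY_OVERVIEW: RawOverview = {
  total_messages: 0,
  llm_calls: 0,
  success_rate: 100,
  active_sessions: 0,
};

// Return the value if it is an array, otherwise an empty array.
function asArray<T>(value: T[] | null | undefined): T[] {
  return Array.isArray(value) ? value : [];
}

// Turn a possibly-null payload into data that is always safe to render.
function normalizeMonitoringResponse(
  response: RawMonitoringResponse | null | undefined,
) {
  return {
    overview: response?.overview ?? EMPTY_OVERVIEW,
    messages: asArray(response?.messages),
    errors: asArray(response?.errors),
  };
}

// A null payload no longer throws; it yields empty, renderable data.
const empty = normalizeMonitoringResponse(null);
console.log(empty.overview.success_rate, empty.messages.length); // prints "100 0"
```

Resolving the defaults once, up front, keeps the later transform code free of repeated `|| []` guards, which is the same trade the diff makes.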
+2
View File
@@ -373,6 +373,8 @@ export interface ApiRespPluginSystemStatus {
export interface ApiRespBoxStatus {
available: boolean;
+ /** UI hint: hide the Box runtime status surface for this deployment. */
+ hidden?: boolean;
/** Whether ``box.enabled`` is true in config. When false, the sandbox
* is deliberately disabled, as distinct from "configured but failed". */
enabled?: boolean;
+11
View File
@@ -1332,6 +1332,17 @@ export class BackendClient extends BaseHttpClient {
return this.post('/api/v1/survey/dismiss', { survey_id: surveyId });
}
+ public submitFeedback(data: {
+ content: string;
+ attachments?: Array<{
+ name: string;
+ mime_type: string;
+ data_url: string;
+ }>;
+ }): Promise<object> {
+ return this.post('/api/v1/survey/feedback', data);
+ }
// ============ Skills API ============
public getSkills(): Promise<ApiRespSkills> {
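A caller would presumably validate the feedback form before handing it to `submitFeedback`. The sketch below is an illustration only: `validateFeedback`, `PendingAttachment`, and `size_bytes` are hypothetical, and the limits (images only, at most 3, each under 1MB) are assumptions read from the UI strings added elsewhere in this diff rather than from the dialog's actual code. The returned keys match the new i18n keys.

```typescript
// Error keys reuse the i18n keys added in this diff.
type FeedbackError =
  | 'contentRequired'
  | 'imageOnly'
  | 'imageTooLarge'
  | 'tooManyImages';

// Hypothetical pre-upload attachment record; size_bytes is not in the API payload.
interface PendingAttachment {
  name: string;
  mime_type: string;
  size_bytes: number;
}

// Assumed limits, taken from the English UI strings.
const MAX_IMAGES = 3;
const MAX_IMAGE_BYTES = 1024 * 1024;

// Return the first validation error, or null when the form can be submitted.
function validateFeedback(
  content: string,
  attachments: PendingAttachment[] = [],
): FeedbackError | null {
  if (content.trim().length === 0) return 'contentRequired';
  if (attachments.length > MAX_IMAGES) return 'tooManyImages';
  for (const attachment of attachments) {
    if (!attachment.mime_type.startsWith('image/')) return 'imageOnly';
    // "Under 1MB" is read strictly: exactly 1MB is rejected.
    if (attachment.size_bytes >= MAX_IMAGE_BYTES) return 'imageTooLarge';
  }
  return null;
}
```

Returning a translation key instead of a message keeps the check independent of the active locale; the UI can pass the key straight to its translation function.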
+17 -1
View File
@@ -35,7 +35,7 @@ const enUS = {
emptyPassword: 'Please enter your password',
language: 'Language',
helpDocs: 'Get Help',
- featureRequest: 'Feature Request',
+ featureRequest: 'Feedback',
starOnGitHub: 'Star on GitHub',
joinDiscord: 'Join our Discord',
create: 'Create',
@@ -1362,6 +1362,22 @@ const enUS = {
inaccurateReasons: 'Inaccurate Reasons',
platform: 'Platform',
exportFeedback: 'Export Feedback',
+ description:
+ 'Tell us what went wrong or what could be better. Instance UUID and login account are included for diagnosis.',
+ placeholder: 'Describe your suggestion, issue, or reproduction steps...',
+ attachImage: 'Add image',
+ screenshot: 'Screenshot',
+ submit: 'Submit feedback',
+ privacyHint:
+ 'Do not include secrets, passwords, or private chat content.',
+ contentRequired: 'Please enter feedback first',
+ imageOnly: 'Only image attachments are supported',
+ imageTooLarge: 'Each image must be under 1MB',
+ tooManyImages: 'You can attach up to 3 images',
+ screenshotFailed: 'Screenshot failed. Try pasting or uploading an image.',
+ submitSuccess: 'Feedback submitted. Thanks!',
+ submitFailed: 'Failed to submit feedback. Please try again later.',
+ removeImage: 'Remove image',
},
queries: {
title: 'Queries',
+16
View File
@@ -1395,6 +1395,22 @@ const esES = {
inaccurateReasons: 'Razones de inexactitud',
platform: 'Plataforma',
exportFeedback: 'Exportar comentarios',
+ description:
+ 'Tell us what went wrong or what could be better. Instance UUID and login account are included for diagnosis.',
+ placeholder: 'Describe your suggestion, issue, or reproduction steps...',
+ attachImage: 'Add image',
+ screenshot: 'Screenshot',
+ submit: 'Submit feedback',
+ privacyHint:
+ 'Do not include secrets, passwords, or private chat content.',
+ contentRequired: 'Please enter feedback first',
+ imageOnly: 'Only image attachments are supported',
+ imageTooLarge: 'Each image must be under 1MB',
+ tooManyImages: 'You can attach up to 3 images',
+ screenshotFailed: 'Screenshot failed. Try pasting or uploading an image.',
+ submitSuccess: 'Feedback submitted. Thanks!',
+ submitFailed: 'Failed to submit feedback. Please try again later.',
+ removeImage: 'Remove image',
},
queries: {
title: 'Consultas',
+17 -1
View File
@@ -36,7 +36,7 @@ const jaJP = {
emptyPassword: 'パスワードを入力してください',
language: '言語',
helpDocs: 'ヘルプドキュメント',
- featureRequest: '機能リクエスト',
+ featureRequest: 'フィードバック',
starOnGitHub: 'GitHubでStarする',
joinDiscord: 'Discord に参加',
create: '作成',
@@ -1368,6 +1368,22 @@ const jaJP = {
inaccurateReasons: '不正確な理由',
platform: 'プラットフォーム',
exportFeedback: 'フィードバックをエクスポート',
+ description:
+ '問題点や改善案を教えてください。診断のため、インスタンス UUID、ログインアカウント、ページ情報も送信されます。',
+ placeholder: '提案、問題、再現手順を入力してください...',
+ attachImage: '画像を追加',
+ screenshot: 'スクリーンショット',
+ submit: '送信',
+ privacyHint: '秘密鍵、パスワード、個人的な会話内容は含めないでください。',
+ contentRequired: 'フィードバック内容を入力してください',
+ imageOnly: '画像のみ添付できます',
+ imageTooLarge: '画像は 1 枚 2MB 未満にしてください',
+ tooManyImages: '画像は最大 3 枚まで添付できます',
+ screenshotFailed:
+ 'スクリーンショットに失敗しました。貼り付けまたはアップロードを試してください。',
+ submitSuccess: 'フィードバックを送信しました。ありがとうございます!',
+ submitFailed: '送信に失敗しました。後でもう一度お試しください。',
+ removeImage: '画像を削除',
},
messageDetails: {
noData: 'このクエリにはLLM呼び出しやエラーがありません',
+16
View File
@@ -1371,6 +1371,22 @@ const ruRU = {
inaccurateReasons: 'Причины неточности',
platform: 'Платформа',
exportFeedback: 'Экспорт отзывов',
+ description:
+ 'Tell us what went wrong or what could be better. Instance UUID and login account are included for diagnosis.',
+ placeholder: 'Describe your suggestion, issue, or reproduction steps...',
+ attachImage: 'Add image',
+ screenshot: 'Screenshot',
+ submit: 'Submit feedback',
+ privacyHint:
+ 'Do not include secrets, passwords, or private chat content.',
+ contentRequired: 'Please enter feedback first',
+ imageOnly: 'Only image attachments are supported',
+ imageTooLarge: 'Each image must be under 1MB',
+ tooManyImages: 'You can attach up to 3 images',
+ screenshotFailed: 'Screenshot failed. Try pasting or uploading an image.',
+ submitSuccess: 'Feedback submitted. Thanks!',
+ submitFailed: 'Failed to submit feedback. Please try again later.',
+ removeImage: 'Remove image',
},
queries: {
title: 'Запросы',
+16
View File
@@ -1340,6 +1340,22 @@ const thTH = {
inaccurateReasons: 'เหตุผลที่ไม่ถูกต้อง',
platform: 'แพลตฟอร์ม',
exportFeedback: 'ส่งออกความคิดเห็น',
+ description:
+ 'Tell us what went wrong or what could be better. Instance UUID and login account are included for diagnosis.',
+ placeholder: 'Describe your suggestion, issue, or reproduction steps...',
+ attachImage: 'Add image',
+ screenshot: 'Screenshot',
+ submit: 'Submit feedback',
+ privacyHint:
+ 'Do not include secrets, passwords, or private chat content.',
+ contentRequired: 'Please enter feedback first',
+ imageOnly: 'Only image attachments are supported',
+ imageTooLarge: 'Each image must be under 1MB',
+ tooManyImages: 'You can attach up to 3 images',
+ screenshotFailed: 'Screenshot failed. Try pasting or uploading an image.',
+ submitSuccess: 'Feedback submitted. Thanks!',
+ submitFailed: 'Failed to submit feedback. Please try again later.',
+ removeImage: 'Remove image',
},
queries: {
title: 'คำค้นหา',
+16
View File
@@ -1364,6 +1364,22 @@ const viVN = {
inaccurateReasons: 'Lý do không chính xác',
platform: 'Nền tảng',
exportFeedback: 'Xuất phản hồi',
+ description:
+ 'Tell us what went wrong or what could be better. Instance UUID and login account are included for diagnosis.',
+ placeholder: 'Describe your suggestion, issue, or reproduction steps...',
+ attachImage: 'Add image',
+ screenshot: 'Screenshot',
+ submit: 'Submit feedback',
+ privacyHint:
+ 'Do not include secrets, passwords, or private chat content.',
+ contentRequired: 'Please enter feedback first',
+ imageOnly: 'Only image attachments are supported',
+ imageTooLarge: 'Each image must be under 1MB',
+ tooManyImages: 'You can attach up to 3 images',
+ screenshotFailed: 'Screenshot failed. Try pasting or uploading an image.',
+ submitSuccess: 'Feedback submitted. Thanks!',
+ submitFailed: 'Failed to submit feedback. Please try again later.',
+ removeImage: 'Remove image',
},
queries: {
title: 'Truy vấn',
+16 -1
View File
@@ -34,7 +34,7 @@ const zhHans = {
emptyPassword: '请输入密码',
language: '语言',
helpDocs: '帮助文档',
- featureRequest: '需求建议',
+ featureRequest: '建议反馈',
starOnGitHub: '在 GitHub 上 Star',
joinDiscord: '加入 Discord 社区',
create: '创建',
@@ -1301,6 +1301,21 @@ const zhHans = {
inaccurateReasons: '不准确原因',
platform: '平台',
exportFeedback: '导出反馈',
+ description:
+ '告诉我们遇到的问题或想要的改进。提交时会附带实例 UUID 和登录账号,方便定位。',
+ placeholder: '请描述你的建议、问题或复现步骤...',
+ attachImage: '添加图片',
+ screenshot: '截图',
+ submit: '提交反馈',
+ privacyHint: '请勿提交敏感密钥、密码或私人聊天内容。',
+ contentRequired: '请先填写反馈内容',
+ imageOnly: '仅支持图片附件',
+ imageTooLarge: '单张图片不能超过 1MB',
+ tooManyImages: '最多添加 3 张图片',
+ screenshotFailed: '截图失败,请尝试粘贴或上传图片',
+ submitSuccess: '反馈已提交,感谢!',
+ submitFailed: '反馈提交失败,请稍后重试',
+ removeImage: '移除图片',
},
queries: {
title: '查询记录',
+15
View File
@@ -1300,6 +1300,21 @@ const zhHant = {
inaccurateReasons: '不準確原因',
platform: '平台',
exportFeedback: '匯出反饋',
+ description:
+ '告訴我們遇到的問題或想要的改進。提交時會附帶實例 UUID 和登入帳號,方便定位。',
+ placeholder: '請描述你的建議、問題或重現步驟...',
+ attachImage: '新增圖片',
+ screenshot: '截圖',
+ submit: '提交反饋',
+ privacyHint: '請勿提交敏感金鑰、密碼或私人聊天內容。',
+ contentRequired: '請先填寫反饋內容',
+ imageOnly: '僅支援圖片附件',
+ imageTooLarge: '單張圖片不能超過 1MB',
+ tooManyImages: '最多新增 3 張圖片',
+ screenshotFailed: '截圖失敗,請嘗試貼上或上傳圖片',
+ submitSuccess: '反饋已提交,感謝!',
+ submitFailed: '反饋提交失敗,請稍後再試',
+ removeImage: '移除圖片',
},
messageDetails: {
noData: '此查詢沒有LLM調用或錯誤記錄',