Compare commits

...

5 Commits

Author SHA1 Message Date
huanghuoguoguo 9b0f5b36f3 test(skills): add debug chat timing and isolation probes 2026-06-25 13:34:30 +08:00
huanghuoguoguo 7e36869494 test(skills): extend fake provider load profiles 2026-06-25 12:54:08 +08:00
huanghuoguoguo d59b49ec55 test(skills): add debug chat load gate 2026-06-25 11:48:59 +08:00
huanghuoguoguo 8749a9b56f test(skills): prepare user path performance gate 2026-06-25 10:07:04 +08:00
huanghuoguoguo 67437c2f5a Add performance and reliability QA gates 2026-06-25 00:07:37 +08:00
51 changed files with 6912 additions and 42 deletions
+1
@@ -48,6 +48,7 @@ coverage.xml
 .coverage
 src/langbot/web/
 testsdk/
+.qa/
 # Build artifacts
 /dist
+2 -1
@@ -26,7 +26,7 @@ and LangBot's own Local Agent) working with the LangBot ecosystem.
 ## Quick start (for an AI agent)
-1. Read this README, `AGENTS.md`, and `qa-agent-docs/` to understand the layout.
+1. Read this README, `AGENTS.md`, and `docs/user-guide.md` to understand the layout.
 2. Read `skills/.env` for shared local defaults. On a new machine, copy
 `skills/.env.example` to `skills/.env.local` (gitignored) and override
 machine-specific values there. Never commit secrets.
@@ -48,6 +48,7 @@ bin/lbs env show # inspect resolved env defaults (redacted)
 bin/lbs env doctor # diagnose local environment readiness
 bin/lbs case list --ready
 bin/lbs test plan <case-id>
+bin/lbs suite plan langbot-debug-chat-load-gate
 ```
 ## Maintenance rule
+163
@@ -0,0 +1,163 @@
# LangBot QA Skills User Guide
Use this guide as the first operational path after reading `README.md` and
`AGENTS.md`.
## 1. Configure Local Inputs
Read `skills/.env`, then create `skills/.env.local` for machine-local values.
Do not commit `.env.local`, browser profiles, reports, tokens, API keys, OAuth
state, or provider credentials.
Minimum local fields for live browser QA:
```bash
LANGBOT_REPO=/path/to/LangBot
LANGBOT_WEB_REPO=/path/to/LangBot/web
LANGBOT_BACKEND_URL=http://127.0.0.1:5300
LANGBOT_FRONTEND_URL=http://127.0.0.1:3000
LANGBOT_DEV_FRONTEND_URL=http://127.0.0.1:3000
LANGBOT_BROWSER_PROFILE=/path/to/langbot-browser-profile
LANGBOT_CHROMIUM_EXECUTABLE=/path/to/chromium-or-playwright-chrome
LANGBOT_E2E_LOGIN_USER=qa-local@example.com
```
`LANGBOT_E2E_LOGIN_USER` is a local QA account. The setup automation uses the
LangBot recovery key from the active checkout to initialize or refresh that
local account and write a browser `localStorage` token. It does not need the
user's GitHub or Space credentials.
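The layering above has shared defaults in `skills/.env` and machine-local overrides in `skills/.env.local`. It can be sketched as a small parser where later files win, followed by a check for the minimum fields. This is a minimal sketch that assumes plain `KEY=VALUE` lines. `parseEnvText`, `mergeEnvLayers`, and `missingKeys` are hypothetical helpers, not the `loadEnvFiles` loader from `lib/langbot-e2e.mjs`, which may handle quoting and precedence differently.

```javascript
// Hypothetical helpers: a sketch of layered env resolution, not LangBot's loader.

// Parse simple KEY=VALUE lines; blank lines and # comments are ignored.
function parseEnvText(text) {
  const values = {};
  for (const rawLine of String(text).split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue;
    const match = line.match(/^([A-Z][A-Z0-9_]*)=(.*)$/);
    if (!match) continue;
    // Strip one pair of matching surrounding quotes, if present.
    values[match[1]] = match[2].replace(/^(["'])(.*)\1$/, "$2");
  }
  return values;
}

// Later layers override earlier ones: shared defaults first, local overrides last.
function mergeEnvLayers(...texts) {
  return Object.assign({}, ...texts.map(parseEnvText));
}

// Report required keys that are absent or empty after merging.
function missingKeys(values, required) {
  return required.filter((key) => !values[key]);
}

// The minimum fields listed above for live browser QA.
const REQUIRED_BROWSER_QA_KEYS = [
  "LANGBOT_REPO",
  "LANGBOT_WEB_REPO",
  "LANGBOT_BACKEND_URL",
  "LANGBOT_FRONTEND_URL",
  "LANGBOT_DEV_FRONTEND_URL",
  "LANGBOT_BROWSER_PROFILE",
  "LANGBOT_CHROMIUM_EXECUTABLE",
  "LANGBOT_E2E_LOGIN_USER",
];
```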
## 2. Check Readiness
From `skills/`:
```bash
bin/lbs env show
bin/lbs env doctor
bin/lbs validate
bin/lbs index --check
```
`env doctor` should report reachable backend and frontend URLs before live
browser cases are run. A run blocked by missing Space provider credentials is
not a LangBot product pass or failure; classify it as `env_issue` and configure
the local Space provider before measuring Debug Chat performance.
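The setup scripts in this change make the same distinction mechanically. They exit with `0` for `pass`, `2` for `env_issue`, and `1` for `fail`, and they treat connection and missing-configuration errors as environment problems. Below is a sketch of that mapping, reusing the error pattern from their `looksLikeEnvIssue` helper. `classifySetupError` and `exitCodeFor` are illustrative names, not functions the scripts export.

```javascript
// Pattern copied from looksLikeEnvIssue() in the fake-provider setup scripts.
const ENV_ISSUE_PATTERN =
  /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i;

// Hypothetical helper: environment problems are env_issue, everything else is a fail.
function classifySetupError(error) {
  const message = String(error?.message || error || "");
  return ENV_ISSUE_PATTERN.test(message) ? "env_issue" : "fail";
}

// Exit codes used by the setup scripts: 0 = pass, 2 = env_issue, 1 = anything else.
function exitCodeFor(status) {
  if (status === "pass") return 0;
  if (status === "env_issue") return 2;
  return 1;
}
```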
## 3. Start Services
Start the backend from `LANGBOT_REPO`:
```bash
cd "$LANGBOT_REPO"
uv run main.py
```
Start the standalone frontend from `LANGBOT_WEB_REPO` and point it at the
backend:
```bash
cd "$LANGBOT_WEB_REPO"
VITE_API_BASE_URL="$LANGBOT_BACKEND_URL" pnpm dev --host 0.0.0.0
```
If `VITE_API_BASE_URL` is missing, browser tests can still load the Vite page,
but the page sends API requests to the frontend port, producing false UI failures.
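One way to catch this before running browser cases is to compare origins: the API base the frontend uses should resolve to the backend's origin, not the frontend's. This is a minimal sketch. `apiTargetsBackend` is a hypothetical helper, and it assumes an unset `VITE_API_BASE_URL` means requests go to the page's own origin, as described above.

```javascript
// Hypothetical check: does the frontend's API base resolve to the backend origin?
function apiTargetsBackend(apiBaseUrl, frontendUrl, backendUrl) {
  // An unset base (or a relative path) resolves against the page's own origin.
  const apiOrigin = apiBaseUrl
    ? new URL(apiBaseUrl, frontendUrl).origin
    : new URL(frontendUrl).origin;
  return apiOrigin === new URL(backendUrl).origin;
}
```

Origins are compared literally, so `localhost` and `127.0.0.1` count as different hosts. Keep `LANGBOT_BACKEND_URL` and `VITE_API_BASE_URL` spelled the same way.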
## 4. Prepare User-Path Fixtures
For local-agent Debug Chat cases and the user-path performance gate:
```bash
node scripts/e2e/ensure-local-agent-pipeline.mjs --write-env
```
The script:
- refreshes the local QA login and browser token;
- marks the local wizard as skipped;
- creates or updates a local QA pipeline;
- scans Space LLM models, tests candidates, and switches to the first working
Space model with tested fallback models;
- writes `LANGBOT_PIPELINE_URL`, `LANGBOT_PIPELINE_NAME`, and local-agent
pipeline/model variables into `skills/.env.local`;
- returns `env_issue` when no Space model can be scanned or tested.
Useful model controls:
```bash
LANGBOT_E2E_MODEL_TEST_LIMIT=8
LANGBOT_E2E_MODEL_FALLBACK_COUNT=3
LANGBOT_E2E_SKIP_MODEL_UUIDS=uuid-a,uuid-b
LANGBOT_E2E_SKIP_MODEL_NAMES=model-a,model-b
LANGBOT_E2E_SCAN_SPACE_MODELS=true
```
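One plausible reading of the model-selection step and these controls works as follows:

- Drop skipped models.
- Test at most `LANGBOT_E2E_MODEL_TEST_LIMIT` candidates.
- Use the first passing model as primary and up to `LANGBOT_E2E_MODEL_FALLBACK_COUNT` further passing models as fallbacks.

The guide does not define these semantics exactly, so treat the sketch below as an assumption. `selectModels` and `splitList` are hypothetical. The synchronous `testModel` callback stands in for the backend model test that the real script performs.

```javascript
// Hypothetical sketch of primary/fallback model selection; limit semantics are assumed.
function selectModels(candidates, testModel, options = {}) {
  const testLimit = options.testLimit ?? 8;
  const fallbackCount = options.fallbackCount ?? 3;
  const skipUuids = new Set(options.skipUuids ?? []);
  const skipNames = new Set(options.skipNames ?? []);

  const eligible = candidates.filter(
    (model) => !skipUuids.has(model.uuid) && !skipNames.has(model.name),
  );
  const working = [];
  const tested = [];
  for (const model of eligible.slice(0, testLimit)) {
    const ok = testModel(model);
    tested.push({ uuid: model.uuid, ok });
    if (ok) working.push(model);
    // Stop once the primary and all requested fallbacks are found.
    if (working.length >= 1 + fallbackCount) break;
  }
  if (working.length === 0) {
    // Mirrors the guide: no usable Space model is an environment problem.
    return { status: "env_issue", primary: null, fallbacks: [], tested };
  }
  return {
    status: "pass",
    primary: working[0].uuid,
    fallbacks: working.slice(1).map((model) => model.uuid),
    tested,
  };
}

// Split comma-separated env values such as LANGBOT_E2E_SKIP_MODEL_UUIDS.
function splitList(value) {
  return String(value || "").split(",").map((item) => item.trim()).filter(Boolean);
}
```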
For compatibility with the current runtime, the setup also writes a `max-round`
value into the pipeline config, because the backend still reads that field
directly during message truncation. Do not treat it as a long-term QA contract.
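In `ensure-fake-provider-pipeline.mjs`, the `local-agent` config is merged in a fixed order: QA defaults first, then any values already on the pipeline, then a normalized `max-round` and the fixture model binding, which always win. The sketch below is a reduced version of that order that keeps only two of the real default keys. `mergeLocalAgentConfig` is a hypothetical name; `positiveInteger` mirrors the script's helper.

```javascript
// Same rule as positiveInteger() in the setup script: keep positive integers, else fall back.
function positiveInteger(value, fallback) {
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Reduced sketch of the local-agent config merge order used by the setup script.
function mergeLocalAgentConfig(existing, modelUuid) {
  const current = existing && typeof existing === "object" ? existing : {};
  return {
    // 1. QA defaults (subset of the real list).
    timeout: 60,
    "context-window-tokens": 8192,
    // 2. Values already on the pipeline override the defaults.
    ...current,
    // 3. Compatibility field the current backend reads during truncation.
    "max-round": positiveInteger(current["max-round"], 10),
    // 4. The model binding is always reset to the fixture model.
    model: { primary: modelUuid, fallbacks: [] },
  };
}
```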
## 5. Run Gates
Fast contract gate, no live service required:
```bash
bin/lbs suite run langbot-performance-contract-gate --run-id langbot-contract-local
```
Live backend gate:
```bash
bin/lbs suite run langbot-live-backend-gate --run-id langbot-backend-local
```
Browser-visible user-path performance gate:
```bash
bin/lbs suite plan langbot-user-path-performance-gate
bin/lbs suite run langbot-user-path-performance-gate --run-id langbot-user-path-local --include-manual-check
```
Controlled Debug Chat message-path load gate:
```bash
bin/lbs suite plan langbot-debug-chat-load-gate
bin/lbs test run langbot-fake-provider-debug-chat-load --run-id langbot-fake-load-local
bin/lbs test run langbot-fake-provider-debug-chat-slow-load --run-id langbot-fake-slow-local
bin/lbs test run langbot-fake-provider-debug-chat-cross-pipeline-isolation --run-id langbot-fake-cross-pipeline-local
bin/lbs test run langbot-fake-provider-debug-chat-fault-recovery --run-id langbot-fake-fault-local
bin/lbs test run langbot-space-debug-chat-concurrency-smoke --run-id langbot-space-smoke-local
```
Start with `langbot-fake-provider-debug-chat-load`. It launches a local
OpenAI-compatible fake provider, creates the matching provider/model/pipeline,
then sends concurrent WebSocket Debug Chat messages through the real backend.
Use `langbot-fake-provider-debug-chat-slow-load` to measure the same path under
deterministic streaming latency. Use
`langbot-fake-provider-debug-chat-cross-pipeline-isolation` to verify that
concurrent Debug Chat traffic on two pipelines does not leak assistant
responses across pipeline boundaries. Use
`langbot-fake-provider-debug-chat-fault-recovery` to inject bounded provider
HTTP failures and confirm later Debug Chat requests recover.
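The load and fault-recovery cases share one shape: a fixed number of requests pushed through a bounded number of concurrent workers, with latency and outcome recorded per request. The sketch below shows that shape with a stand-in request function rather than a WebSocket Debug Chat client. It assumes `fail_first_n` means "the first N requests fail" and `fail_every_n` means "every Nth request fails". Those semantics, and the names `runBoundedLoad`, `shouldInjectFault`, and `fakeRequest`, are illustrative assumptions rather than the fake provider's documented behavior.

```javascript
// Assumed fault semantics: 1-based request index; first N fail, then every Nth fails.
function shouldInjectFault(requestIndex, { failFirstN = 0, failEveryN = 0 } = {}) {
  if (failFirstN > 0 && requestIndex <= failFirstN) return true;
  return failEveryN > 0 && requestIndex % failEveryN === 0;
}

// Run `total` requests with at most `concurrency` in flight; record latency and outcome.
async function runBoundedLoad({ total, concurrency, sendRequest }) {
  const results = [];
  let nextIndex = 1;
  let inFlight = 0;
  let maxInFlight = 0;

  async function worker() {
    while (nextIndex <= total) {
      const index = nextIndex++; // taken synchronously, so no two workers share an index
      inFlight++;
      maxInFlight = Math.max(maxInFlight, inFlight);
      const startedAt = performance.now();
      try {
        await sendRequest(index);
        results.push({ index, ok: true, ms: performance.now() - startedAt });
      } catch (error) {
        results.push({ index, ok: false, ms: performance.now() - startedAt, error: String(error?.message || error) });
      } finally {
        inFlight--;
      }
    }
  }

  const workerCount = Math.min(concurrency, total);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  results.sort((a, b) => a.index - b.index);
  return { results, maxInFlight };
}

// Stand-in request: short random delay, rejects when the fault model says so.
function fakeRequest(faults) {
  return (index) => new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldInjectFault(index, faults)) reject(new Error(`HTTP 500 on request ${index}`));
      else resolve(`OK ${index}`);
    }, 5 + Math.random() * 10);
  });
}
```

Workers take the next index synchronously between `await`s, so no locking is needed. The number of in-flight requests never exceeds `concurrency`.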
Use `langbot-space-debug-chat-concurrency-smoke` only as a low-volume live
provider smoke test. Its timings include Space, model, and network latency, so
compare them against the fake-provider baseline before attributing failures to LangBot.
`manual_check` means the agent must confirm the declared preconditions for that
run window. When setup automation is declared, run output may stop early with
`env_issue`; fix that environment input before treating the product path as
measured.
## 6. Read Results
Suite reports live under `skills/reports/`. Evidence lives under
`skills/reports/evidence/<run-id>/`.
For performance cases, inspect:
- `metrics.json` for p50/p95/p99, error rate, and total duration;
- `automation-result.json` for threshold decisions and artifacts;
- `console.log` and `network.log` for frontend/API failures;
- backend logs for provider, runner, WebSocket, or persistence failures.
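To interpret `metrics.json`, it helps to see how such numbers are commonly derived. This minimal sketch assumes nearest-rank percentiles computed over successful requests only; the suite may interpolate or include failures. Its thresholds are modeled on the case fields `automation_debug_chat_load_response_p95_ms` and `automation_debug_chat_load_max_error_rate`. `percentile`, `summarizeLatencies`, and `decideThresholds` are hypothetical.

```javascript
// Nearest-rank percentile over a sorted copy; returns null for empty input.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Summarize per-request results ({ ok, ms }); latency percentiles use successes only.
function summarizeLatencies(results, totalDurationMs) {
  const okLatencies = results.filter((r) => r.ok).map((r) => r.ms);
  const errors = results.length - okLatencies.length;
  return {
    count: results.length,
    p50_ms: percentile(okLatencies, 50),
    p95_ms: percentile(okLatencies, 95),
    p99_ms: percentile(okLatencies, 99),
    error_rate: results.length ? errors / results.length : 0,
    total_duration_ms: totalDurationMs,
  };
}

// Compare against thresholds; a null p95 (no successful requests) fails the latency check.
function decideThresholds(metrics, { responseP95Ms, maxErrorRate }) {
  const checks = [
    { name: "response_p95_ms", pass: metrics.p95_ms !== null && metrics.p95_ms <= responseP95Ms },
    { name: "max_error_rate", pass: metrics.error_rate <= maxErrorRate },
  ];
  return { status: checks.every((c) => c.pass) ? "pass" : "fail", checks };
}
```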
Do not call a user-path performance result a LangBot overhead regression until
provider/tool/network time has been separated or ruled out.
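A coarse way to do that separation is to judge LangBot's own path on the controlled fake-provider baseline, then attribute any remaining live slowness to provider, model, and network. The sketch below encodes that rule. The two budgets, the verdict labels, and `attributeSlowness` are assumptions. A difference of two p95 values is only a rough estimate of external time, not a per-request measurement.

```javascript
// Hypothetical attribution rule: LangBot's own path is judged on the fake-provider
// baseline, and live-only slowness is attributed to provider, model, or network.
function attributeSlowness(live, baseline, { liveP95BudgetMs, baselineP95BudgetMs }) {
  // Rough external-time estimate: difference of two p95 values, not a per-request measure.
  const estimatedExternalP95Ms = Math.max(0, live.p95_ms - baseline.p95_ms);
  let verdict;
  if (baseline.p95_ms > baselineP95BudgetMs || baseline.error_rate > 0) {
    verdict = "langbot_suspect"; // slow or failing even without a real provider
  } else if (live.p95_ms > liveP95BudgetMs || live.error_rate > 0) {
    verdict = "provider_or_network_suspect";
  } else {
    verdict = "within_budget";
  }
  return { verdict, estimated_external_p95_ms: estimatedExternalP95Ms };
}
```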
+109 -2
@@ -48,7 +48,18 @@
 },
 "type": {
 "type": "string",
-"enum": ["smoke", "regression", "feature", "provider", "exploratory"]
+"enum": [
+"smoke",
+"regression",
+"feature",
+"provider",
+"exploratory",
+"contract",
+"performance",
+"reliability",
+"chaos",
+"security"
+]
 },
 "priority": {
 "type": "string",
@@ -102,7 +113,11 @@
 "backend_log",
 "frontend_log",
 "api_diagnostic",
-"filesystem"
+"filesystem",
+"metrics",
+"trace",
+"profile",
+"resource_log"
 ]
 },
 "minItems": 1
@@ -188,9 +203,101 @@
 "type": "string",
 "enum": ["person", "group"]
 },
+"automation_debug_chat_response_p95_ms": {
+"type": "string"
+},
+"automation_debug_chat_max_error_rate": {
+"type": "string"
+},
+"automation_debug_chat_load_requests": {
+"type": "string"
+},
+"automation_debug_chat_load_concurrency": {
+"type": "string"
+},
+"automation_debug_chat_load_timeout_ms": {
+"type": "string"
+},
+"automation_debug_chat_load_response_p95_ms": {
+"type": "string"
+},
+"automation_debug_chat_load_first_response_p95_ms": {
+"type": "string"
+},
+"automation_debug_chat_load_max_error_rate": {
+"type": "string"
+},
+"automation_debug_chat_load_min_error_rate": {
+"type": "string"
+},
+"automation_debug_chat_load_min_error_count": {
+"type": "string"
+},
+"automation_debug_chat_load_min_ok_count": {
+"type": "string"
+},
+"automation_debug_chat_load_min_provider_fault_count": {
+"type": "string"
+},
+"automation_debug_chat_load_expected_prefix": {
+"type": "string"
+},
+"automation_debug_chat_load_prompt_template": {
+"type": "string"
+},
+"automation_debug_chat_load_stream": {
+"type": "string",
+"enum": ["0", "1", "false", "true"]
+},
+"automation_debug_chat_load_reset": {
+"type": "string",
+"enum": ["0", "1", "false", "true"]
+},
+"automation_debug_chat_load_fail_on_final_mismatch": {
+"type": "string",
+"enum": ["0", "1", "false", "true"]
+},
+"automation_fake_provider_response_text": {
+"type": "string"
+},
+"automation_fake_provider_first_token_delay_ms": {
+"type": "string"
+},
+"automation_fake_provider_chunk_delay_ms": {
+"type": "string"
+},
+"automation_fake_provider_chunk_count": {
+"type": "string"
+},
+"automation_fake_provider_fail_first_n": {
+"type": "string"
+},
+"automation_fake_provider_fail_every_n": {
+"type": "string"
+},
+"automation_fake_provider_fault_status": {
+"type": "string"
+},
+"automation_fake_provider_fail_after_first_chunk": {
+"type": "string",
+"enum": ["0", "1", "false", "true"]
+},
+"automation_fake_provider_dynamic_response": {
+"type": "string",
+"enum": ["0", "1", "false", "true"]
+},
 "automation_filesystem_checks_json": {
 "type": "string"
 },
+"metrics_thresholds_json": {
+"type": "string"
+},
+"load_profile_json": {
+"type": "string"
+},
+"fault_model_json": {
+"type": "string"
+},
 "automation_pipeline_url_env": {
 "type": "string",
 "pattern": "^[A-Z][A-Z0-9_]*$"
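Every new `automation_*` field in the case schema is a string, and boolean switches accept only `"0"`, `"1"`, `"false"`, or `"true"`. Consumers therefore have to coerce values before use. Below is a sketch of that coercion. `readLoadProfile`, `parseSchemaBool`, `parseNonNegativeInt`, and `parseRate` are hypothetical, and the default values are illustrative, not the suite's.

```javascript
// The schema's boolean switches accept exactly these strings.
function parseSchemaBool(value, fallback) {
  if (value === "1" || value === "true") return true;
  if (value === "0" || value === "false") return false;
  return fallback; // field absent (or outside the enum)
}

function isBlank(value) {
  return value === undefined || value === null || String(value).trim() === "";
}

// Non-negative integer from a string field; blank or invalid input falls back.
function parseNonNegativeInt(value, fallback) {
  if (isBlank(value)) return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Error-rate style fraction in [0, 1]; blank or invalid input (including NaN) falls back.
function parseRate(value, fallback) {
  if (isBlank(value)) return fallback;
  const parsed = Number(value);
  return parsed >= 0 && parsed <= 1 ? parsed : fallback;
}

// Hypothetical typed view of a case's load fields; defaults are illustrative only.
function readLoadProfile(fields) {
  return {
    requests: parseNonNegativeInt(fields.automation_debug_chat_load_requests, 20),
    concurrency: parseNonNegativeInt(fields.automation_debug_chat_load_concurrency, 4),
    timeoutMs: parseNonNegativeInt(fields.automation_debug_chat_load_timeout_ms, 60000),
    responseP95Ms: parseNonNegativeInt(fields.automation_debug_chat_load_response_p95_ms, 5000),
    maxErrorRate: parseRate(fields.automation_debug_chat_load_max_error_rate, 0),
    stream: parseSchemaBool(fields.automation_debug_chat_load_stream, true),
    reset: parseSchemaBool(fields.automation_debug_chat_load_reset, false),
  };
}
```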
+11 -1
@@ -18,7 +18,17 @@
 },
 "type": {
 "type": "string",
-"enum": ["smoke", "regression", "release_gate", "exploratory"]
+"enum": [
+"smoke",
+"regression",
+"release_gate",
+"exploratory",
+"contract",
+"performance",
+"reliability",
+"chaos",
+"security"
+]
 },
 "priority": {
 "type": "string",
@@ -0,0 +1,203 @@
#!/usr/bin/env node
import { spawn } from "node:child_process";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { dirname, resolve } from "node:path";
import { env } from "node:process";
import {
appendLine,
ensureEvidence,
evidencePaths,
loadEnvFiles,
redact,
writeResult,
} from "./lib/langbot-e2e.mjs";
const caseId = "ensure-fake-provider-cross-pipelines";
const DEFAULT_PIPELINE_A_NAME = "Agent QA Fake Provider Debug Chat A";
const DEFAULT_PIPELINE_B_NAME = "Agent QA Fake Provider Debug Chat B";
await loadEnvFiles();
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const writeEnv = process.argv.includes("--write-env");
const envLocalPath = resolve("skills/.env.local");
const pipelineAName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME || DEFAULT_PIPELINE_A_NAME;
const pipelineBName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME || DEFAULT_PIPELINE_B_NAME;
const result = {
source: "setup_automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
pipeline_a: {
name: pipelineAName,
id: "",
url: "",
},
pipeline_b: {
name: pipelineBName,
id: "",
url: "",
},
fake_provider: {
url: "",
base_url: "",
pid: null,
},
wrote_env: false,
evidence: {
console_log: paths.consoleLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["api_diagnostic", "filesystem"],
};
try {
if (pipelineAName === pipelineBName) {
throw new Error("LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME and LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME must be different.");
}
const setupA = await runPipelineSetup(pipelineAName, "A");
const setupB = await runPipelineSetup(pipelineBName, "B");
result.pipeline_a = {
name: setupA.pipeline_name || pipelineAName,
id: setupA.pipeline_id || "",
url: setupA.pipeline_url || "",
};
result.pipeline_b = {
name: setupB.pipeline_name || pipelineBName,
id: setupB.pipeline_id || "",
url: setupB.pipeline_url || "",
};
result.fake_provider = {
url: setupB.fake_provider?.url || setupA.fake_provider?.url || "",
base_url: setupB.fake_provider?.base_url || setupA.fake_provider?.base_url || "",
pid: setupB.fake_provider?.pid ?? setupA.fake_provider?.pid ?? null,
};
if (!result.pipeline_a.url || !result.pipeline_b.url || !result.fake_provider.url) {
throw new Error("Cross-pipeline fake provider setup did not return both pipeline URLs and provider URL.");
}
if (writeEnv) {
await upsertEnvLocal(envLocalPath, {
LANGBOT_FAKE_PROVIDER_URL: result.fake_provider.url,
LANGBOT_FAKE_PROVIDER_BASE_URL: result.fake_provider.base_url,
LANGBOT_FAKE_PROVIDER_PID: result.fake_provider.pid ? String(result.fake_provider.pid) : "",
LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL: result.pipeline_a.url,
LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME: result.pipeline_a.name,
LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL: result.pipeline_b.url,
LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME: result.pipeline_b.name,
});
result.wrote_env = true;
}
result.status = "pass";
result.reason = "Fake provider cross-pipeline fixtures are configured.";
} catch (error) {
result.status = (error?.envIssue || looksLikeEnvIssue(error)) ? "env_issue" : "fail";
result.reason = safeReason(error.message);
} finally {
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
function runPipelineSetup(pipelineName, label) {
return new Promise((resolvePromise, rejectPromise) => {
const child = spawn(process.execPath, ["scripts/e2e/ensure-fake-provider-pipeline.mjs"], {
cwd: resolve("."),
env: {
...env,
LANGBOT_FAKE_PROVIDER_PIPELINE_NAME: pipelineName,
LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS: env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS || "25",
LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS: env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS || "10",
LANGBOT_FAKE_PROVIDER_CHUNK_COUNT: env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT || "0",
LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N: "0",
LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N: "0",
LANGBOT_FAKE_PROVIDER_FAULT_STATUS: env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS || "500",
LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK: "false",
LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE: "true",
},
stdio: ["ignore", "pipe", "pipe"],
});
let stdout = "";
let stderr = "";
child.stdout.on("data", (chunk) => {
const text = chunk.toString();
stdout += text;
appendLine(paths.consoleLog, `[setup ${label} stdout] ${text.trimEnd()}`).catch(() => {});
});
child.stderr.on("data", (chunk) => {
const text = chunk.toString();
stderr += text;
appendLine(paths.consoleLog, `[setup ${label} stderr] ${text.trimEnd()}`).catch(() => {});
});
child.on("error", rejectPromise);
child.on("close", (code) => {
const parsed = parseJsonOutput(stdout);
if (code !== 0 || parsed.status !== "pass") {
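const error = new Error(parsed.reason || stderr || `Fake provider pipeline setup ${label} exited with ${code}.`);
// Keep the child's env_issue classification (it exits with code 2) instead of re-deriving it from the message.
error.envIssue = parsed.status === "env_issue" || code === 2;
rejectPromise(error);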
return;
}
resolvePromise(parsed);
});
});
}
function parseJsonOutput(text) {
const trimmed = String(text || "").trim();
if (!trimmed) return {};
try {
return JSON.parse(trimmed);
} catch {
const start = trimmed.indexOf("{");
const end = trimmed.lastIndexOf("}");
if (start >= 0 && end > start) {
try {
return JSON.parse(trimmed.slice(start, end + 1));
} catch {
return {};
}
}
return {};
}
}
async function upsertEnvLocal(path, updates) {
await mkdir(dirname(path), { recursive: true });
let text = "";
try {
text = await readFile(path, "utf8");
} catch {
text = "";
}
const lines = text.split(/\r?\n/);
const seen = new Set();
const next = lines.map((line) => {
const trimmed = line.trim();
const match = trimmed.match(/^([A-Z][A-Z0-9_]*)=/);
if (!match || updates[match[1]] === undefined) return line;
seen.add(match[1]);
return `${match[1]}=${updates[match[1]]}`;
});
for (const [key, value] of Object.entries(updates)) {
if (!seen.has(key)) next.push(`${key}=${value}`);
}
await writeFile(path, `${next.join("\n").replace(/\n+$/, "")}\n`, "utf8");
}
function looksLikeEnvIssue(error) {
const message = String(error?.message || error || "");
return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
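The cross-pipeline fixtures above turn on `LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE` so that replies can differ per request. An isolation check can then tag each prompt with its pipeline and request index, and flag replies that carry another pipeline's tag or lack their own. This minimal sketch assumes the dynamic reply echoes such a marker from the prompt. The `[QA-<pipeline>-<index>]` format and `findCrossPipelineLeaks` are hypothetical, not how the isolation case is implemented.

```javascript
// Hypothetical marker format embedded in each prompt, e.g. "[QA-A-7]".
function markerFor(pipelineLabel, requestIndex) {
  return `[QA-${pipelineLabel}-${requestIndex}]`;
}

// Flag replies containing another pipeline's marker, and replies missing their own.
function findCrossPipelineLeaks(exchanges) {
  const leaks = [];
  const missing = [];
  for (const { pipeline, index, reply } of exchanges) {
    const markers = String(reply).match(/\[QA-[A-Za-z0-9]+-\d+\]/g) || [];
    const own = markerFor(pipeline, index);
    if (!markers.includes(own)) missing.push(own);
    for (const marker of markers) {
      const [, label] = marker.match(/^\[QA-([A-Za-z0-9]+)-\d+\]$/);
      if (label !== pipeline) leaks.push({ pipeline, index, foreign: marker });
    }
  }
  return { leaks, missing, isolated: leaks.length === 0 && missing.length === 0 };
}
```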
@@ -0,0 +1,632 @@
#!/usr/bin/env node
import { spawn } from "node:child_process";
import { open, readFile, mkdir, writeFile } from "node:fs/promises";
import { dirname, resolve } from "node:path";
import { env } from "node:process";
import {
apiJson,
ensureEvidence,
evidencePaths,
loadEnvFiles,
redact,
resetAndAuthLocalUser,
writeResult,
} from "./lib/langbot-e2e.mjs";
const RUNNER_ID = "local-agent";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
const DEFAULT_PIPELINE_NAME = "Agent QA Fake Provider Debug Chat";
const DEFAULT_PROVIDER_NAME = "LangBot QA Fake OpenAI Provider";
const DEFAULT_MODEL_NAME = "gpt-4o-mini";
const DEFAULT_REQUESTER = "openai-chat-completions";
const caseId = "ensure-fake-provider-pipeline";
await loadEnvFiles();
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const writeEnv = process.argv.includes("--write-env");
const frontendUrl = env.LANGBOT_FRONTEND_URL || "";
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const envLocalPath = resolve("skills/.env.local");
const repoRoot = resolve(env.LANGBOT_REPO || "..");
const fakeStateDir = resolve(env.LANGBOT_FAKE_PROVIDER_STATE_DIR || resolve(repoRoot, ".qa/fake-provider"));
const fakeStatePath = resolve(fakeStateDir, "state.json");
const fakeStdoutPath = resolve(fakeStateDir, "fake-provider.stdout.log");
const fakeStderrPath = resolve(fakeStateDir, "fake-provider.stderr.log");
const pipelineName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_NAME || DEFAULT_PIPELINE_NAME;
const providerName = env.LANGBOT_FAKE_PROVIDER_NAME || DEFAULT_PROVIDER_NAME;
const requester = env.LANGBOT_FAKE_PROVIDER_REQUESTER || DEFAULT_REQUESTER;
const modelName = env.LANGBOT_FAKE_PROVIDER_MODEL_NAME || DEFAULT_MODEL_NAME;
const result = {
source: "automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
frontend_url: frontendUrl,
backend_url: backendUrl,
fake_provider: {
url: "",
base_url: "",
pid: null,
reused: false,
config: {},
state_file: fakeStatePath,
stdout_log: fakeStdoutPath,
stderr_log: fakeStderrPath,
},
provider: {
uuid: "",
name: providerName,
requester,
created: false,
updated: false,
},
model: {
uuid: "",
name: modelName,
created: false,
updated: false,
test_status: "not_run",
test_reason: "",
},
pipeline_id: "",
pipeline_name: pipelineName,
pipeline_url: "",
created: false,
updated: false,
wrote_env: false,
evidence: {
console_log: paths.consoleLog,
network_log: paths.networkLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["api_diagnostic", "network", "filesystem"],
};
try {
if (!backendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_BACKEND_URL is not configured.");
}
if (!frontendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_FRONTEND_URL is not configured.");
}
const fakeProvider = await ensureFakeProvider();
const setupConfig = await configureFakeProvider(fakeProvider.url, healthyFakeProviderConfig(), true);
result.fake_provider = {
...result.fake_provider,
...fakeProvider,
config: setupConfig.config || healthyFakeProviderConfig(),
};
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the fake provider pipeline.");
}
const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
const wizard = await skipWizard({ backendUrl, token: auth.token });
if (wizard.status !== "pass") {
result.status = "fail";
throw new Error(wizard.reason || "Failed to mark the local QA wizard as skipped.");
}
const provider = await ensureProvider({
backendUrl,
token: auth.token,
name: providerName,
requester,
baseUrl: fakeProvider.base_url,
});
result.provider = provider;
const model = await ensureModel({
backendUrl,
token: auth.token,
providerUuid: provider.uuid,
name: modelName,
});
result.model = model;
const pipeline = await ensurePipeline({
backendUrl,
token: auth.token,
name: pipelineName,
modelUuid: model.uuid,
});
Object.assign(result, pipeline);
result.pipeline_url = `${frontendUrl.replace(/\/$/, "")}/home/pipelines?id=${encodeURIComponent(pipeline.pipeline_id)}`;
const runConfig = await configureFakeProvider(fakeProvider.url, targetFakeProviderConfig(), true);
result.fake_provider.config = runConfig.config || targetFakeProviderConfig();
if (writeEnv) {
await upsertEnvLocal(envLocalPath, {
LANGBOT_E2E_LOGIN_USER: user,
LANGBOT_FAKE_PROVIDER_URL: fakeProvider.url,
LANGBOT_FAKE_PROVIDER_BASE_URL: fakeProvider.base_url,
LANGBOT_FAKE_PROVIDER_PID: fakeProvider.pid ? String(fakeProvider.pid) : "",
LANGBOT_FAKE_PROVIDER_PROVIDER_UUID: provider.uuid,
LANGBOT_FAKE_PROVIDER_MODEL_UUID: model.uuid,
LANGBOT_FAKE_PROVIDER_PIPELINE_URL: result.pipeline_url,
LANGBOT_FAKE_PROVIDER_PIPELINE_NAME: pipelineName,
});
result.wrote_env = true;
}
result.status = "pass";
result.reason = `Fake provider pipeline is configured with ${requester}/${modelName}.`;
} catch (error) {
result.status = result.status === "env_issue" ? "env_issue" : "fail";
result.reason = result.reason || safeReason(error.message);
} finally {
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
async function ensureFakeProvider() {
const envUrl = normalizeProviderRootUrl(env.LANGBOT_FAKE_PROVIDER_URL || "");
if (envUrl && await fakeProviderHealthy(envUrl) && await fakeProviderConfigurable(envUrl)) {
return {
url: envUrl,
base_url: `${envUrl}/v1`,
pid: null,
reused: true,
};
}
const state = await readState(fakeStatePath);
const stateUrl = normalizeProviderRootUrl(state.url || "");
if (stateUrl && await fakeProviderHealthy(stateUrl)) {
if (await fakeProviderConfigurable(stateUrl)) {
return {
url: stateUrl,
base_url: state.base_url || `${stateUrl}/v1`,
pid: Number.isInteger(state.pid) ? state.pid : null,
reused: true,
};
}
if (Number.isInteger(state.pid)) await stopProcess(state.pid);
}
await mkdir(fakeStateDir, { recursive: true });
await writeFile(fakeStatePath, `${JSON.stringify({ status: "starting", started_at: new Date().toISOString() }, null, 2)}\n`, "utf8");
const stdout = await open(fakeStdoutPath, "a");
const stderr = await open(fakeStderrPath, "a");
const scriptPath = resolve("scripts/e2e/fake-openai-provider.mjs");
const host = env.LANGBOT_FAKE_PROVIDER_HOST || "127.0.0.1";
const port = env.LANGBOT_FAKE_PROVIDER_PORT || "0";
const child = spawn(process.execPath, [
scriptPath,
`--host=${host}`,
`--port=${port}`,
`--state-file=${fakeStatePath}`,
], {
cwd: resolve("."),
detached: true,
env: {
...env,
LANGBOT_FAKE_PROVIDER_MODEL_NAME: modelName,
},
stdio: ["ignore", stdout.fd, stderr.fd],
});
child.unref();
await stdout.close();
await stderr.close();
const started = await waitForFakeProviderState(fakeStatePath, child.pid, 10_000);
if (!started.url || !await fakeProviderHealthy(started.url) || !await fakeProviderConfigurable(started.url)) {
throw new Error(`Fake provider did not become healthy. See ${fakeStderrPath}`);
}
return {
url: started.url,
base_url: started.base_url || `${started.url}/v1`,
pid: child.pid ?? started.pid ?? null,
reused: false,
};
}
async function configureFakeProvider(rootUrl, config, resetRequestCount) {
const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({
config,
reset_request_count: resetRequestCount,
}),
signal: AbortSignal.timeout(3000),
});
const json = await response.json().catch(() => ({}));
if (!response.ok || json.ok !== true) {
throw new Error(`Fake provider config failed with HTTP ${response.status}.`);
}
return json;
}
async function fakeProviderHealthy(rootUrl) {
try {
const response = await fetch(`${rootUrl.replace(/\/$/, "")}/healthz`, {
signal: AbortSignal.timeout(2000),
});
if (!response.ok) return false;
const json = await response.json().catch(() => ({}));
return json.ok === true;
} catch {
return false;
}
}
async function fakeProviderConfigurable(rootUrl) {
try {
const response = await fetch(`${rootUrl.replace(/\/$/, "")}/__qa/config`, {
signal: AbortSignal.timeout(2000),
});
if (!response.ok) return false;
const json = await response.json().catch(() => ({}));
return json.ok === true && json.config && typeof json.config === "object";
} catch {
return false;
}
}
async function stopProcess(pid) {
try {
process.kill(pid, "SIGTERM");
} catch {
return;
}
await sleep(500);
}
async function waitForFakeProviderState(path, expectedPid, timeoutMs) {
const startedAt = Date.now();
let lastState = {};
while (Date.now() - startedAt < timeoutMs) {
const state = await readState(path);
if (state.url && (!expectedPid || state.pid === expectedPid)) return state;
lastState = state;
await sleep(150);
}
return lastState;
}
async function readState(path) {
try {
return JSON.parse(await readFile(path, "utf8"));
} catch {
return {};
}
}
function normalizeProviderRootUrl(value) {
const trimmed = String(value || "").trim().replace(/\/$/, "");
return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
}
function healthyFakeProviderConfig() {
return {
response_text: "OK",
first_token_delay_ms: 25,
chunk_delay_ms: 10,
chunk_count: 0,
fault_status: 500,
fail_first_n: 0,
fail_every_n: 0,
fail_after_first_chunk: false,
dynamic_response: true,
};
}
function targetFakeProviderConfig() {
return {
response_text: env.LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT || "OK",
first_token_delay_ms: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS, 25),
chunk_delay_ms: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS, 10),
chunk_count: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT, 0),
fault_status: httpFaultStatus(env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS, 500),
fail_first_n: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N, 0),
fail_every_n: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N, 0),
fail_after_first_chunk: envBool(env.LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK, false),
dynamic_response: envBool(env.LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE, true),
};
}
async function skipWizard({ backendUrl, token }) {
const response = await apiJson(backendUrl, "/api/v1/system/wizard/completed", {
method: "POST",
token,
body: { status: "skipped" },
});
const ok = response.status < 400 && response.json.code === 0;
return {
status: ok ? "pass" : "fail",
http_status: response.status,
code: response.json.code ?? null,
reason: ok ? "Wizard marked skipped for local QA." : response.json.msg || "Wizard status update failed.",
};
}
async function ensureProvider({ backendUrl, token, name, requester, baseUrl }) {
const list = await apiJson(backendUrl, "/api/v1/provider/providers", { token });
if (isApiFailure(list)) {
throw new Error(list.json.msg || "Failed to list providers.");
}
const providers = list.json.data?.providers || [];
const existing = providers.find((provider) => (
provider.name === name
|| (provider.requester === requester && String(provider.base_url || "").replace(/\/$/, "") === baseUrl.replace(/\/$/, ""))
));
const body = {
name,
requester,
base_url: baseUrl,
api_keys: [env.LANGBOT_FAKE_PROVIDER_API_KEY || "langbot-fake-provider-key"],
};
if (existing?.uuid) {
const update = await apiJson(backendUrl, `/api/v1/provider/providers/${encodeURIComponent(existing.uuid)}`, {
method: "PUT",
token,
body,
});
if (isApiFailure(update)) {
throw new Error(update.json.msg || "Failed to update fake provider.");
}
return {
uuid: existing.uuid,
name,
requester,
created: false,
updated: true,
};
}
const create = await apiJson(backendUrl, "/api/v1/provider/providers", {
method: "POST",
token,
body,
});
const uuid = create.json.data?.uuid || "";
if (isApiFailure(create) || !uuid) {
throw new Error(create.json.msg || "Failed to create fake provider.");
}
return {
uuid,
name,
requester,
created: true,
updated: false,
};
}
async function ensureModel({ backendUrl, token, providerUuid, name }) {
const list = await apiJson(backendUrl, `/api/v1/provider/models/llm?provider_uuid=${encodeURIComponent(providerUuid)}`, { token });
if (isApiFailure(list)) {
throw new Error(list.json.msg || "Failed to list fake provider models.");
}
const models = list.json.data?.models || [];
const existing = models.find((model) => model.name === name);
const body = {
name,
provider_uuid: providerUuid,
abilities: [],
context_length: positiveInteger(env.LANGBOT_FAKE_PROVIDER_CONTEXT_LENGTH, 8192),
extra_args: {},
prefered_ranking: 0,
};
let modelUuid = existing?.uuid || "";
let created = false;
let updated = false;
if (modelUuid) {
const update = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}`, {
method: "PUT",
token,
body,
});
if (isApiFailure(update)) {
throw new Error(update.json.msg || "Failed to update fake provider model.");
}
updated = true;
} else {
const create = await apiJson(backendUrl, "/api/v1/provider/models/llm", {
method: "POST",
token,
body,
});
modelUuid = create.json.data?.uuid || "";
if (isApiFailure(create) || !modelUuid) {
throw new Error(create.json.msg || "Failed to create fake provider model.");
}
created = true;
}
const test = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}/test`, {
method: "POST",
token,
body: { extra_args: {} },
});
if (isApiFailure(test)) {
throw new Error(safeReason(test.json.msg || test.json.message || "Fake provider model test failed."));
}
return {
uuid: modelUuid,
name,
created,
updated,
test_status: "pass",
test_reason: "",
};
}
async function ensurePipeline({ backendUrl, token, name, modelUuid }) {
const list = await apiJson(backendUrl, "/api/v1/pipelines", { token });
if (isApiFailure(list)) {
throw new Error(list.json.msg || "Failed to list pipelines.");
}
const pipelines = list.json.data?.pipelines || [];
let pipeline = pipelines.find((item) => item.name === name) || null;
let created = false;
if (!pipeline) {
const create = await apiJson(backendUrl, "/api/v1/pipelines", {
method: "POST",
token,
body: {
name,
description: "Local QA pipeline for controlled fake-provider Debug Chat load tests.",
emoji: "QA",
},
});
const pipelineId = create.json.data?.uuid || "";
if (isApiFailure(create) || !pipelineId) {
throw new Error(create.json.msg || "Failed to create fake provider pipeline.");
}
created = true;
pipeline = { uuid: pipelineId };
}
const loaded = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.uuid)}`, { token });
pipeline = loaded.json.data?.pipeline || null;
if (isApiFailure(loaded) || !pipeline?.uuid) {
throw new Error(loaded.json.msg || "Failed to load fake provider pipeline.");
}
const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {};
const ai = config.ai && typeof config.ai === "object" ? config.ai : {};
const existingLocalAgentConfig = ai["local-agent"] && typeof ai["local-agent"] === "object"
? ai["local-agent"]
: {};
const localAgentConfig = {
timeout: 60,
prompt: [{ role: "system", content: "You are a deterministic QA assistant. Reply exactly as instructed." }],
"remove-think": false,
"knowledge-bases": [],
"box-session-id-template": "{launcher_type}_{launcher_id}",
"retrieval-top-k": 5,
"rerank-model": "",
"rerank-top-k": 5,
"max-tool-iterations": 20,
"tool-execution-mode": "parallel",
"max-tool-result-chars": 20000,
"context-history-fetch-limit": 20,
"context-window-tokens": 8192,
"context-reserve-tokens": 1024,
"context-keep-recent-tokens": 2048,
"context-summary-tokens": 1024,
...existingLocalAgentConfig,
// Current backend truncation still reads this field directly.
"max-round": positiveInteger(existingLocalAgentConfig["max-round"], 10),
model: {
primary: modelUuid,
fallbacks: [],
},
};
const updatedConfig = {
...config,
ai: {
...ai,
runner: {
...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}),
id: RUNNER_ID,
runner: RUNNER_ID,
"expire-time": 0,
},
"local-agent": localAgentConfig,
},
};
const update = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.uuid)}`, {
method: "PUT",
token,
body: {
name,
description: "Local QA pipeline for controlled fake-provider Debug Chat load tests.",
emoji: "QA",
config: updatedConfig,
},
});
if (isApiFailure(update)) {
throw new Error(update.json.msg || "Failed to update fake provider pipeline.");
}
return {
pipeline_id: pipeline.uuid,
pipeline_name: name,
created,
updated: true,
};
}
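In `ensurePipeline`, the order of the object spreads decides which values survive. Built-in defaults come first, then whatever the pipeline's existing `local-agent` block already holds, then the keys the script always forces: `max-round` is re-validated and `model` is replaced outright. A minimal sketch of that precedence; `mergeLocalAgentConfig` and the sample values are hypothetical and exist only for illustration:

```javascript
// Hypothetical helper; mirrors the spread order used to build localAgentConfig.
function positiveInteger(value, fallback) {
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function mergeLocalAgentConfig(defaults, existing, modelUuid) {
  return {
    ...defaults, // lowest precedence: script defaults
    ...existing, // values already on the pipeline win over defaults
    // Re-validated even if the pipeline already sets it ("0" falls back to 10).
    "max-round": positiveInteger(existing["max-round"], 10),
    // Always replaced, so a stale primary model never survives a rerun.
    model: { primary: modelUuid, fallbacks: [] },
  };
}

const merged = mergeLocalAgentConfig(
  { timeout: 60, "retrieval-top-k": 5 },
  { timeout: 120, "max-round": "0", model: { primary: "old-model-uuid" } },
  "fake-model-uuid",
);
console.log(merged);
```

One consequence: tuning left on the pipeline by an earlier run (for example a longer `timeout`) overrides the defaults written in this script, so a load run may not use the values shown here.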
function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
function positiveInteger(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
function nonNegativeInteger(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}
function httpFaultStatus(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed >= 400 && parsed <= 599 ? parsed : fallback;
}
function envBool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
async function upsertEnvLocal(path, updates) {
await mkdir(dirname(path), { recursive: true });
let text = "";
try {
text = await readFile(path, "utf8");
} catch {
text = "";
}
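// Trim trailing newlines first so keys appended below are not separated from existing lines by a blank line.
const lines = text === "" ? [] : text.replace(/(?:\r?\n)+$/, "").split(/\r?\n/);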
const seen = new Set();
const next = lines.map((line) => {
const trimmed = line.trim();
const equals = trimmed.indexOf("=");
if (equals <= 0 || trimmed.startsWith("#")) return line;
const key = trimmed.slice(0, equals).trim();
if (!(key in updates)) return line;
seen.add(key);
return `${key}=${updates[key]}`;
});
for (const [key, value] of Object.entries(updates)) {
if (!seen.has(key)) next.push(`${key}=${value}`);
}
await writeFile(path, `${next.filter((line, index) => line !== "" || index < next.length - 1).join("\n")}\n`, "utf8");
}
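`upsertEnvLocal` rewrites matching `KEY=value` lines in place and appends keys that are missing. The same idea as a pure string function is easier to test; `upsertEnvText` is a hypothetical name. This sketch also trims trailing newlines before appending, so new keys do not land after a blank line, and it uses an own-property check so a line such as `toString=...` is not mistaken for an update:

```javascript
// Hypothetical pure variant of upsertEnvLocal: takes and returns file text.
function upsertEnvText(text, updates) {
  // Drop trailing newlines so appended keys follow the last line directly.
  const lines = text === "" ? [] : text.replace(/(?:\r?\n)+$/, "").split(/\r?\n/);
  const seen = new Set();
  const next = lines.map((line) => {
    const trimmed = line.trim();
    const equals = trimmed.indexOf("=");
    // Comments, blank lines, and lines without "KEY=" are kept verbatim.
    if (equals <= 0 || trimmed.startsWith("#")) return line;
    const key = trimmed.slice(0, equals).trim();
    // Own-property check: inherited names like "toString" are not updates.
    if (!Object.prototype.hasOwnProperty.call(updates, key)) return line;
    seen.add(key);
    return `${key}=${updates[key]}`;
  });
  for (const [key, value] of Object.entries(updates)) {
    if (!seen.has(key)) next.push(`${key}=${value}`);
  }
  return `${next.join("\n")}\n`;
}

const updated = upsertEnvText("# local overrides\nLANGBOT_PIPELINE_NAME=old\n", {
  LANGBOT_PIPELINE_NAME: "QA",
  LANGBOT_E2E_MODEL_UUID: "abc",
});
// returns "# local overrides\nLANGBOT_PIPELINE_NAME=QA\nLANGBOT_E2E_MODEL_UUID=abc\n"
console.log(updated);
```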
@@ -10,6 +10,7 @@ import {
ensureEvidence, ensureEvidence,
evidencePaths, evidencePaths,
loadEnvFiles, loadEnvFiles,
redact,
resetAndAuthLocalUser, resetAndAuthLocalUser,
safeScreenshot, safeScreenshot,
setBrowserToken, setBrowserToken,
@@ -17,9 +18,12 @@ import {
writeResult, writeResult,
} from "./lib/langbot-e2e.mjs"; } from "./lib/langbot-e2e.mjs";
const RUNNER_ID = "plugin:langbot/local-agent/default"; const RUNNER_ID = "local-agent";
const SPACE_PROVIDER_UUID = "00000000-0000-0000-0000-000000000000";
const DEFAULT_PIPELINE_NAME = "Agent QA Local Agent Debug Chat"; const DEFAULT_PIPELINE_NAME = "Agent QA Local Agent Debug Chat";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026"; const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
const DEFAULT_MODEL_TEST_LIMIT = 8;
const DEFAULT_MODEL_FALLBACK_COUNT = 3;
const caseId = "ensure-local-agent-pipeline"; const caseId = "ensure-local-agent-pipeline";
await loadEnvFiles(); await loadEnvFiles();
@@ -45,11 +49,18 @@ const result = {
pipeline_url: "", pipeline_url: "",
runner_id: RUNNER_ID, runner_id: RUNNER_ID,
selected_model_id: "", selected_model_id: "",
selected_model_name: "",
fallback_model_ids: [],
model_count: 0, model_count: 0,
space_model_count: 0,
scanned_space_model_count: 0,
tested_model_count: 0,
model_tests: [],
created: false, created: false,
updated: false, updated: false,
wrote_env: false, wrote_env: false,
auth: null, auth: null,
wizard: null,
browser_token_check: null, browser_token_check: null,
page_signal: "", page_signal: "",
evidence: { evidence: {
@@ -71,6 +82,7 @@ try {
const user = env.LANGBOT_E2E_LOGIN_USER || ""; const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD; const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) { if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the pipeline via backend API."); throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the pipeline via backend API.");
} }
@@ -81,6 +93,13 @@ try {
backend_token_check: auth.check, backend_token_check: auth.check,
}; };
const wizard = await skipWizard({ backendUrl, token: auth.token });
result.wizard = wizard;
if (wizard.status !== "pass") {
result.status = "fail";
throw new Error(wizard.reason || "Failed to mark the local QA wizard as skipped.");
}
const prepared = await ensureLocalAgentPipeline({ const prepared = await ensureLocalAgentPipeline({
backendUrl, backendUrl,
token: auth.token, token: auth.token,
@@ -99,6 +118,10 @@ try {
LANGBOT_PIPELINE_NAME: result.pipeline_name || pipelineName, LANGBOT_PIPELINE_NAME: result.pipeline_name || pipelineName,
LANGBOT_LOCAL_AGENT_PIPELINE_URL: result.pipeline_url, LANGBOT_LOCAL_AGENT_PIPELINE_URL: result.pipeline_url,
LANGBOT_LOCAL_AGENT_PIPELINE_NAME: result.pipeline_name || pipelineName, LANGBOT_LOCAL_AGENT_PIPELINE_NAME: result.pipeline_name || pipelineName,
...(result.selected_model_id ? {
LANGBOT_LOCAL_AGENT_MODEL_UUID: result.selected_model_id,
LANGBOT_E2E_MODEL_UUID: result.selected_model_id,
} : {}),
}); });
result.wrote_env = true; result.wrote_env = true;
} }
@@ -127,6 +150,21 @@ try {
process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1); process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
async function skipWizard({ backendUrl, token }) {
const response = await apiJson(backendUrl, "/api/v1/system/wizard/completed", {
method: "POST",
token,
body: { status: "skipped" },
});
const ok = response.status < 400 && response.json.code === 0;
return {
status: ok ? "pass" : "fail",
http_status: response.status,
code: response.json.code ?? null,
reason: ok ? "Wizard marked skipped for local QA." : response.json.msg || "Wizard status update failed.",
};
}
async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runnerId }) { async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runnerId }) {
const [pipelineList, modelList] = await Promise.all([ const [pipelineList, modelList] = await Promise.all([
apiJson(backendUrl, "/api/v1/pipelines", { token }), apiJson(backendUrl, "/api/v1/pipelines", { token }),
@@ -149,7 +187,19 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
} }
const models = modelList.json.data?.models || []; const models = modelList.json.data?.models || [];
const selectedModel = models.find((model) => model.uuid) || null; const skippedModelIds = new Set(
String(env.LANGBOT_E2E_SKIP_MODEL_UUIDS || "")
.split(",")
.map((item) => item.trim())
.filter(Boolean),
);
const skippedModelNames = new Set(
String(env.LANGBOT_E2E_SKIP_MODEL_NAMES || "")
.split(",")
.map((item) => item.trim())
.filter(Boolean),
);
const spaceModels = models.filter((model) => isSpaceModel(model) && !skippedModelIds.has(model.uuid));
const pipelines = pipelineList.json.data?.pipelines || []; const pipelines = pipelineList.json.data?.pipelines || [];
let pipeline = pipelines.find((item) => item.name === pipelineName) || null; let pipeline = pipelines.find((item) => item.name === pipelineName) || null;
let created = false; let created = false;
@@ -170,6 +220,7 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
reason: createdResponse.json.msg || "Failed to create pipeline.", reason: createdResponse.json.msg || "Failed to create pipeline.",
create_status: createdResponse.status, create_status: createdResponse.status,
model_count: models.length, model_count: models.length,
space_model_count: spaceModels.length,
}; };
} }
const pipelineId = createdResponse.json.data?.uuid || ""; const pipelineId = createdResponse.json.data?.uuid || "";
@@ -183,6 +234,7 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
status: "fail", status: "fail",
reason: "Pipeline was not created or resolved.", reason: "Pipeline was not created or resolved.",
model_count: models.length, model_count: models.length,
space_model_count: spaceModels.length,
}; };
} }
@@ -194,27 +246,37 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
get_status: loaded.status, get_status: loaded.status,
pipeline_id: pipeline.uuid, pipeline_id: pipeline.uuid,
model_count: models.length, model_count: models.length,
space_model_count: spaceModels.length,
}; };
} }
pipeline = loaded.json.data.pipeline; pipeline = loaded.json.data.pipeline;
const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {}; const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {};
const ai = config.ai && typeof config.ai === "object" ? config.ai : {}; const ai = config.ai && typeof config.ai === "object" ? config.ai : {};
const runnerConfig = ai.runner_config && typeof ai.runner_config === "object" ? ai.runner_config : {}; const rawExistingLocalAgentConfig = ai["local-agent"] && typeof ai["local-agent"] === "object"
const rawExistingLocalAgentConfig = runnerConfig[runnerId] && typeof runnerConfig[runnerId] === "object" ? ai["local-agent"]
? runnerConfig[runnerId]
: {}; : {};
const existingLocalAgentConfig = rawExistingLocalAgentConfig; const existingLocalAgentConfig = rawExistingLocalAgentConfig;
const existingModel = existingLocalAgentConfig.model && typeof existingLocalAgentConfig.model === "object" const existingModel = existingLocalAgentConfig.model && typeof existingLocalAgentConfig.model === "object"
? existingLocalAgentConfig.model ? existingLocalAgentConfig.model
: {}; : {};
const requestedModelId = env.LANGBOT_LOCAL_AGENT_MODEL_UUID || env.LANGBOT_E2E_MODEL_UUID || ""; const requestedModelId = env.LANGBOT_LOCAL_AGENT_MODEL_UUID || env.LANGBOT_E2E_MODEL_UUID || "";
const selectedModelId = requestedModelId || existingModel.primary || selectedModel?.uuid || ""; const selected = await selectWorkingSpaceModel({
backendUrl,
token,
models,
skippedModelIds,
skippedModelNames,
requestedModelId,
existingModelId: existingModel.primary || "",
});
const selectedModelId = selected.selected_model_id || "";
const localAgentConfig = { const localAgentConfig = {
timeout: 300, timeout: 300,
prompt: [{ role: "system", content: "You are a helpful assistant." }], prompt: [{ role: "system", content: "You are a helpful assistant." }],
"remove-think": false, "remove-think": false,
"knowledge-bases": [], "knowledge-bases": [],
"box-session-id-template": "{launcher_type}_{launcher_id}",
"retrieval-top-k": 5, "retrieval-top-k": 5,
"rerank-model": "", "rerank-model": "",
"rerank-top-k": 5, "rerank-top-k": 5,
@@ -227,9 +289,11 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
"context-keep-recent-tokens": 20000, "context-keep-recent-tokens": 20000,
"context-summary-tokens": 8000, "context-summary-tokens": 8000,
...existingLocalAgentConfig, ...existingLocalAgentConfig,
// Current backend truncation still reads this field directly.
"max-round": positiveInteger(existingLocalAgentConfig["max-round"], 10),
model: { model: {
primary: selectedModelId, primary: selectedModelId,
fallbacks: requestedModelId ? [] : Array.isArray(existingModel.fallbacks) ? existingModel.fallbacks : [], fallbacks: selected.fallback_model_ids || [],
}, },
}; };
const updatedConfig = { const updatedConfig = {
@@ -239,12 +303,10 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
runner: { runner: {
...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}), ...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}),
id: runnerId, id: runnerId,
runner: runnerId,
"expire-time": 0, "expire-time": 0,
}, },
runner_config: { "local-agent": localAgentConfig,
...runnerConfig,
[runnerId]: localAgentConfig,
},
}, },
}; };
@@ -265,19 +327,31 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
update_status: updateResponse.status, update_status: updateResponse.status,
pipeline_id: pipeline.uuid, pipeline_id: pipeline.uuid,
model_count: models.length, model_count: models.length,
space_model_count: spaceModels.length,
scanned_space_model_count: selected.scanned_space_model_count,
tested_model_count: selected.tested_model_count,
model_tests: selected.model_tests,
selected_model_id: selectedModelId, selected_model_id: selectedModelId,
selected_model_name: selected.selected_model_name,
fallback_model_ids: selected.fallback_model_ids,
}; };
} }
return { return {
status: selectedModelId ? "pass" : "env_issue", status: selectedModelId ? "pass" : "env_issue",
reason: selectedModelId reason: selectedModelId
? "Local-agent pipeline is configured for Debug Chat." ? `Local-agent pipeline is configured for Debug Chat with Space model ${selected.selected_model_name || selectedModelId} and ${selected.fallback_model_ids.length} fallback(s).`
: "Pipeline was created but no LLM model is configured in this LangBot instance.", : selected.reason || "No working Space LLM model is configured in this LangBot instance.",
pipeline_id: pipeline.uuid, pipeline_id: pipeline.uuid,
pipeline_name: pipeline.name, pipeline_name: pipelineName,
model_count: models.length, model_count: models.length,
space_model_count: spaceModels.length,
scanned_space_model_count: selected.scanned_space_model_count,
tested_model_count: selected.tested_model_count,
model_tests: selected.model_tests,
selected_model_id: selectedModelId, selected_model_id: selectedModelId,
selected_model_name: selected.selected_model_name,
fallback_model_ids: selected.fallback_model_ids,
created, created,
updated: true, updated: true,
}; };
@@ -287,6 +361,229 @@ function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0); return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
} }
function isSpaceModel(model) {
const provider = model?.provider && typeof model.provider === "object" ? model.provider : {};
return model?.provider_uuid === SPACE_PROVIDER_UUID
|| provider.uuid === SPACE_PROVIDER_UUID
|| provider.requester === "space-chat-completions"
|| provider.name === "LangBot Models";
}
async function selectWorkingSpaceModel({
backendUrl,
token,
models,
skippedModelIds,
skippedModelNames,
requestedModelId,
existingModelId,
}) {
const modelTests = [];
const testLimit = positiveInteger(env.LANGBOT_E2E_MODEL_TEST_LIMIT, DEFAULT_MODEL_TEST_LIMIT);
const fallbackCount = positiveInteger(env.LANGBOT_E2E_MODEL_FALLBACK_COUNT, DEFAULT_MODEL_FALLBACK_COUNT);
const workingModels = [];
const spaceModels = rankModels(models.filter((model) => (
model.uuid
&& isSpaceModel(model)
&& !skippedModelIds.has(model.uuid)
&& !skippedModelNames.has(model.name)
)));
const requestedModel = requestedModelId
? spaceModels.find((model) => model.uuid === requestedModelId) || null
: null;
const existingModel = existingModelId
? spaceModels.find((model) => model.uuid === existingModelId) || null
: null;
const candidates = uniqueCandidates([
...(requestedModel ? [existingCandidate(requestedModel, "requested")] : []),
...(existingModel ? [existingCandidate(existingModel, "existing-pipeline")] : []),
...spaceModels.map((model) => existingCandidate(model, "configured-space")),
]);
let scanResult = { status: "skipped", models: [], reason: "" };
if (env.LANGBOT_E2E_SCAN_SPACE_MODELS !== "false") {
scanResult = await scanSpaceModels({ backendUrl, token });
if (scanResult.status === "pass") {
const knownNames = new Set(spaceModels.map((model) => model.name));
candidates.push(...scanResult.models
.filter((model) => model.name && !knownNames.has(model.name) && !skippedModelNames.has(model.name))
.map((model) => scannedCandidate(model)));
}
}
const unique = uniqueCandidates(candidates);
for (const candidate of unique.slice(0, testLimit)) {
const test = await ensureAndTestModel({ backendUrl, token, candidate });
modelTests.push(test);
if (test.status === "pass" && test.model_uuid) {
workingModels.push(test);
if (workingModels.length >= fallbackCount + 1) break;
}
}
if (workingModels.length > 0) {
const [primary, ...fallbacks] = workingModels;
return {
status: "pass",
reason: "",
selected_model_id: primary.model_uuid,
selected_model_name: primary.model_name,
fallback_model_ids: fallbacks.map((model) => model.model_uuid),
scanned_space_model_count: scanResult.models.length,
tested_model_count: modelTests.length,
model_tests: modelTests,
};
}
const baseReason = unique.length === 0
? scanResult.reason || "No Space LLM model candidates are available."
: `No working Space LLM model found after testing ${modelTests.length} candidate(s).`;
return {
status: "env_issue",
reason: requestedModelId && !requestedModel
? `Requested Space LLM model ${requestedModelId} is missing or skipped; ${baseReason}`
: baseReason,
selected_model_id: "",
selected_model_name: "",
fallback_model_ids: [],
scanned_space_model_count: scanResult.models.length,
tested_model_count: modelTests.length,
model_tests: modelTests,
};
}
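`selectWorkingSpaceModel` probes candidates one at a time, in order, and stops once it has a primary plus `LANGBOT_E2E_MODEL_FALLBACK_COUNT` working fallbacks, or after `LANGBOT_E2E_MODEL_TEST_LIMIT` attempts. The loop is sketched below with a synchronous `testModel` callback standing in for the real awaited `POST .../test` request; `pickWorkingModels` is a hypothetical name:

```javascript
// Hypothetical, synchronous sketch of the probe loop in selectWorkingSpaceModel.
function pickWorkingModels(candidates, testModel, { testLimit = 8, fallbackCount = 3 } = {}) {
  const tests = [];
  const working = [];
  for (const candidate of candidates.slice(0, testLimit)) {
    const passed = testModel(candidate); // the real script awaits an HTTP test here
    tests.push({ name: candidate.name, status: passed ? "pass" : "fail" });
    if (passed) {
      working.push(candidate);
      // Stop early once primary + fallbacks are filled.
      if (working.length >= fallbackCount + 1) break;
    }
  }
  const [primary, ...fallbacks] = working;
  return {
    primary: primary ? primary.name : "",
    fallbacks: fallbacks.map((model) => model.name),
    tested: tests.length,
  };
}

const candidates = ["a", "b", "c", "d", "e", "f"].map((name) => ({ name }));
const broken = new Set(["a", "c"]);
const picked = pickWorkingModels(candidates, (model) => !broken.has(model.name), { fallbackCount: 2 });
console.log(picked); // primary "b", fallbacks ["d", "e"], 5 candidates tested
```

With the defaults (8 tests, 3 fallbacks) one setup run makes at most eight model test calls, and a failing model is skipped rather than aborting setup.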
async function scanSpaceModels({ backendUrl, token }) {
const response = await apiJson(
backendUrl,
`/api/v1/provider/providers/${encodeURIComponent(SPACE_PROVIDER_UUID)}/scan-models?type=llm`,
{ token },
);
if (isApiFailure(response)) {
return {
status: "env_issue",
models: [],
reason: safeReason(response.json.msg || response.json.message || "Failed to scan Space LLM models."),
};
}
return {
status: "pass",
models: response.json.data?.models || [],
reason: "",
};
}
async function ensureAndTestModel({ backendUrl, token, candidate }) {
let modelUuid = candidate.uuid || "";
let created = false;
if (!modelUuid) {
const create = await apiJson(backendUrl, "/api/v1/provider/models/llm", {
method: "POST",
token,
body: {
name: candidate.name,
provider_uuid: SPACE_PROVIDER_UUID,
abilities: candidate.abilities || [],
context_length: candidate.context_length ?? null,
extra_args: {},
prefered_ranking: positiveInteger(candidate.prefered_ranking, 0),
},
});
modelUuid = create.json.data?.uuid || "";
if (isApiFailure(create) || !modelUuid) {
return modelTestResult(candidate, {
status: "fail",
reason: safeReason(create.json.msg || "Failed to create scanned Space model."),
http_status: create.status,
});
}
created = true;
}
const test = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}/test`, {
method: "POST",
token,
body: { extra_args: {} },
});
const passed = !isApiFailure(test);
if (!passed && created) {
await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}`, {
method: "DELETE",
token,
}).catch(() => {});
}
return modelTestResult(candidate, {
status: passed ? "pass" : "fail",
reason: passed ? "" : safeReason(test.json.msg || test.json.message || "Space model test failed."),
http_status: test.status,
model_uuid: modelUuid,
created,
});
}
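`ensureAndTestModel` deletes a model only when this run created it and its test then failed, so models that were already configured are never removed. A sketch of that rule against an in-memory stand-in for the provider API; `makeFakeApi`, `ensureAndTest`, and the `api` methods are hypothetical:

```javascript
// Hypothetical in-memory provider API used only to illustrate the cleanup rule.
function makeFakeApi(workingNames) {
  const models = new Map();
  let nextId = 1;
  return {
    models,
    create(name) {
      const uuid = `m${nextId++}`;
      models.set(uuid, name);
      return uuid;
    },
    test(uuid) {
      return workingNames.has(models.get(uuid));
    },
    remove(uuid) {
      models.delete(uuid);
    },
  };
}

function ensureAndTest(candidate, api) {
  let uuid = candidate.uuid || "";
  let created = false;
  if (!uuid) {
    uuid = api.create(candidate.name); // scanned model: register it first
    created = true;
  }
  const passed = api.test(uuid);
  // Roll back only what this run added; pre-existing models are left alone.
  if (!passed && created) api.remove(uuid);
  return { uuid, created, passed };
}

const api = makeFakeApi(new Set(["good"]));
api.models.set("existing", "broken"); // pre-configured model that fails its test
console.log(ensureAndTest({ uuid: "existing", name: "broken" }, api)); // kept
console.log(ensureAndTest({ uuid: "", name: "flaky" }, api)); // created, then removed
console.log([...api.models.keys()]);
```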
function modelTestResult(candidate, details) {
return {
source: candidate.source,
model_uuid: details.model_uuid || candidate.uuid || "",
model_name: candidate.name,
status: details.status,
reason: details.reason || "",
http_status: details.http_status ?? null,
created: Boolean(details.created),
};
}
function existingCandidate(model, source) {
return {
source,
uuid: model.uuid,
name: model.name,
abilities: model.abilities || [],
context_length: model.context_length,
prefered_ranking: model.prefered_ranking,
};
}
function scannedCandidate(model) {
return {
source: "scanned-space",
uuid: "",
name: model.name || model.id,
abilities: model.abilities || [],
context_length: model.context_length,
prefered_ranking: model.prefered_ranking,
};
}
function uniqueCandidates(candidates) {
const seen = new Set();
const result = [];
for (const candidate of candidates) {
const key = candidate.uuid ? `uuid:${candidate.uuid}` : `name:${candidate.name}`;
if (!candidate.name || seen.has(key)) continue;
seen.add(key);
result.push(candidate);
}
return result;
}
function rankModels(models) {
return [...models].sort((left, right) => {
const leftRank = Number.isFinite(Number(left.prefered_ranking)) ? Number(left.prefered_ranking) : 9999;
const rightRank = Number.isFinite(Number(right.prefered_ranking)) ? Number(right.prefered_ranking) : 9999;
if (leftRank !== rightRank) return leftRank - rightRank;
return String(left.name || "").localeCompare(String(right.name || ""));
});
}
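Candidates are probed in a fixed order: the requested model, then the pipeline's existing model, then the remaining configured Space models sorted by `prefered_ranking` (missing rankings last) and then by name, with `uniqueCandidates` keeping the first occurrence of each UUID. A condensed sketch of that ordering; `orderCandidates` and `rankOf` are hypothetical names. Note that `Number(null)` is `0`, so a `null` ranking sorts like an explicit `0` rather than last:

```javascript
// Ranking key as in rankModels: non-numeric rankings sort last.
function rankOf(model) {
  const rank = Number(model.prefered_ranking);
  return Number.isFinite(rank) ? rank : 9999; // undefined -> NaN -> 9999
}

function orderCandidates(models, preferredUuids = []) {
  const ranked = [...models].sort((left, right) =>
    rankOf(left) - rankOf(right) || String(left.name || "").localeCompare(String(right.name || "")));
  const preferred = preferredUuids
    .map((uuid) => models.find((model) => model.uuid === uuid))
    .filter(Boolean);
  const seen = new Set();
  // Preferred models go first; later duplicates are dropped, as in uniqueCandidates.
  return [...preferred, ...ranked].filter((model) => {
    const key = model.uuid ? `uuid:${model.uuid}` : `name:${model.name}`;
    if (!model.name || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const order = orderCandidates([
  { uuid: "1", name: "beta", prefered_ranking: 2 },
  { uuid: "2", name: "alpha", prefered_ranking: 2 },
  { uuid: "3", name: "gamma" },
  { uuid: "4", name: "delta", prefered_ranking: 1 },
], ["3"]);
console.log(order.map((model) => model.name)); // [ 'gamma', 'delta', 'alpha', 'beta' ]
```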
function positiveInteger(value, fallback) {
const parsed = Number(value);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
async function upsertEnvLocal(path, updates) { async function upsertEnvLocal(path, updates) {
let text = ""; let text = "";
try { try {
@@ -0,0 +1,496 @@
#!/usr/bin/env node
import { createServer } from "node:http";
import { mkdir, writeFile } from "node:fs/promises";
import { dirname, resolve } from "node:path";
import { env, exit } from "node:process";
const args = parseArgs(process.argv.slice(2));
const host = args.host || env.LANGBOT_FAKE_PROVIDER_HOST || "127.0.0.1";
const port = integer(args.port ?? env.LANGBOT_FAKE_PROVIDER_PORT, 0);
const stateFile = args["state-file"] || env.LANGBOT_FAKE_PROVIDER_STATE_FILE || "";
const modelName = env.LANGBOT_FAKE_PROVIDER_MODEL_NAME || "gpt-4o-mini";
const config = {
response_text: env.LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT || "OK",
first_token_delay_ms: integer(env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS, 25),
chunk_delay_ms: integer(env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS, 10),
chunk_count: integer(env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT, 0),
fault_status: integer(env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS, 500),
fail_first_n: integer(env.LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N, 0),
fail_every_n: integer(env.LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N, 0),
fail_after_first_chunk: bool(env.LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK, false),
dynamic_response: !/^(0|false|no|off)$/i.test(env.LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE || ""),
request_log_limit: integer(env.LANGBOT_FAKE_PROVIDER_REQUEST_LOG_LIMIT, 500),
};
let requestCount = 0;
const recentRequests = [];
const server = createServer(async (request, response) => {
const startedAt = Date.now();
const startedPerf = performance.now();
let requestRecord = null;
const url = new URL(request.url || "/", `http://${request.headers.host || `${host}:${port}`}`);
try {
if (request.method === "GET" && url.pathname === "/healthz") {
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
recent_request_count: recentRequests.length,
});
return;
}
if (request.method === "GET" && url.pathname === "/__qa/config") {
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
recent_requests: recentRequests,
});
return;
}
if (request.method === "POST" && url.pathname === "/__qa/config") {
const body = await readJson(request);
applyConfig(body.config && typeof body.config === "object" ? body.config : body);
if (body.reset_request_count !== false) resetRequestState();
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
});
return;
}
if (request.method === "POST" && url.pathname === "/__qa/reset") {
resetRequestState();
sendJson(response, 200, {
ok: true,
model: modelName,
config,
request_count: requestCount,
});
return;
}
if (request.method === "GET" && ["/models", "/v1/models"].includes(url.pathname)) {
sendJson(response, 200, {
object: "list",
data: [
{
id: modelName,
object: "model",
created: 1,
owned_by: "langbot-qa",
type: "llm",
},
],
});
return;
}
if (request.method === "POST" && ["/chat/completions", "/v1/chat/completions"].includes(url.pathname)) {
requestCount += 1;
const body = await readJson(request);
const requestId = `chatcmpl-langbot-fake-${requestCount}`;
const shouldFail = requestCount <= config.fail_first_n
|| (config.fail_every_n > 0 && requestCount % config.fail_every_n === 0);
const replyText = responseTextForBody(body);
requestRecord = recordRequest({
id: requestId,
request_number: requestCount,
path: url.pathname,
stream: Boolean(body.stream),
model: body.model || "",
message_count: Array.isArray(body.messages) ? body.messages.length : 0,
should_fail: shouldFail,
status: "running",
http_status: null,
expected_text: replyText,
response_text_preview: previewText(replyText),
started_at: new Date(startedAt).toISOString(),
started_epoch_ms: startedAt,
configured_first_token_delay_ms: config.first_token_delay_ms,
configured_chunk_delay_ms: config.chunk_delay_ms,
configured_chunk_count: config.chunk_count,
});
if (shouldFail) {
await sleep(config.first_token_delay_ms);
sendJson(response, config.fault_status, {
error: {
message: `LangBot fake provider injected HTTP ${config.fault_status}`,
type: "fake_provider_fault",
code: "fake_provider_fault",
},
});
finishRequestRecord(requestRecord, startedPerf, {
status: "http_fault",
http_status: config.fault_status,
});
return;
}
if (body.stream) {
await streamCompletion(response, {
requestId,
model: body.model || modelName,
content: replyText,
failAfterFirstChunk: config.fail_after_first_chunk,
requestRecord,
startedPerf,
});
} else {
await sleep(config.first_token_delay_ms + config.chunk_delay_ms);
sendJson(response, 200, completionPayload({
requestId,
model: body.model || modelName,
content: replyText,
}));
markRequestTiming(requestRecord, "first_chunk", startedPerf);
markRequestTiming(requestRecord, "first_content_chunk", startedPerf);
requestRecord.content_chunk_count = 1;
finishRequestRecord(requestRecord, startedPerf, {
status: "ok",
http_status: 200,
});
}
return;
}
sendJson(response, 404, {
error: {
message: `No fake provider route for ${request.method} ${url.pathname}`,
type: "not_found",
},
});
} catch (error) {
if (requestRecord) {
finishRequestRecord(requestRecord, startedPerf, {
status: "fake_provider_error",
http_status: 500,
error: error instanceof Error ? error.message : String(error),
});
}
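if (!response.headersSent) {
sendJson(response, 500, {
error: {
message: error instanceof Error ? error.message : String(error),
type: "fake_provider_error",
},
});
} else {
// Headers were already sent (e.g. mid SSE stream), so a JSON error body cannot be written; drop the connection instead.
response.destroy();
}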
} finally {
const durationMs = Date.now() - startedAt;
if (url.pathname !== "/healthz") {
console.log(JSON.stringify({
at: new Date().toISOString(),
method: request.method,
path: url.pathname,
duration_ms: durationMs,
}));
}
}
});
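The fake provider decides whether to fail a chat completion from the 1-based request counter alone: the first `LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N` requests fail, and so does every request whose number is a multiple of `LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N`. The two rules are OR-ed, so they overlap rather than add up. Only `/chat/completions` requests advance the counter; `/models` and `/healthz` do not. The predicate in isolation (`shouldInjectFault` is a hypothetical name):

```javascript
// Hypothetical extraction of the fault schedule; requestNumber is 1-based,
// matching requestCount after it is incremented for a chat completion.
function shouldInjectFault(requestNumber, { failFirstN = 0, failEveryN = 0 } = {}) {
  return requestNumber <= failFirstN || (failEveryN > 0 && requestNumber % failEveryN === 0);
}

const schedule = Array.from({ length: 8 }, (_, i) => i + 1)
  .filter((n) => shouldInjectFault(n, { failFirstN: 2, failEveryN: 3 }));
console.log(schedule); // [ 1, 2, 3, 6 ]
```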
server.listen(port, host, async () => {
const address = server.address();
const selectedPort = typeof address === "object" && address ? address.port : port;
const url = `http://${host}:${selectedPort}`;
const state = {
status: "ready",
pid: process.pid,
url,
base_url: `${url}/v1`,
model: modelName,
started_at: new Date().toISOString(),
};
if (stateFile) {
const path = resolve(stateFile);
await mkdir(dirname(path), { recursive: true });
await writeFile(path, `${JSON.stringify(state, null, 2)}\n`, "utf8");
}
console.log(JSON.stringify(state));
});
server.on("error", (error) => {
console.error(JSON.stringify({
status: "error",
reason: error instanceof Error ? error.message : String(error),
}));
exit(1);
});
process.on("SIGTERM", () => {
server.close(() => exit(0));
});
function parseArgs(argv) {
const result = {};
for (const item of argv) {
const match = item.match(/^--([^=]+)(?:=(.*))?$/);
if (!match) continue;
result[match[1]] = match[2] ?? "1";
}
return result;
}
function integer(value, fallback) {
const parsed = Number.parseInt(String(value ?? ""), 10);
return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}
function bool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, Math.max(0, ms)));
}
async function readJson(request) {
let text = "";
for await (const chunk of request) text += chunk.toString();
if (!text) return {};
return JSON.parse(text);
}
function sendJson(response, status, payload) {
const text = `${JSON.stringify(payload)}\n`;
response.writeHead(status, {
"content-type": "application/json",
"content-length": Buffer.byteLength(text),
});
response.end(text);
}
function completionPayload({ requestId, model, content }) {
const completionTokens = tokenEstimate(content);
return {
id: requestId,
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
model,
choices: [
{
index: 0,
message: {
role: "assistant",
content,
},
finish_reason: "stop",
},
],
usage: {
prompt_tokens: 8,
completion_tokens: completionTokens,
total_tokens: 8 + completionTokens,
},
};
}
async function streamCompletion(response, {
requestId,
model,
content,
failAfterFirstChunk: failMidStream,
requestRecord,
startedPerf,
}) {
response.writeHead(200, {
"content-type": "text/event-stream; charset=utf-8",
"cache-control": "no-cache",
"connection": "keep-alive",
});
await sleep(config.first_token_delay_ms);
markRequestTiming(requestRecord, "first_chunk", startedPerf);
writeSse(response, {
id: requestId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }],
});
const chunks = splitContent(content);
for (let index = 0; index < chunks.length; index += 1) {
await sleep(config.chunk_delay_ms);
if (index === 0) markRequestTiming(requestRecord, "first_content_chunk", startedPerf);
requestRecord.content_chunk_count = (requestRecord.content_chunk_count || 0) + 1;
writeSse(response, {
id: requestId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: { content: chunks[index] }, finish_reason: null }],
});
if (failMidStream && index === 0) {
finishRequestRecord(requestRecord, startedPerf, {
status: "mid_stream_disconnect",
http_status: 200,
});
response.destroy(new Error("LangBot fake provider injected mid-stream disconnect"));
return;
}
}
await sleep(config.chunk_delay_ms);
const completionTokens = tokenEstimate(content);
writeSse(response, {
id: requestId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage: {
prompt_tokens: 8,
completion_tokens: completionTokens,
total_tokens: 8 + completionTokens,
},
});
response.write("data: [DONE]\n\n");
response.end();
finishRequestRecord(requestRecord, startedPerf, {
status: "ok",
http_status: 200,
});
}
function writeSse(response, payload) {
response.write(`data: ${JSON.stringify(payload)}\n\n`);
}
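`streamCompletion` frames every chunk as one `data: <json>` line followed by a blank line. It sends a role-only delta first, then content deltas, then a `finish_reason: "stop"` chunk carrying `usage`, and closes with `data: [DONE]`. A minimal consumer for exactly that shape, handy for checking the fake provider outside LangBot; `collectStreamText` is hypothetical and deliberately ignores SSE features this server never emits (multi-line `data`, `event:` fields, CRLF line endings):

```javascript
// Hypothetical consumer for the stream shape emitted by streamCompletion.
function collectStreamText(body) {
  let text = "";
  let finishReason = null;
  for (const event of body.split("\n\n")) {
    if (!event.startsWith("data: ")) continue; // skips the empty tail after the last event
    const data = event.slice("data: ".length);
    if (data === "[DONE]") break;
    const choice = JSON.parse(data).choices?.[0] ?? {};
    text += choice.delta?.content ?? "";
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { text, finishReason };
}

const frame = (payload) => `data: ${JSON.stringify(payload)}\n\n`;
const body = [
  frame({ choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }] }),
  frame({ choices: [{ index: 0, delta: { content: "O" }, finish_reason: null }] }),
  frame({ choices: [{ index: 0, delta: { content: "K" }, finish_reason: null }] }),
  frame({ choices: [{ index: 0, delta: {}, finish_reason: "stop" }] }),
  "data: [DONE]\n\n",
].join("");
console.log(collectStreamText(body)); // { text: 'OK', finishReason: 'stop' }
```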
function splitContent(content) {
const text = String(content);
const requested = config.chunk_count;
if (requested <= 1 || text.length <= 1) return [text];
const chunkSize = Math.max(1, Math.ceil(text.length / requested));
const chunks = [];
for (let index = 0; index < text.length; index += chunkSize) {
chunks.push(text.slice(index, index + chunkSize));
}
return chunks;
}
function tokenEstimate(content) {
return Math.max(1, Math.ceil(String(content || "").length / 4));
}
function responseTextForBody(body) {
if (!config.dynamic_response) {
return config.response_text;
}
const messages = Array.isArray(body.messages) ? body.messages : [];
const lastUser = [...messages].reverse().find((message) => message?.role === "user");
const text = flattenContent(lastUser?.content || "");
const quoted = text.match(/["'“”](.{1,80}?)["'“”]/);
if (quoted?.[1]) return quoted[1].trim();
const exact = text.match(/(?:reply|回复|输出|return)\s+(?:exactly\s+)?([A-Za-z0-9_.:@-]{1,80})/i);
if (exact?.[1]) return exact[1].trim().replace(/[。.!?]+$/, "");
const only = text.match(/只回复\s*([A-Za-z0-9_.:@-]{1,80})/);
if (only?.[1]) return only[1].trim().replace(/[。.!?]+$/, "");
return config.response_text;
}
function flattenContent(content) {
if (typeof content === "string") return content;
if (Array.isArray(content)) {
return content
.map((item) => {
if (typeof item === "string") return item;
if (item && typeof item === "object") return item.text || "";
return "";
})
.join("\n");
}
return "";
}
function recordRequest(entry) {
const item = {
...entry,
at: new Date().toISOString(),
finished_at: null,
finished_epoch_ms: null,
duration_ms: null,
first_chunk_at: null,
first_chunk_epoch_ms: null,
first_chunk_ms: null,
first_content_chunk_at: null,
first_content_chunk_epoch_ms: null,
first_content_chunk_ms: null,
content_chunk_count: 0,
};
recentRequests.push(item);
while (recentRequests.length > config.request_log_limit) recentRequests.shift();
return item;
}
function markRequestTiming(entry, key, startedPerf) {
if (!entry || entry[`${key}_at`]) return;
const now = Date.now();
entry[`${key}_at`] = new Date(now).toISOString();
entry[`${key}_epoch_ms`] = now;
entry[`${key}_ms`] = rounded(performance.now() - startedPerf);
}
function finishRequestRecord(entry, startedPerf, updates = {}) {
if (!entry || entry.finished_at) return;
const now = Date.now();
Object.assign(entry, updates);
entry.finished_at = new Date(now).toISOString();
entry.finished_epoch_ms = now;
entry.duration_ms = rounded(performance.now() - startedPerf);
}
function rounded(value) {
return Number(value.toFixed(3));
}
function previewText(value) {
return String(value || "").slice(0, 120);
}
function resetRequestState() {
requestCount = 0;
recentRequests.length = 0;
}
function applyConfig(updates) {
if (!updates || typeof updates !== "object") return;
assignString(updates, "response_text");
assignNonNegativeInteger(updates, "first_token_delay_ms");
assignNonNegativeInteger(updates, "chunk_delay_ms");
assignNonNegativeInteger(updates, "chunk_count");
assignNonNegativeInteger(updates, "fail_first_n");
assignNonNegativeInteger(updates, "fail_every_n");
assignNonNegativeInteger(updates, "request_log_limit");
if (updates.fault_status !== undefined) {
const parsed = Number.parseInt(String(updates.fault_status), 10);
if (Number.isInteger(parsed) && parsed >= 400 && parsed <= 599) config.fault_status = parsed;
}
assignBoolean(updates, "fail_after_first_chunk");
assignBoolean(updates, "dynamic_response");
}
function assignString(updates, key) {
if (updates[key] !== undefined) config[key] = String(updates[key]);
}
function assignNonNegativeInteger(updates, key) {
if (updates[key] === undefined) return;
const parsed = Number.parseInt(String(updates[key]), 10);
if (Number.isInteger(parsed) && parsed >= 0) config[key] = parsed;
}
function assignBoolean(updates, key) {
if (updates[key] === undefined) return;
config[key] = bool(updates[key], config[key]);
}
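The streaming handler above writes every event as one `data: <json>` line followed by a blank line, and closes with `data: [DONE]`. An injected mid-stream disconnect destroys the socket after the first content chunk, so no `[DONE]` ever arrives. Below is a minimal sketch of a consumer that reassembles such a stream, which can help when checking fake-provider output by hand. `parseChatCompletionSse` is a hypothetical helper, not part of the probe scripts. It assumes the whole body is already buffered; a real client would parse incrementally.

```javascript
// Hypothetical helper: reassemble assistant text from a buffered SSE body
// shaped like the fake provider's output.
function parseChatCompletionSse(raw) {
  const result = { content: "", finishReason: null, usage: null, done: false };
  // writeSse() emits "data: <json>\n\n", so each event is separated by a blank line.
  for (const event of raw.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue;
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") {
      result.done = true;
      break;
    }
    const payload = JSON.parse(data);
    const choice = payload.choices?.[0];
    if (choice?.delta?.content) result.content += choice.delta.content;
    if (choice?.finish_reason) result.finishReason = choice.finish_reason;
    if (payload.usage) result.usage = payload.usage;
  }
  return result;
}

// Simplified frames (no id/model fields) for "OK" split into two content
// chunks, as splitContent() would do with chunk_count set to 2.
const frames = [
  { choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }] },
  { choices: [{ index: 0, delta: { content: "O" }, finish_reason: null }] },
  { choices: [{ index: 0, delta: { content: "K" }, finish_reason: null }] },
  {
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
    usage: { prompt_tokens: 8, completion_tokens: 1, total_tokens: 9 },
  },
];
const raw = frames.map((f) => `data: ${JSON.stringify(f)}\n\n`).join("") + "data: [DONE]\n\n";
const parsed = parseChatCompletionSse(raw);
console.log(parsed.content, parsed.finishReason, parsed.done); // OK stop true
```

A result with `done === false` and `finishReason === null` is what a consumer should see after the injected `mid_stream_disconnect` fault.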
+2 -1
@@ -72,6 +72,7 @@ export async function writeResult(paths, result) {
} }
export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"]) { export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"]) {
const processEnvKeys = new Set(Object.keys(env));
for (const path of paths) { for (const path of paths) {
let text = ""; let text = "";
try { try {
@@ -86,7 +87,7 @@ export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"])
if (equals <= 0) continue; if (equals <= 0) continue;
const key = trimmed.slice(0, equals).trim(); const key = trimmed.slice(0, equals).trim();
const value = trimmed.slice(equals + 1).trim().replace(/^["']|["']$/g, ""); const value = trimmed.slice(equals + 1).trim().replace(/^["']|["']$/g, "");
if (!(key in env)) env[key] = value; if (!processEnvKeys.has(key)) env[key] = value;
} }
} }
} }
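The `loadEnvFiles` change alters precedence. Under the old `key in env` rule, once `skills/.env` had written a key, `skills/.env.local` could no longer override it. The new rule snapshots only the keys present in the real process environment before any file is read. The result is that the process environment still wins, and `skills/.env.local` now overrides `skills/.env`. The sketch below expresses both rules as pure functions over plain objects so they can be compared without touching `process.env`. The function names are illustrative.

```javascript
// Old rule: skip any key already present, including keys an earlier file wrote.
function resolveEnvBeforePatch(processEnv, files) {
  const merged = { ...processEnv };
  for (const file of files) {
    for (const [key, value] of Object.entries(file)) {
      if (!(key in merged)) merged[key] = value;
    }
  }
  return merged;
}

// New rule: protect only keys from the real process environment, so a later
// file (skills/.env.local) can overwrite an earlier one (skills/.env).
function resolveEnv(processEnv, files) {
  const processKeys = new Set(Object.keys(processEnv));
  const merged = { ...processEnv };
  for (const file of files) {
    for (const [key, value] of Object.entries(file)) {
      if (!processKeys.has(key)) merged[key] = value;
    }
  }
  return merged;
}

const processEnv = { A: "shell" };
const files = [
  { A: "from .env", B: "from .env", C: "from .env" }, // skills/.env
  { B: "from .env.local" }, // skills/.env.local
];
console.log(resolveEnvBeforePatch(processEnv, files).B); // from .env
console.log(resolveEnv(processEnv, files).B); // from .env.local
```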
+79 -1
@@ -54,6 +54,7 @@ const debugChatSessionType = env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person"
const pipelineConfigDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-diagnostic.json"); const pipelineConfigDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-diagnostic.json");
const debugChatResetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json"); const debugChatResetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
const pipelineConfigRestoreDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-restore-diagnostic.json"); const pipelineConfigRestoreDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-restore-diagnostic.json");
const metricsPath = resolve(paths.evidenceDir, "metrics.json");
const startedAt = new Date(); const startedAt = new Date();
let browser; let browser;
@@ -80,10 +81,11 @@ let result = {
console_log: paths.consoleLog, console_log: paths.consoleLog,
network_log: paths.networkLog, network_log: paths.networkLog,
screenshot: paths.screenshot, screenshot: paths.screenshot,
metrics_json: metricsPath,
automation_result_json: paths.automationResultJson, automation_result_json: paths.automationResultJson,
result_json: paths.resultJson, result_json: paths.resultJson,
}, },
evidence_collected: ["ui", "screenshot", "console", "network"], evidence_collected: ["ui", "screenshot", "console", "network", "metrics"],
}; };
function boolFromEnv(value, defaultValue) { function boolFromEnv(value, defaultValue) {
@@ -103,6 +105,29 @@ function parseJsonEnv(key, fallback) {
} }
} }
function positiveNumberEnv(key, fallback) {
const value = Number(env[key] || "");
return Number.isFinite(value) && value >= 0 ? value : fallback;
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function promptStepsFromEnv() { function promptStepsFromEnv() {
const rawSteps = parseJsonEnv("LANGBOT_E2E_PROMPTS_JSON", null); const rawSteps = parseJsonEnv("LANGBOT_E2E_PROMPTS_JSON", null);
if (rawSteps === null) { if (rawSteps === null) {
@@ -658,6 +683,7 @@ try {
} else { } else {
for (let index = 0; index < promptSteps.length; index += 1) { for (let index = 0; index < promptSteps.length; index += 1) {
const step = promptSteps[index]; const step = promptSteps[index];
const promptStartedAt = Date.now();
const chatResult = await runDebugChatPrompt(page, { const chatResult = await runDebugChatPrompt(page, {
prompt: step.prompt, prompt: step.prompt,
expectedText: step.expectedText, expectedText: step.expectedText,
@@ -665,11 +691,13 @@ try {
imagePath: index === 0 ? imagePath : "", imagePath: index === 0 ? imagePath : "",
failureSignals: failureSignals.length > 0 ? failureSignals : undefined, failureSignals: failureSignals.length > 0 ? failureSignals : undefined,
}); });
const promptDurationMs = Date.now() - promptStartedAt;
result.chat_results.push({ result.chat_results.push({
index, index,
expected_text: step.expectedText, expected_text: step.expectedText,
status: chatResult.status, status: chatResult.status,
reason: chatResult.reason, reason: chatResult.reason,
response_duration_ms: promptDurationMs,
min_expected_count: chatResult.min_expected_count, min_expected_count: chatResult.min_expected_count,
final_count: chatResult.final_count, final_count: chatResult.final_count,
before_assistant_expected_count: chatResult.before_assistant_expected_count, before_assistant_expected_count: chatResult.before_assistant_expected_count,
@@ -714,6 +742,56 @@ try {
const finishedAt = new Date(); const finishedAt = new Date();
result.finished_at = finishedAt.toISOString(); result.finished_at = finishedAt.toISOString();
result.finished_at_local = localIsoWithOffset(finishedAt); result.finished_at_local = localIsoWithOffset(finishedAt);
result.duration_ms = finishedAt.getTime() - startedAt.getTime();
const responseDurations = result.chat_results
.map((item) => item.response_duration_ms)
.filter((value) => Number.isFinite(value));
const passedPrompts = result.chat_results.filter((item) => item.status === "pass").length;
const attemptedPrompts = result.chat_results.length;
const errorRate = attemptedPrompts === 0 ? 1 : Number(((attemptedPrompts - passedPrompts) / attemptedPrompts).toFixed(4));
const responseStats = stats(responseDurations);
const responseP95BudgetMs = positiveNumberEnv(
"LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS",
positiveNumberEnv("LANGBOT_DEBUG_CHAT_RESPONSE_P95_MS", safeResponseTimeoutMs),
);
const maxErrorRate = positiveNumberEnv("LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE", 0);
const metrics = {
probe: caseId,
url: result.url,
prompt_count: result.prompt_count,
attempted_prompt_count: attemptedPrompts,
passed_prompt_count: passedPrompts,
error_rate: errorRate,
response_duration_ms: responseStats,
total_duration_ms: result.duration_ms,
chat_results: result.chat_results,
};
result.metrics_summary = {
prompt_count: metrics.prompt_count,
attempted_prompt_count: metrics.attempted_prompt_count,
passed_prompt_count: metrics.passed_prompt_count,
error_rate: metrics.error_rate,
response_p50_ms: metrics.response_duration_ms.p50,
response_p95_ms: metrics.response_duration_ms.p95,
total_duration_ms: metrics.total_duration_ms,
};
result.thresholds_summary = {
response_p95_ms: {
actual: metrics.response_duration_ms.p95,
max: responseP95BudgetMs,
pass: attemptedPrompts > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
},
error_rate: {
actual: metrics.error_rate,
max: maxErrorRate,
pass: metrics.error_rate <= maxErrorRate,
},
};
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
if (result.status === "pass" && !Object.values(result.thresholds_summary).every((item) => item.pass)) {
result.status = "fail";
result.reason = "Debug Chat performance breached response latency or error-rate thresholds.";
}
const existingEvidence = {}; const existingEvidence = {};
for (const [key, value] of Object.entries(result.evidence)) { for (const [key, value] of Object.entries(result.evidence)) {
if (typeof value !== "string") continue; if (typeof value !== "string") continue;
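The metrics block computes a nearest-rank p95 over per-prompt durations and downgrades an otherwise passing run when a threshold is breached. One detail is worth checking. `positiveNumberEnv` evaluates `Number(env[key] || "")`, and `Number("")` is `0`. An unset variable therefore yields 0 instead of the fallback; the fallback is only reached for negative or non-numeric values. When neither P95 variable is set, any run with a non-zero p95 would breach `response_p95_ms`. The sketch below treats blank values as unset and mirrors the `thresholds_summary` shape. The helper names and the 60000 ms default are hypothetical stand-ins; the real default is `safeResponseTimeoutMs`, whose value is not shown here.

```javascript
// Treat unset or blank values as "not configured" so the fallback applies.
function nonNegativeNumberEnv(source, key, fallback) {
  const raw = source[key];
  if (raw === undefined || String(raw).trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) && value >= 0 ? value : fallback;
}

// Nearest-rank percentile: same index rule as percentile() in the hunk above,
// without the 3-decimal rounding.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

// Assumes each attempted prompt contributed exactly one duration, as each
// chat_results entry records response_duration_ms.
function evaluateThresholds(durations, passedCount, source, defaultP95Ms) {
  const attempted = durations.length;
  const errorRate = attempted === 0 ? 1 : (attempted - passedCount) / attempted;
  const p95 = percentile(durations, 95);
  const p95Max = nonNegativeNumberEnv(
    source,
    "LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS",
    nonNegativeNumberEnv(source, "LANGBOT_DEBUG_CHAT_RESPONSE_P95_MS", defaultP95Ms),
  );
  const errorMax = nonNegativeNumberEnv(source, "LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE", 0);
  return {
    response_p95_ms: { actual: p95, max: p95Max, pass: attempted > 0 && p95 <= p95Max },
    error_rate: { actual: errorRate, max: errorMax, pass: errorRate <= errorMax },
  };
}

const durations = [120, 80, 200, 150, 90]; // sorted: 80, 90, 120, 150, 200
console.log(percentile(durations, 50)); // 120 (rank ceil(2.5) = 3)
console.log(percentile(durations, 95)); // 200 (rank ceil(4.75) = 5)
console.log(evaluateThresholds(durations, 5, {}, 60000).response_p95_ms.pass); // true
```

With only five prompts, p95 equals the maximum. Small prompt sets therefore make the p95 gate as strict as a max-latency gate.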
+459
@@ -130,6 +130,7 @@
"references/local-agent-runner.md", "references/local-agent-runner.md",
"references/mcp-stdio-testing.md", "references/mcp-stdio-testing.md",
"references/model-provider-testing.md", "references/model-provider-testing.md",
"references/performance-reliability-testing.md",
"references/pipeline-debug-chat.md", "references/pipeline-debug-chat.md",
"references/plugin-e2e-smoke.md", "references/plugin-e2e-smoke.md",
"references/sandbox-skill-authoring.md", "references/sandbox-skill-authoring.md",
@@ -150,6 +151,16 @@
"agent-runner-release-preflight", "agent-runner-release-preflight",
"agent-runner-runtime-chaos", "agent-runner-runtime-chaos",
"dify-agent-debug-chat", "dify-agent-debug-chat",
"langbot-fake-provider-debug-chat-cross-pipeline-isolation",
"langbot-fake-provider-debug-chat-fault-recovery",
"langbot-fake-provider-debug-chat-load",
"langbot-fake-provider-debug-chat-slow-load",
"langbot-fault-taxonomy-contract",
"langbot-live-backend-latency",
"langbot-live-backend-log-health",
"langbot-live-control-plane-api",
"langbot-overhead-accounting-contract",
"langbot-space-debug-chat-concurrency-smoke",
"langrag-kb-retrieve", "langrag-kb-retrieve",
"langrag-parser-golden-e2e", "langrag-parser-golden-e2e",
"langrag-sentinel-kb-discover", "langrag-sentinel-kb-discover",
@@ -165,6 +176,7 @@
"mcp-stdio-register", "mcp-stdio-register",
"mcp-stdio-tool-call", "mcp-stdio-tool-call",
"pipeline-debug-chat", "pipeline-debug-chat",
"pipeline-debug-chat-performance",
"plugin-e2e-smoke", "plugin-e2e-smoke",
"provider-deepseek", "provider-deepseek",
"qa-plugin-smoke-live-install", "qa-plugin-smoke-live-install",
@@ -486,6 +498,316 @@
"backend_log" "backend_log"
] ]
}, },
{
"id": "langbot-fake-provider-debug-chat-cross-pipeline-isolation",
"title": "LangBot Debug Chat fake-provider cross-pipeline isolation probe",
"mode": "probe",
"area": "reliability",
"type": "reliability",
"priority": "p1",
"risk": "high",
"ci_eligible": false,
"tags": [
"reliability",
"debug-chat",
"websocket",
"fake-provider",
"isolation",
"concurrency",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-cross-pipeline-isolation.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-cross-pipelines.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME",
"LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fake-provider-debug-chat-fault-recovery",
"title": "LangBot Debug Chat fake-provider fault recovery probe",
"mode": "probe",
"area": "reliability",
"type": "chaos",
"priority": "p1",
"risk": "high",
"ci_eligible": false,
"tags": [
"reliability",
"chaos",
"debug-chat",
"websocket",
"fake-provider",
"fault-injection",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
"LANGBOT_FAKE_PROVIDER_MODEL_UUID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fake-provider-debug-chat-load",
"title": "LangBot Debug Chat controlled fake-provider load probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"debug-chat",
"websocket",
"fake-provider",
"load",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
"LANGBOT_FAKE_PROVIDER_MODEL_UUID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fake-provider-debug-chat-slow-load",
"title": "LangBot Debug Chat slow fake-provider load probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"debug-chat",
"websocket",
"fake-provider",
"slow-provider",
"load",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_FAKE_PROVIDER_URL",
"LANGBOT_FAKE_PROVIDER_BASE_URL",
"LANGBOT_FAKE_PROVIDER_PID",
"LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
"LANGBOT_FAKE_PROVIDER_MODEL_UUID",
"LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
"LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-fault-taxonomy-contract",
"title": "LangBot fault taxonomy and cleanup contract",
"mode": "probe",
"area": "reliability",
"type": "chaos",
"priority": "p1",
"risk": "medium",
"ci_eligible": true,
"tags": [
"reliability",
"chaos",
"contract",
"synthetic"
],
"automation": "skills/langbot-testing/probes/langbot-fault-taxonomy-contract.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"filesystem"
]
},
{
"id": "langbot-live-backend-latency",
"title": "LangBot live backend basic latency probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"live-backend",
"latency",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-live-backend-latency.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-live-backend-log-health",
"title": "LangBot live backend log health probe",
"mode": "probe",
"area": "reliability",
"type": "reliability",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"reliability",
"live-backend",
"backend-log",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-live-backend-log-health.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"backend_log",
"filesystem"
]
},
{
"id": "langbot-live-control-plane-api",
"title": "LangBot live control-plane API probe",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"reliability",
"live-backend",
"control-plane",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-live-control-plane-api.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{
"id": "langbot-overhead-accounting-contract",
"title": "LangBot overhead accounting metrics contract",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": true,
"tags": [
"performance",
"metrics",
"contract",
"synthetic"
],
"automation": "skills/langbot-testing/probes/langbot-overhead-accounting-contract.mjs",
"setup_automation": [],
"setup_provides_env": [],
"evidence_required": [
"metrics",
"resource_log",
"filesystem"
]
},
{
"id": "langbot-space-debug-chat-concurrency-smoke",
"title": "LangBot Debug Chat real Space-provider concurrency smoke",
"mode": "probe",
"area": "performance",
"type": "performance",
"priority": "p1",
"risk": "high",
"ci_eligible": false,
"tags": [
"performance",
"debug-chat",
"websocket",
"space",
"live-provider",
"smoke",
"metrics"
],
"automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_PIPELINE_URL",
"LANGBOT_PIPELINE_NAME",
"LANGBOT_LOCAL_AGENT_PIPELINE_URL",
"LANGBOT_LOCAL_AGENT_PIPELINE_NAME",
"LANGBOT_LOCAL_AGENT_MODEL_UUID",
"LANGBOT_E2E_MODEL_UUID"
],
"evidence_required": [
"metrics",
"network",
"api_diagnostic",
"filesystem"
]
},
{ {
"id": "langrag-kb-retrieve", "id": "langrag-kb-retrieve",
"title": "LangRAG knowledge base ingests and retrieves a sentinel document", "title": "LangRAG knowledge base ingests and retrieves a sentinel document",
@@ -911,6 +1233,38 @@
"backend_log" "backend_log"
] ]
}, },
{
"id": "pipeline-debug-chat-performance",
"title": "Pipeline Debug Chat user-path performance probe",
"mode": "agent-browser",
"area": "pipeline",
"type": "performance",
"priority": "p1",
"risk": "medium",
"ci_eligible": false,
"tags": [
"performance",
"pipeline",
"debug-chat",
"user-path",
"metrics"
],
"automation": "scripts/e2e/pipeline-debug-chat.mjs",
"setup_automation": [
"node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
],
"setup_provides_env": [
"LANGBOT_PIPELINE_URL",
"LANGBOT_PIPELINE_NAME"
],
"evidence_required": [
"ui",
"screenshot",
"console",
"network",
"metrics"
]
},
{ {
"id": "plugin-e2e-smoke", "id": "plugin-e2e-smoke",
"title": "Plugin system installs a local plugin and exposes tool/page APIs", "title": "Plugin system installs a local plugin and exposes tool/page APIs",
@@ -1059,6 +1413,11 @@
"suites": [ "suites": [
"agent-runner-release-gate", "agent-runner-release-gate",
"core-smoke", "core-smoke",
"langbot-debug-chat-load-gate",
"langbot-live-backend-gate",
"langbot-performance-contract-gate",
"langbot-performance-reliability-gate",
"langbot-user-path-performance-gate",
"local-agent-gate" "local-agent-gate"
], ],
"suite_summaries": [ "suite_summaries": [
@@ -1121,6 +1480,97 @@
"local-agent-basic-debug-chat" "local-agent-basic-debug-chat"
] ]
}, },
{
"id": "langbot-debug-chat-load-gate",
"title": "LangBot Debug Chat load gate",
"description": "Message-path load checks for Pipeline Debug Chat: controlled fake-provider baseline, slow-provider and fault-recovery profiles, plus optional real Space-provider smoke.",
"type": "performance",
"priority": "p1",
"tags": [
"performance",
"debug-chat",
"websocket",
"load"
],
"cases": [
"langbot-fake-provider-debug-chat-load",
"langbot-fake-provider-debug-chat-slow-load",
"langbot-fake-provider-debug-chat-cross-pipeline-isolation",
"langbot-fake-provider-debug-chat-fault-recovery",
"langbot-space-debug-chat-concurrency-smoke"
]
},
{
"id": "langbot-live-backend-gate",
"title": "LangBot live backend reliability gate",
"description": "Live backend control-plane responsiveness and runtime log health checks for a locally running LangBot instance.",
"type": "reliability",
"priority": "p1",
"tags": [
"performance",
"reliability",
"live-backend",
"metrics"
],
"cases": [
"langbot-live-backend-latency",
"langbot-live-control-plane-api",
"langbot-live-backend-log-health"
]
},
{
"id": "langbot-performance-contract-gate",
"title": "LangBot performance contract gate",
"description": "Fast synthetic contract checks for performance metric accounting and non-destructive reliability fault taxonomy.",
"type": "contract",
"priority": "p1",
"tags": [
"performance",
"reliability",
"contract",
"metrics"
],
"cases": [
"langbot-overhead-accounting-contract",
"langbot-fault-taxonomy-contract"
]
},
{
"id": "langbot-performance-reliability-gate",
"title": "LangBot performance and reliability starter gate",
"description": "Starter gate for LangBot performance accounting, live backend control-plane latency, and non-destructive fault taxonomy checks.",
"type": "reliability",
"priority": "p1",
"tags": [
"performance",
"reliability",
"metrics",
"chaos"
],
"cases": [
"langbot-overhead-accounting-contract",
"langbot-fault-taxonomy-contract",
"langbot-live-backend-latency",
"langbot-live-control-plane-api",
"langbot-live-backend-log-health"
]
},
{
"id": "langbot-user-path-performance-gate",
"title": "LangBot user-path performance gate",
"description": "Browser-visible performance checks for user-facing LangBot paths such as Pipeline Debug Chat.",
"type": "performance",
"priority": "p1",
"tags": [
"performance",
"browser",
"debug-chat",
"user-path"
],
"cases": [
"pipeline-debug-chat-performance"
]
},
{ {
"id": "local-agent-gate", "id": "local-agent-gate",
"title": "Local Agent runner regression gate", "title": "Local Agent runner regression gate",
@@ -1265,6 +1715,7 @@
"sandbox-native-tools-unavailable", "sandbox-native-tools-unavailable",
"socks-proxy-without-socksio", "socks-proxy-without-socksio",
"survey-widget-blocks-debug-chat", "survey-widget-blocks-debug-chat",
"telemetry-proxy-noise",
"tool-name-collision-between-mcp-and-plugin", "tool-name-collision-between-mcp-and-plugin",
"uv-run-resyncs-local-sdk" "uv-run-resyncs-local-sdk"
], ],
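The catalog hunks add eleven case IDs, five suites, and a troubleshooting entry, and all of them cross-reference case IDs by string. Below is a hypothetical consistency check that reports dangling references. The top-level key names (`cases`, `suite_summaries`, `troubleshooting`) are assumptions for illustration. The diff shows `suite_summaries` and `related_cases`, but not the exact key that holds the case objects.

```javascript
// Hypothetical check: every suite case and troubleshooting related_case must
// name a declared case. Key names are assumptions; adapt to the real catalog.
function findDanglingCaseRefs(catalog) {
  const known = new Set((catalog.cases || []).map((item) => item.id));
  const problems = [];
  for (const suite of catalog.suite_summaries || []) {
    for (const caseId of suite.cases || []) {
      if (!known.has(caseId)) problems.push(`suite ${suite.id} -> ${caseId}`);
    }
  }
  for (const entry of catalog.troubleshooting || []) {
    for (const caseId of entry.related_cases || []) {
      if (!known.has(caseId)) problems.push(`troubleshooting ${entry.id} -> ${caseId}`);
    }
  }
  return problems;
}

const loadGateCases = [
  "langbot-fake-provider-debug-chat-load",
  "langbot-fake-provider-debug-chat-slow-load",
  "langbot-fake-provider-debug-chat-cross-pipeline-isolation",
  "langbot-fake-provider-debug-chat-fault-recovery",
  "langbot-space-debug-chat-concurrency-smoke",
];
const sample = {
  cases: loadGateCases.map((id) => ({ id })),
  suite_summaries: [{ id: "langbot-debug-chat-load-gate", cases: loadGateCases }],
  troubleshooting: [
    { id: "telemetry-proxy-noise", related_cases: ["langbot-space-debug-chat-concurrency-smoke"] },
  ],
};
console.log(findDanglingCaseRefs(sample)); // []
```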
@@ -1449,6 +1900,14 @@
"mcp-stdio-tool-call" "mcp-stdio-tool-call"
] ]
}, },
{
"id": "telemetry-proxy-noise",
"title": "Telemetry posting fails through the proxy while the target flow succeeds",
"category": "env_issue",
"related_cases": [
"langbot-space-debug-chat-concurrency-smoke"
]
},
{ {
"id": "tool-name-collision-between-mcp-and-plugin", "id": "tool-name-collision-between-mcp-and-plugin",
"title": "MCP and plugin expose the same tool name", "title": "MCP and plugin expose the same tool name",
+17
@@ -26,6 +26,23 @@ LANGBOT_NO_PROXY=localhost,127.0.0.1,::1
LANGBOT_PIPELINE_URL= LANGBOT_PIPELINE_URL=
LANGBOT_PIPELINE_NAME= LANGBOT_PIPELINE_NAME=
# Optional fake OpenAI-compatible provider controls for Debug Chat load tests.
# Leave URL empty to let setup automation start a local provider and write the
# selected URL to skills/.env.local.
LANGBOT_FAKE_PROVIDER_URL=
LANGBOT_FAKE_PROVIDER_HOST=127.0.0.1
LANGBOT_FAKE_PROVIDER_PORT=
LANGBOT_FAKE_PROVIDER_MODEL_NAME=gpt-4o-mini
LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT=OK
LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS=25
LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS=10
LANGBOT_FAKE_PROVIDER_CHUNK_COUNT=0
LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N=0
LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N=0
LANGBOT_FAKE_PROVIDER_FAULT_STATUS=500
LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK=false
LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE=true
# Optional case-specific runner targets. Prefer these for runner-specific cases # Optional case-specific runner targets. Prefer these for runner-specific cases
# so the automation cannot silently test the wrong runner. # so the automation cannot silently test the wrong runner.
LANGBOT_LOCAL_AGENT_PIPELINE_URL= LANGBOT_LOCAL_AGENT_PIPELINE_URL=
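These defaults set a provider-side latency floor that should be accounted for before judging LangBot's own latency. The fake provider's streaming handler sleeps once for `first_token_delay_ms` before the role chunk, once for `chunk_delay_ms` before each content chunk, and once more for `chunk_delay_ms` before the final `stop` chunk. The sketch below reproduces that arithmetic. It assumes the `LANGBOT_FAKE_PROVIDER_*` values map onto the provider's `first_token_delay_ms`, `chunk_delay_ms`, and `chunk_count` config keys; the mapping code is not shown in this diff. The helper names are hypothetical.

```javascript
// Same chunking rule as splitContent(): chunk_count <= 1 means one chunk.
function contentChunkCount(text, chunkCount) {
  const length = String(text).length;
  if (chunkCount <= 1 || length <= 1) return 1;
  const chunkSize = Math.max(1, Math.ceil(length / chunkCount));
  return Math.ceil(length / chunkSize);
}

// Lower bound only: timer sleeps never fire early, and network plus LangBot
// processing add to this.
function providerStreamFloorMs({ responseText, firstTokenDelayMs, chunkDelayMs, chunkCount }) {
  const chunks = contentChunkCount(responseText, chunkCount);
  return firstTokenDelayMs + chunkDelayMs * (chunks + 1);
}

// Defaults above: "OK", 25 ms first token, 10 ms per chunk, CHUNK_COUNT=0.
console.log(providerStreamFloorMs({ responseText: "OK", firstTokenDelayMs: 25, chunkDelayMs: 10, chunkCount: 0 })); // 45
console.log(providerStreamFloorMs({ responseText: "0123456789", firstTokenDelayMs: 25, chunkDelayMs: 10, chunkCount: 4 })); // 75
```

With `DYNAMIC_RESPONSE=true`, the content is the token extracted from the prompt. Its length therefore drives the chunk count whenever `CHUNK_COUNT` is greater than 1.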
@@ -53,7 +53,7 @@ Start the new frontend from the web repo:
```bash ```bash
cd "$LANGBOT_WEB_REPO" cd "$LANGBOT_WEB_REPO"
npm run dev VITE_API_BASE_URL="$LANGBOT_BACKEND_URL" pnpm dev --host 0.0.0.0
``` ```
Healthy startup includes: Healthy startup includes:
@@ -68,6 +68,10 @@ Quick check:
curl -I --max-time 3 "$LANGBOT_FRONTEND_URL" curl -I --max-time 3 "$LANGBOT_FRONTEND_URL"
``` ```
If `VITE_API_BASE_URL` is missing, Vite still serves the page but frontend API
calls may go to the frontend port instead of the backend port. That produces
false browser failures in login, wizard, pipeline, and Debug Chat cases.
## Completion Signal ## Completion Signal
Environment setup is not complete until the required frontend/backend URLs are reachable and the chosen browser-control path can open the WebUI. Environment setup is not complete until the required frontend/backend URLs are reachable and the chosen browser-control path can open the WebUI.
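The `VITE_API_BASE_URL` note above describes a failure that only shows up later as confusing browser errors, so a preflight can catch it early. Below is a hypothetical check that compares the configured API base with the frontend and backend origins. The helper name and the example ports are assumptions for illustration, not values from the env files.

```javascript
// Hypothetical preflight: flag an API base that is missing, relative, equal to
// the frontend origin, or different from the backend origin.
function checkApiBase({ frontendUrl, backendUrl, apiBaseUrl }) {
  if (!apiBaseUrl) {
    return { ok: false, reason: "VITE_API_BASE_URL is empty; API calls may hit the frontend port" };
  }
  let api;
  try {
    api = new URL(apiBaseUrl);
  } catch {
    return { ok: false, reason: `VITE_API_BASE_URL is not an absolute URL: ${apiBaseUrl}` };
  }
  const frontendOrigin = new URL(frontendUrl).origin;
  const backendOrigin = new URL(backendUrl).origin;
  if (api.origin === frontendOrigin) {
    return { ok: false, reason: `API base ${api.origin} is the frontend origin` };
  }
  if (api.origin !== backendOrigin) {
    return { ok: false, reason: `API base ${api.origin} differs from backend ${backendOrigin}` };
  }
  return { ok: true, reason: "API base matches backend origin" };
}

// Example ports are hypothetical.
const urls = { frontendUrl: "http://127.0.0.1:3000", backendUrl: "http://127.0.0.1:5300" };
console.log(checkApiBase({ ...urls, apiBaseUrl: "http://127.0.0.1:5300" }).ok); // true
console.log(checkApiBase({ ...urls, apiBaseUrl: "" }).ok); // false
```

The origin comparison is strict, so `localhost` and `127.0.0.1` count as different origins. Normalize hosts first if the env files mix them.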
+3
@@ -21,6 +21,7 @@ Use this skill when an agent needs to verify LangBot behavior through the WebUI
- **Sandbox-backed skill authoring**: read `references/sandbox-skill-authoring.md`. - **Sandbox-backed skill authoring**: read `references/sandbox-skill-authoring.md`.
- **LangRAG knowledge bases**: read `references/langrag-knowledge-base.md`. - **LangRAG knowledge bases**: read `references/langrag-knowledge-base.md`.
- **MCP stdio tool testing**: read `references/mcp-stdio-testing.md`. - **MCP stdio tool testing**: read `references/mcp-stdio-testing.md`.
- **Performance, reliability, or chaos probes**: read `references/performance-reliability-testing.md`.
- **Drive a live instance over MCP (not raw HTTP)**: use the `langbot-mcp-ops` skill — the instance exposes an MCP server at `http://<host>:5300/mcp` (reuses API keys). Useful for setting up bots/pipelines/models as test fixtures programmatically. - **Drive a live instance over MCP (not raw HTTP)**: use the `langbot-mcp-ops` skill — the instance exposes an MCP server at `http://<host>:5300/mcp` (reuses API keys). Useful for setting up bots/pipelines/models as test fixtures programmatically.
- **Known failures and fixes**: read `references/troubleshooting.md`. - **Known failures and fixes**: read `references/troubleshooting.md`.
- **Reusable test groups**: run `bin/lbs suite list` and `bin/lbs suite plan <suite-id>` before manually assembling a case set. - **Reusable test groups**: run `bin/lbs suite list` and `bin/lbs suite plan <suite-id>` before manually assembling a case set.
@@ -36,6 +37,8 @@ Use this skill when an agent needs to verify LangBot behavior through the WebUI
- Use an authenticated browser profile prepared by `langbot-env-setup`. - Use an authenticated browser profile prepared by `langbot-env-setup`.
- Do not expose API keys, OAuth secrets, tokens, or localStorage token values in output. - Do not expose API keys, OAuth secrets, tokens, or localStorage token values in output.
- A WebUI test is not complete until the visible UI result is checked against backend logs or network behavior. - A WebUI test is not complete until the visible UI result is checked against backend logs or network behavior.
- A performance result is not complete without `metrics` evidence and a clear split between LangBot overhead and external provider/tool/network time.
- A chaos or reliability result is not complete until the fault scope, cleanup, and recovery checks are recorded.
- For a suite, use `bin/lbs suite start <suite-id>` to create the suite evidence root, per-case directories, and `suite-start.json`/`suite-start.md` handoff files; use `bin/lbs test result <case-id>` to write final per-case `result.json`, then run `bin/lbs suite report <suite-id> --evidence-dir <dir>`. - For a suite, use `bin/lbs suite start <suite-id>` to create the suite evidence root, per-case directories, and `suite-start.json`/`suite-start.md` handoff files; use `bin/lbs test result <case-id>` to write final per-case `result.json`, then run `bin/lbs suite report <suite-id> --evidence-dir <dir>`.
- Do not mark a case `pass` until `test result --evidence` covers every value in the case's `evidence_required`. - Do not mark a case `pass` until `test result --evidence` covers every value in the case's `evidence_required`.
- For runner-specific Debug Chat cases, use the case-specific pipeline env declared by `automation_pipeline_url_env` / `automation_pipeline_name_env`; do not silently reuse a generic `LANGBOT_PIPELINE_URL`. - For runner-specific Debug Chat cases, use the case-specific pipeline env declared by `automation_pipeline_url_env` / `automation_pipeline_name_env`; do not silently reuse a generic `LANGBOT_PIPELINE_URL`.
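The new rule requires performance results to split LangBot overhead from external provider time. Below is a minimal sketch of that split for one Debug Chat round trip. It assumes the client-side send-to-final-response time is known, and it takes the provider side from a fake-provider request record, whose `duration_ms` is set by `finishRequestRecord()`. The two durations come from different processes and clocks, and network time falls into the overhead bucket, so the result is an estimate. `splitOverhead` is a hypothetical name.

```javascript
// Hypothetical helper: estimate LangBot overhead as end-to-end time minus the
// provider's own request duration.
function splitOverhead(endToEndMs, providerRecord) {
  const providerMs = Number.isFinite(providerRecord?.duration_ms) ? providerRecord.duration_ms : null;
  if (providerMs === null) {
    return { langbot_overhead_ms: null, provider_ms: null, overhead_share: null };
  }
  // Clamp at zero: clock skew can make the provider look slower than the whole trip.
  const overhead = Math.max(0, endToEndMs - providerMs);
  return {
    langbot_overhead_ms: Number(overhead.toFixed(3)),
    provider_ms: providerMs,
    overhead_share: endToEndMs > 0 ? Number((overhead / endToEndMs).toFixed(4)) : 0,
  };
}

console.log(splitOverhead(120, { duration_ms: 45 }));
// { langbot_overhead_ms: 75, provider_ms: 45, overhead_share: 0.625 }
```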
@@ -0,0 +1,81 @@
id: langbot-fake-provider-debug-chat-cross-pipeline-isolation
title: "LangBot Debug Chat fake-provider cross-pipeline isolation probe"
mode: probe
area: reliability
type: reliability
priority: p1
risk: high
ci_eligible: false
tags:
- reliability
- debug-chat
- websocket
- fake-provider
- isolation
- concurrency
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-cross-pipeline-isolation.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME
automation_debug_chat_load_requests: "6"
automation_debug_chat_load_concurrency: "4"
automation_debug_chat_load_timeout_ms: "30000"
automation_debug_chat_load_response_p95_ms: "5000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
metrics_thresholds_json: '{"cross_pipeline_leak_count":{"max":0},"response_p95_ms":{"max":5000},"error_rate":{"max":0}}'
load_profile_json: '{"requests_per_pipeline":6,"pipelines":2,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","metric":"cross-pipeline response isolation and send-to-final-assistant-response"}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-cross-pipelines.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME
steps:
- "Start or reuse the local fake OpenAI-compatible provider."
- "Create or update two local-agent pipelines that both point at the controlled fake provider."
- "Reset both Debug Chat sessions and the fake-provider request log."
- "Open concurrent WebSocket Debug Chat connections to both pipelines and send unique pipeline-scoped response tokens."
checks:
- "automation-result.json status is pass only when every request receives its own expected token and cross_pipeline_leak_count is zero."
- "metrics_summary includes by_pipeline status counts, fake-provider request count, and LangBot/provider timing estimates."
- "samples.json contains per-request pipeline labels so any leak can be attributed to the receiving pipeline."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe targets Debug Chat isolation under concurrent traffic from two pipelines."
- "It is designed to expose regressions where global pipeline state causes one pipeline's assistant response to be delivered to another pipeline's Debug Chat session."
- "Foreign tokens from other requests on the same pipeline are tolerated because Debug Chat intentionally broadcasts assistant responses to every connection in the same pipeline/session; tokens from a different pipeline are never tolerated."
success_patterns:
- "Debug Chat cross-pipeline isolation probe passed"
failure_patterns:
- "cross_pipeline_leak"
- "Timed out after"
- "WebSocket connection error"
- "Final assistant response did not include"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
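The isolation rule in this case's diagnostics (same-pipeline foreign tokens are tolerated, cross-pipeline tokens never are) can be sketched as a pure counting function. This is a minimal sketch, not the probe's actual code: the sample shape (`pipeline`, `expected`, `received`) and the name `countCrossPipelineLeaks` are hypothetical.

```javascript
// Hypothetical sample shape (not the probe's real schema):
//   { pipeline: "A", expected: "ISO-A-01", received: ["final assistant text", ...] }
// A leak is any received text that contains a token issued to a *different* pipeline.
function countCrossPipelineLeaks(samples) {
  // Map every issued token to the pipeline that issued it.
  const owner = new Map(samples.map((s) => [s.expected, s.pipeline]));
  let leaks = 0;
  for (const sample of samples) {
    for (const text of sample.received) {
      for (const [token, pipeline] of owner) {
        // Tokens from the same pipeline are tolerated: Debug Chat broadcasts within a session.
        if (pipeline !== sample.pipeline && text.includes(token)) leaks += 1;
      }
    }
  }
  return leaks;
}

const samples = [
  { pipeline: "A", expected: "ISO-A-01", received: ["ISO-A-01"] },
  // Same-pipeline broadcast of another request's token: tolerated.
  { pipeline: "A", expected: "ISO-A-02", received: ["ISO-A-02", "ISO-A-01"] },
  // Pipeline A's token delivered to a pipeline B session: counted as a leak.
  { pipeline: "B", expected: "ISO-B-01", received: ["ISO-B-01", "ISO-A-02"] },
];
console.log(countCrossPipelineLeaks(samples)); // 1
```

Keeping tokens fixed-width avoids one token matching inside another under substring matching.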
@@ -0,0 +1,95 @@
id: langbot-fake-provider-debug-chat-fault-recovery
title: "LangBot Debug Chat fake-provider fault recovery probe"
mode: probe
area: reliability
type: chaos
priority: p1
risk: high
ci_eligible: false
tags:
- reliability
- chaos
- debug-chat
- websocket
- fake-provider
- fault-injection
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_debug_chat_load_requests: "6"
automation_debug_chat_load_concurrency: "1"
automation_debug_chat_load_timeout_ms: "15000"
automation_debug_chat_load_response_p95_ms: "5000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_min_ok_count: "6"
automation_debug_chat_load_min_provider_fault_count: "2"
automation_debug_chat_load_expected_prefix: "FAULTQA"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
automation_debug_chat_load_fail_on_final_mismatch: "true"
automation_fake_provider_first_token_delay_ms: "25"
automation_fake_provider_chunk_delay_ms: "10"
automation_fake_provider_chunk_count: "0"
automation_fake_provider_fail_first_n: "2"
automation_fake_provider_fail_every_n: "0"
automation_fake_provider_fault_status: "503"
metrics_thresholds_json: '{"response_p95_ms":{"max":5000},"error_rate":{"max":0},"ok_count_min":{"min":6},"fake_provider_fault_count_min":{"min":2}}'
fault_model_json: '{"provider_fault":"HTTP 503 for the first 2 fake-provider chat completions after reset","expected_behavior":"LangBot retries or otherwise recovers from bounded provider failures so every Debug Chat request receives its expected response without a backend crash."}'
load_profile_json: '{"requests":6,"concurrency":1,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","classification":"fault-recovery-not-throughput-benchmark"}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
- LANGBOT_FAKE_PROVIDER_MODEL_UUID
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
steps:
- "Configure the local fake provider to return HTTP 503 for the first two chat completions after reset."
- "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
- "Reset the target Debug Chat session and fake-provider request counter."
- "Send a sequential Debug Chat batch and verify later requests recover after the injected provider faults."
checks:
- "automation-result.json status is pass when the fake provider records at least two injected faults, every Debug Chat request succeeds, and the total user-visible error rate stays at zero."
- "metrics_summary includes fake_provider_fault_count and status_counts for the same run window."
- "backend logs show request handling for the same run window without unexpected Traceback or task-leak findings."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This is a fault-recovery probe, not a throughput benchmark."
- "Provider faults may be retried inside the provider/requester path; judge this case by fake_provider_fault_count plus user-visible success/error metrics."
- "The profile uses concurrency 1 because Debug Chat broadcasts assistant responses to every connection in a session, and failed responses do not carry the unique success token needed for concurrent attribution."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "fake_provider_fault"
- "HTTP 503"
- "Timed out after"
- "All models failed during streaming setup"
expected_failures:
- "fake_provider_fault"
- "HTTP 503"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
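The fault profile in this case (`automation_fake_provider_fail_first_n: "2"`, `fail_every_n: "0"`, `fault_status: "503"`) can be read as a per-request schedule. This is a minimal sketch that assumes the fake provider counts chat completions from 1 after each reset and that `fail_every_n: 0` disables periodic faults. `faultStatusFor` is a hypothetical name, not the fake provider's actual code.

```javascript
// Hypothetical fault schedule. requestIndex is 1-based and counts chat completions
// since the last reset; failEveryN = 0 disables periodic faults.
function faultStatusFor(requestIndex, { failFirstN = 0, failEveryN = 0, faultStatus = 500 } = {}) {
  if (requestIndex <= failFirstN) return faultStatus; // bounded faults right after reset
  if (failEveryN > 0 && requestIndex % failEveryN === 0) return faultStatus; // periodic faults
  return 200; // normal completion
}

// Profile from the case above: first two completions fail with 503, no periodic faults.
const profile = { failFirstN: 2, failEveryN: 0, faultStatus: 503 };
const statuses = Array.from({ length: 8 }, (_, i) => faultStatusFor(i + 1, profile));
console.log(statuses.join(",")); // 503,503,200,200,200,200,200,200
```

If all six Debug Chat requests succeed and each success needs one completed provider call, the provider sees at least eight calls (two faulted, six successful). That is consistent with `fake_provider_fault_count_min: 2` alongside `ok_count_min: 6`.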
@@ -0,0 +1,81 @@
id: langbot-fake-provider-debug-chat-load
title: "LangBot Debug Chat controlled fake-provider load probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- debug-chat
- websocket
- fake-provider
- load
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_debug_chat_load_requests: "12"
automation_debug_chat_load_concurrency: "4"
automation_debug_chat_load_timeout_ms: "30000"
automation_debug_chat_load_response_p95_ms: "5000"
automation_debug_chat_load_first_response_p95_ms: "3000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_expected_prefix: "FAKEQA"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
metrics_thresholds_json: '{"response_p95_ms":{"max":5000},"first_response_p95_ms":{"max":3000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":12,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","metric":"send-to-final-assistant-response"}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
- LANGBOT_FAKE_PROVIDER_MODEL_UUID
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
steps:
- "Start or reuse the local fake OpenAI-compatible provider."
- "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
- "Reset the target Debug Chat session."
- "Open concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the real backend pipeline."
checks:
- "automation-result.json status is pass when every request receives its own expected assistant response."
- "metrics_summary includes request count, concurrency, p50/p95 response latency, first response latency, throughput, and error rate."
- "thresholds_summary shows response_p95_ms, first_response_p95_ms, and error_rate pass."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe removes external model latency from the measurement; it still exercises the live LangBot backend, provider requester, local-agent runner, pipeline, and Debug Chat WebSocket adapter."
- "Use this as the repeatable message-path baseline before comparing against Space or another real provider."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "WebSocket connection error"
- "Timed out after"
- "Final assistant response did not include"
- "All models failed during streaming setup"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
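Each case's `metrics_thresholds_json` maps a metric name to a `max` and/or `min` bound. The sketch below shows one way to turn that map into a pass/fail `thresholds_summary`. `evaluateThresholds` and its output shape are hypothetical, and the metric values are illustrative rather than measured. The sketch treats a missing metric as a failure so that a silently dropped measurement cannot pass.

```javascript
// Hypothetical evaluator for metrics_thresholds_json-style bounds.
// thresholds: { [metric]: { max?: number, min?: number } }; metrics: { [metric]: number }
function evaluateThresholds(thresholds, metrics) {
  const summary = {};
  for (const [name, bound] of Object.entries(thresholds)) {
    const value = metrics[name];
    // A missing or non-numeric metric fails rather than passing silently.
    let pass = typeof value === "number" && Number.isFinite(value);
    if (pass && bound.max !== undefined) pass = value <= bound.max;
    if (pass && bound.min !== undefined) pass = value >= bound.min;
    summary[name] = { value: value ?? null, ...bound, pass };
  }
  return { allPass: Object.values(summary).every((entry) => entry.pass), summary };
}

// Thresholds copied from langbot-fake-provider-debug-chat-load; metric values are illustrative.
const thresholds = JSON.parse(
  '{"response_p95_ms":{"max":5000},"first_response_p95_ms":{"max":3000},"error_rate":{"max":0}}',
);
const { allPass, summary } = evaluateThresholds(thresholds, {
  response_p95_ms: 4210,
  first_response_p95_ms: 3150,
  error_rate: 0,
});
console.log(allPass);                            // false
console.log(summary.first_response_p95_ms.pass); // false: 3150 ms > 3000 ms budget
```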
@@ -0,0 +1,88 @@
id: langbot-fake-provider-debug-chat-slow-load
title: "LangBot Debug Chat slow fake-provider load probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- debug-chat
- websocket
- fake-provider
- slow-provider
- load
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
automation_debug_chat_load_requests: "8"
automation_debug_chat_load_concurrency: "4"
automation_debug_chat_load_timeout_ms: "45000"
automation_debug_chat_load_response_p95_ms: "10000"
automation_debug_chat_load_first_response_p95_ms: "7000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_expected_prefix: "SLOWQA"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
automation_fake_provider_first_token_delay_ms: "1000"
automation_fake_provider_chunk_delay_ms: "250"
automation_fake_provider_chunk_count: "4"
automation_fake_provider_fail_first_n: "0"
automation_fake_provider_fail_every_n: "0"
automation_fake_provider_fault_status: "500"
metrics_thresholds_json: '{"response_p95_ms":{"max":10000},"first_response_p95_ms":{"max":7000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":8,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled slow fake OpenAI-compatible provider","metric":"send-to-final-assistant-response","provider_profile":{"first_token_delay_ms":1000,"chunk_delay_ms":250,"chunk_count":4}}'
setup_automation:
- "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_FAKE_PROVIDER_URL
- LANGBOT_FAKE_PROVIDER_BASE_URL
- LANGBOT_FAKE_PROVIDER_PID
- LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
- LANGBOT_FAKE_PROVIDER_MODEL_UUID
- LANGBOT_FAKE_PROVIDER_PIPELINE_URL
- LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
steps:
- "Configure the local fake provider with deterministic slow streaming latency."
- "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
- "Reset the target Debug Chat session."
- "Open concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the real backend pipeline."
checks:
- "automation-result.json status is pass when every request receives its own expected assistant response."
- "metrics_summary shows zero errors under the slow-provider profile."
- "thresholds_summary shows response_p95_ms, first_response_p95_ms, and error_rate pass."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe keeps the model deterministic while injecting provider latency, so it catches backend timeout, streaming, and WebSocket backpressure issues without Space variability."
- "Compare with langbot-fake-provider-debug-chat-load to separate fixed LangBot overhead from provider-latency amplification."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "WebSocket connection error"
- "Timed out after"
- "Final assistant response did not include"
- "All models failed during streaming setup"
troubleshooting:
- backend-not-listening
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
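The following is a worked estimate of how much of the slow profile's budget the fake provider consumes on its own. It assumes that the fake provider waits `first_token_delay_ms` once, then waits `chunk_delay_ms` before each of `chunk_count` streamed chunks, and that it serves concurrent requests in parallel. The real provider's timing model may differ.

```javascript
// Hypothetical timing model: one first-token delay, then one chunk delay per streamed chunk.
function providerFloorMs({ firstTokenDelayMs, chunkDelayMs, chunkCount }) {
  return firstTokenDelayMs + chunkDelayMs * chunkCount;
}

// Values from langbot-fake-provider-debug-chat-slow-load.
const slowProfile = { firstTokenDelayMs: 1000, chunkDelayMs: 250, chunkCount: 4 };
const responseBudgetMs = 10000;     // response_p95_ms max
const firstResponseBudgetMs = 7000; // first_response_p95_ms max

const floorMs = providerFloorMs(slowProfile); // 1000 + 250 * 4
console.log(floorMs);                                               // 2000
console.log(responseBudgetMs - floorMs);                            // 8000
console.log(firstResponseBudgetMs - slowProfile.firstTokenDelayMs); // 6000
```

Under this model, roughly 8 s of the 10 s p95 budget is left for LangBot processing, queueing across the four concurrent connections, and WebSocket delivery.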
@@ -0,0 +1,35 @@
id: langbot-fault-taxonomy-contract
title: "LangBot fault taxonomy and cleanup contract"
mode: probe
area: reliability
type: chaos
priority: p1
risk: medium
ci_eligible: true
tags:
- reliability
- chaos
- contract
- synthetic
skills:
- langbot-testing
automation: skills/langbot-testing/probes/langbot-fault-taxonomy-contract.mjs
fault_model_json: '{"kind":"taxonomy-contract","destructive":false,"scenarios":["provider-timeout","plugin-runtime-disconnect","mcp-stdio-server-exit","operator-missing-login","transient-marketplace-timeout"]}'
steps:
- "Run `rtk bin/lbs test run langbot-fault-taxonomy-contract --dry-run` first; remove `--dry-run` after checking the evidence directory."
- "Automation validates that representative fault scenarios declare target, injected fault, expected status, recovery check, and cleanup."
- "Review metrics.json, fault-model.json, and automation-result.json under LBS_EVIDENCE_DIR."
checks:
- "automation-result.json status is pass."
- "Every scenario declares an expected status of pass, fail, blocked, env_issue, or flaky."
- "Every scenario declares a cleanup action and recovery check."
evidence_required:
- metrics
- filesystem
diagnostics:
- "This is a non-destructive taxonomy contract probe; it does not inject real runtime faults."
- "Use it as a gate before adding live chaos cases that kill runtimes, route traffic through a proxy, or disrupt a backend dependency."
success_patterns:
- "Fault taxonomy contract declares status"
failure_patterns:
- "missing required scenario fields"
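The contract checks in this case can be expressed as a per-scenario validator. In this minimal sketch, the field names (`target`, `fault`, `expected_status`, `recovery_check`, `cleanup`) are assumptions, because the probe's scenario schema is not shown here. The scenario values are illustrative. Problems are reported with the `missing required scenario fields` text that the case lists as a failure pattern.

```javascript
// Assumed schema: every scenario needs these non-empty string fields.
const REQUIRED_FIELDS = ["target", "fault", "expected_status", "recovery_check", "cleanup"];
const ALLOWED_STATUSES = new Set(["pass", "fail", "blocked", "env_issue", "flaky"]);

// Returns human-readable problems; an empty array means the scenario satisfies the contract.
function validateScenario(scenario) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    const value = scenario[field];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${scenario.id}: missing required scenario fields (${field})`);
    }
  }
  if (scenario.expected_status && !ALLOWED_STATUSES.has(scenario.expected_status)) {
    problems.push(`${scenario.id}: unknown expected_status ${scenario.expected_status}`);
  }
  return problems;
}

// Illustrative scenarios; the second omits cleanup to show a contract violation.
const scenarios = [
  {
    id: "operator-missing-login",
    target: "QA operator environment",
    fault: "LANGBOT_E2E_LOGIN_USER is unset",
    expected_status: "env_issue",
    recovery_check: "rerun passes once the login user is configured",
    cleanup: "remove the temporary env override",
  },
  {
    id: "transient-marketplace-timeout",
    target: "plugin marketplace request",
    fault: "marketplace request times out once",
    expected_status: "flaky",
    recovery_check: "a retry succeeds",
  },
];
const problems = scenarios.flatMap(validateScenario);
console.log(problems.join("\n")); // transient-marketplace-timeout: missing required scenario fields (cleanup)
```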
@@ -0,0 +1,42 @@
id: langbot-live-backend-latency
title: "LangBot live backend basic latency probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- live-backend
- latency
- metrics
skills:
- langbot-testing
env:
- LANGBOT_BACKEND_URL
automation: skills/langbot-testing/probes/langbot-live-backend-latency.mjs
metrics_thresholds_json: '{"backend_p95_ms":{"max":1000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":12,"concurrency":2,"endpoints":["/healthz"]}'
steps:
- "Confirm the selected LangBot backend is the intended test target."
- "Run `rtk bin/lbs test run langbot-live-backend-latency --dry-run` first; remove `--dry-run` after checking LANGBOT_BACKEND_URL and evidence directory."
- "Automation sends a small request batch to LANGBOT_BACKEND_URL/healthz and records latency, status counts, and network errors."
checks:
- "automation-result.json status is pass when the backend responds and p95/error-rate thresholds pass."
- "automation-result.json status is env_issue when the backend is not reachable."
- "metrics.json and network.log are written under LBS_EVIDENCE_DIR."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe measures only the reachability and latency of the backend health endpoint (/healthz); it does not cover model/provider, browser, Debug Chat, RAG, or plugin runtime latency."
success_patterns:
- "Live backend latency probe passed"
failure_patterns:
- "Backend did not respond"
- "breached latency or error-rate thresholds"
troubleshooting:
- socks-proxy-without-socksio
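The latency probe reports p95 and error rate over a 12-request batch. The sketch below uses the nearest-rank percentile definition and counts network errors and non-2xx responses as errors. Both definitions are assumptions: the probe may interpolate percentiles or classify errors differently. With only 12 samples, nearest-rank p95 is the slowest request (`Math.ceil(0.95 * 12)` is 12), so a single outlier decides whether `backend_p95_ms` passes.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

// Counts a sample as an error when it has a network error or a status outside 200-299.
function errorRate(samples) {
  if (samples.length === 0) return 0;
  const errors = samples.filter((s) => s.error || !(s.status >= 200 && s.status < 300)).length;
  return errors / samples.length;
}

// Twelve illustrative /healthz timings in ms (not measured data).
const latencies = [12, 9, 15, 11, 10, 14, 13, 9, 480, 12, 11, 10];
console.log(percentile(latencies, 50)); // 11
console.log(percentile(latencies, 95)); // 480: the single outlier
console.log(errorRate([{ status: 200 }, { status: 200 }, { error: "ECONNREFUSED" }, { status: 503 }])); // 0.5
```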
@@ -0,0 +1,45 @@
id: langbot-live-backend-log-health
title: "LangBot live backend log health probe"
mode: probe
area: reliability
type: reliability
priority: p1
risk: medium
ci_eligible: false
tags:
- reliability
- live-backend
- backend-log
- metrics
skills:
- langbot-testing
env:
- LANGBOT_BACKEND_URL
automation: skills/langbot-testing/probes/langbot-live-backend-log-health.mjs
metrics_thresholds_json: '{"fail_count":{"max":0}}'
load_profile_json: '{"lookback_seconds":300,"log_source":"LANGBOT_BACKEND_LOG or latest LANGBOT_REPO/data/logs/langbot-*.log"}'
steps:
- "Confirm the selected LangBot backend log belongs to the intended test target."
- "Run `rtk bin/lbs test run langbot-live-backend-log-health --dry-run` first; remove `--dry-run` after checking evidence directory and log source."
- "Automation scans the recent backend log window for fail-severity runtime findings such as Traceback, ImportError, ERROR, unclosed sessions, and unawaited coroutines."
checks:
- "automation-result.json status is pass only when fail_count is 0."
- "metrics_summary includes scanned_line_count, fail_count, warning_count, and finding_count."
- "findings.json and scanned-backend.log are written under LBS_EVIDENCE_DIR."
evidence_required:
- metrics
- backend_log
- filesystem
diagnostics:
- "Set LANGBOT_BACKEND_LOG to an explicit log path when the latest log file is not the run target."
- "Set LANGBOT_BACKEND_LOG_SINCE or LANGBOT_BACKEND_LOG_LOOKBACK_SECONDS to control the scan window."
- "This probe measures runtime log health; it does not prove user-facing Debug Chat, plugin, model, or RAG behavior."
success_patterns:
- "Live backend log health passed"
failure_patterns:
- "Traceback"
- "ImportError"
- "ERROR"
- "unclosed"
troubleshooting:
- socks-proxy-without-socksio
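The log scan in this case can be sketched as pattern matching over a timestamped window. The sketch assumes that each log line starts with a local `YYYY-MM-DD HH:MM:SS` timestamp and that lines without a timestamp (such as Traceback continuations) belong to the previous entry. The rule list is derived from the step text and is not the probe's real rule set.

```javascript
// Hypothetical rules derived from the step text; the first matching rule wins.
const RULES = [
  { severity: "fail", re: /Traceback|ImportError|\bERROR\b|was never awaited/ },
  { severity: "fail", re: /unclosed/i }, // e.g. "Unclosed client session"
  { severity: "warning", re: /\bWARNING\b/ },
];
const TIMESTAMP = /^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})/;

// Scans lines at or after sinceMs; lines without a timestamp inherit the previous one.
function scanLog(lines, sinceMs) {
  const findings = [];
  let scanned = 0;
  let lastTs = NaN;
  for (const line of lines) {
    const m = TIMESTAMP.exec(line);
    if (m) lastTs = Date.parse(`${m[1]}T${m[2]}`); // no offset: parsed as local time
    if (Number.isNaN(lastTs) || lastTs < sinceMs) continue; // outside the scan window
    scanned += 1;
    const rule = RULES.find((r) => r.re.test(line));
    if (rule) findings.push({ severity: rule.severity, line });
  }
  const count = (severity) => findings.filter((f) => f.severity === severity).length;
  return {
    scanned_line_count: scanned,
    fail_count: count("fail"),
    warning_count: count("warning"),
    finding_count: findings.length,
  };
}

const lines = [
  "2026-06-25 13:00:00 [INFO] startup before the window",
  "2026-06-25 13:34:30 [INFO] Processing request from person_websocket",
  "2026-06-25 13:34:31 [WARNING] slow provider response",
  "2026-06-25 13:34:32 [ERROR] handler failed",
  "Traceback (most recent call last):",
  "2026-06-25 13:34:33 [WARNING] Unclosed client session",
];
console.log(JSON.stringify(scanLog(lines, Date.parse("2026-06-25T13:30:00"))));
// {"scanned_line_count":5,"fail_count":3,"warning_count":1,"finding_count":4}
```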
@@ -0,0 +1,44 @@
id: langbot-live-control-plane-api
title: "LangBot live control-plane API probe"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- reliability
- live-backend
- control-plane
- metrics
skills:
- langbot-testing
env:
- LANGBOT_BACKEND_URL
automation: skills/langbot-testing/probes/langbot-live-control-plane-api.mjs
metrics_thresholds_json: '{"error_rate":{"max":0},"response_shape_failures":{"max":0},"healthz_p95_ms":{"max":500},"system_info_p95_ms":{"max":1000}}'
load_profile_json: '{"requests":20,"concurrency":4,"endpoints":["/healthz","/api/v1/system/info"],"auth_required":false}'
steps:
- "Confirm the selected LangBot backend is the intended test target."
- "Run `rtk bin/lbs test run langbot-live-control-plane-api --dry-run` first; remove `--dry-run` after checking LANGBOT_BACKEND_URL and evidence directory."
- "Automation sends a small request batch to /healthz and /api/v1/system/info, then validates status code, JSON shape, and latency budgets."
checks:
- "automation-result.json status is pass when every control-plane request returns HTTP 200, JSON code 0, and required response fields."
- "metrics_summary includes per-endpoint p50/p95 latency, error rate, status counts, and response_shape_failures."
- "thresholds_summary shows error_rate, response_shape_failures, healthz_p95_ms, and system_info_p95_ms all pass."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe measures unauthenticated backend control-plane readiness; it does not cover authenticated UI flows, Debug Chat, model calls, plugins, or RAG."
- "A system_info shape failure usually means the API contract or startup state changed and should be investigated before treating latency as healthy."
success_patterns:
- "Live control-plane API probe passed"
failure_patterns:
- "Backend did not respond"
- "breached shape, latency, or error-rate thresholds"
troubleshooting:
- socks-proxy-without-socksio
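The following sketches a per-response check against this case's criteria (HTTP 200, JSON `code` 0, required fields). The `code`/`msg` envelope mirrors how the probe script reads API results (`reset.json.code`, `reset.json.msg`). The `data` object and the `version` field are assumptions, because the `/api/v1/system/info` schema is not shown here. `shapeFailures` is a hypothetical name. In this sketch, the HTTP status feeds `error_rate` and the body check feeds `response_shape_failures`; that split is also an assumption.

```javascript
// Hypothetical body check for one control-plane response; returns a list of problems.
// requiredDataFields is caller-supplied because the system/info schema is not shown here.
function shapeFailures(body, requiredDataFields = []) {
  if (!body || typeof body !== "object") return ["body is not a JSON object"];
  const failures = [];
  if (body.code !== 0) failures.push(`code ${body.code}: ${body.msg ?? ""}`);
  const data = body.data && typeof body.data === "object" ? body.data : {};
  for (const field of requiredDataFields) {
    if (!(field in data)) failures.push(`missing data.${field}`);
  }
  return failures;
}

// Illustrative responses; "version" is an assumed required field.
const responses = [
  { httpStatus: 200, body: { code: 0, msg: "ok", data: { version: "x.y.z" } } },
  { httpStatus: 200, body: { code: 0, msg: "ok", data: {} } },
  { httpStatus: 502, body: null },
];
// Non-200 responses count toward error_rate; only HTTP 200 bodies are shape-checked.
const httpOk = responses.filter((r) => r.httpStatus === 200);
const errorCount = responses.length - httpOk.length;
const responseShapeFailures = httpOk.filter((r) => shapeFailures(r.body, ["version"]).length > 0).length;
console.log(errorCount);            // 1
console.log(responseShapeFailures); // 1
```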
@@ -0,0 +1,37 @@
id: langbot-overhead-accounting-contract
title: "LangBot overhead accounting metrics contract"
mode: probe
area: performance
type: performance
priority: p1
risk: medium
ci_eligible: true
tags:
- performance
- metrics
- contract
- synthetic
skills:
- langbot-testing
automation: skills/langbot-testing/probes/langbot-overhead-accounting-contract.mjs
metrics_thresholds_json: '{"sample_count":{"min":50},"langbot_overhead_p95_ms":{"max":25},"accounting_gap_max_ms":{"max":0.001}}'
load_profile_json: '{"kind":"synthetic-overhead-accounting","samples":80,"external_latency_segments":["provider","external_tool","network"]}'
steps:
- "Run `rtk bin/lbs test run langbot-overhead-accounting-contract --dry-run` first; remove `--dry-run` after checking the evidence directory."
- "Automation generates deterministic message-path latency samples and separates LangBot overhead from provider/tool/network latency."
- "Review metrics.json, thresholds.json, resource-log.json, and automation-result.json under LBS_EVIDENCE_DIR."
checks:
- "automation-result.json status is pass."
- "metrics_summary includes sample_count, langbot_overhead_p95_ms, e2e_latency_p95_ms, external_latency_p95_ms, and accounting_gap_max_ms."
- "thresholds_summary shows sample_count, langbot_overhead_p95_ms, and accounting_gap_max_ms all pass."
evidence_required:
- metrics
- resource_log
- filesystem
diagnostics:
- "This is a synthetic contract probe for the QA harness; it is not live product performance."
- "Use it to verify that reports can carry overhead accounting metrics before running live backend or browser performance probes."
success_patterns:
- "Overhead accounting contract passed"
failure_patterns:
- "breached one or more thresholds"
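The following sketches the overhead accounting that this contract describes. External latency is the sum of the provider, external-tool, and network segments. LangBot overhead is its own measured segment. The accounting gap is how far the segments miss the end-to-end time. The sample fields are hypothetical. The 0.001 ms `accounting_gap_max_ms` bound is loose enough to absorb floating-point rounding when segments tile the end-to-end span.

```javascript
// Hypothetical sample fields, all in milliseconds: e2e plus measured segments.
const EXTERNAL_SEGMENTS = ["provider", "external_tool", "network"];

function accountSample(sample) {
  // Missing external segments count as 0 ms.
  const external = EXTERNAL_SEGMENTS.reduce((sum, key) => sum + (sample[key] ?? 0), 0);
  return {
    langbot_overhead_ms: sample.langbot,
    external_latency_ms: external,
    // How far the measured segments miss the end-to-end time.
    accounting_gap_ms: Math.abs(sample.e2e - (sample.langbot + external)),
  };
}

// Illustrative samples, not measured data.
const samples = [
  { e2e: 120.3, langbot: 8.1, provider: 100.2, external_tool: 0, network: 12.0 },
  { e2e: 95.0, langbot: 6.5, provider: 80.0, network: 8.5 },
];
const accounted = samples.map(accountSample);
const gapMaxMs = Math.max(...accounted.map((a) => a.accounting_gap_ms));
console.log(accounted[1].external_latency_ms); // 88.5
console.log(gapMaxMs <= 0.001);                // true
```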
@@ -0,0 +1,84 @@
id: langbot-space-debug-chat-concurrency-smoke
title: "LangBot Debug Chat real Space-provider concurrency smoke"
mode: probe
area: performance
type: performance
priority: p1
risk: high
ci_eligible: false
tags:
- performance
- debug-chat
- websocket
- space
- live-provider
- smoke
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_BACKEND_URL
- LANGBOT_FRONTEND_URL
- LANGBOT_E2E_LOGIN_USER
automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
automation_env:
- LANGBOT_BACKEND_URL
- LANGBOT_E2E_LOGIN_USER
- LANGBOT_LOCAL_AGENT_PIPELINE_URL
- LANGBOT_LOCAL_AGENT_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_LOCAL_AGENT_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_LOCAL_AGENT_PIPELINE_NAME
automation_debug_chat_load_requests: "3"
automation_debug_chat_load_concurrency: "2"
automation_debug_chat_load_timeout_ms: "120000"
automation_debug_chat_load_response_p95_ms: "120000"
automation_debug_chat_load_max_error_rate: "0"
automation_debug_chat_load_expected_prefix: "SPACEQA"
automation_debug_chat_load_prompt_template: '请只回复 "{expected}",不要解释,不要添加其他字符。'
automation_debug_chat_load_stream: "true"
automation_debug_chat_load_reset: "true"
metrics_thresholds_json: '{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}'
load_profile_json: '{"requests":3,"concurrency":2,"path":"Pipeline Debug Chat WebSocket","provider":"LangBot Space model route","metric":"send-to-final-assistant-response","classification":"smoke-not-benchmark"}'
setup_automation:
- "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_PIPELINE_URL
- LANGBOT_PIPELINE_NAME
- LANGBOT_LOCAL_AGENT_PIPELINE_URL
- LANGBOT_LOCAL_AGENT_PIPELINE_NAME
- LANGBOT_LOCAL_AGENT_MODEL_UUID
- LANGBOT_E2E_MODEL_UUID
preconditions:
- "The selected local LangBot instance is safe for a low-volume real Space model smoke run."
- "Treat Space/provider/network failures as environment or dependency findings until fake-provider baseline evidence separates LangBot overhead from provider latency."
steps:
- "Prepare a local-agent pipeline with a tested Space model and fallback models."
- "Reset the target Debug Chat session."
- "Open a small number of concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the live Space provider path."
checks:
- "automation-result.json status is pass when every request receives its own expected assistant response."
- "metrics_summary includes request count, concurrency, p95 response latency, throughput, and error rate."
- "The report classifies the result as a live-provider smoke, not a stable LangBot overhead benchmark."
evidence_required:
- metrics
- network
- api_diagnostic
- filesystem
diagnostics:
- "This probe measures real user-path latency through Space and includes provider latency, model behavior, and network effects."
- "Compare with langbot-fake-provider-debug-chat-load before attributing slow or failed runs to LangBot itself."
success_patterns:
- "Debug Chat WebSocket concurrency probe passed"
- "Streaming completed"
failure_patterns:
- "invalid api key"
- "WebSocket connection error"
- "Timed out after"
- "Final assistant response did not include"
- "All models failed during streaming setup"
troubleshooting:
- local-agent-model-route-unavailable
- marketplace-network-flaky
- proxy-env-mismatch
- telemetry-proxy-noise
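Every Debug Chat load case pairs an `expected_prefix` (here `SPACEQA`) with a prompt template containing `{expected}`, so each request can be attributed to its own token. The following is a minimal sketch of that pairing. The `<prefix>-<runTag>-<index>` token format and all function names are hypothetical, and the English template stands in for the case's configured one.

```javascript
// Hypothetical token format: <prefix>-<runTag>-<zero-padded index>.
// Fixed-width indexes keep one token from matching inside another (e.g. "-1" inside "-10")
// as long as the batch has fewer than 1000 requests.
function expectedToken(prefix, runTag, index, width = 3) {
  return `${prefix}-${runTag}-${String(index).padStart(width, "0")}`;
}

// Replaces every {expected} placeholder in a prompt template.
function renderPrompt(template, expected) {
  return template.split("{expected}").join(expected);
}

// True when the final assistant text contains this request's own token.
function matchesExpected(finalText, expected) {
  return typeof finalText === "string" && finalText.includes(expected);
}

const tokens = Array.from({ length: 3 }, (_, i) => expectedToken("SPACEQA", "r1", i + 1));
const prompt = renderPrompt('Reply only with "{expected}".', tokens[0]);
console.log(tokens.join(" "));                               // SPACEQA-r1-001 SPACEQA-r1-002 SPACEQA-r1-003
console.log(prompt);                                         // Reply only with "SPACEQA-r1-001".
console.log(matchesExpected('"SPACEQA-r1-001"', tokens[0])); // true
```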
@@ -0,0 +1,80 @@
id: pipeline-debug-chat-performance
title: "Pipeline Debug Chat user-path performance probe"
mode: agent-browser
area: pipeline
type: performance
priority: p1
risk: medium
ci_eligible: false
tags:
- performance
- pipeline
- debug-chat
- user-path
- metrics
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BACKEND_URL
env_any:
- LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME
automation: scripts/e2e/pipeline-debug-chat.mjs
automation_env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BACKEND_URL
- LANGBOT_BROWSER_PROFILE
- LANGBOT_CHROMIUM_EXECUTABLE
- LANGBOT_E2E_PROMPT
- LANGBOT_E2E_EXPECTED_TEXT
- LANGBOT_E2E_RESPONSE_TIMEOUT_MS
automation_env_any:
- LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME
# English: "Please reply only with OK; this is for performance testing."
automation_prompt: "请只回复 OK,用于性能测试。"
automation_expected_text: "OK"
automation_response_timeout_ms: "120000"
automation_reset_debug_chat: "true"
automation_debug_chat_response_p95_ms: "120000"
automation_debug_chat_max_error_rate: "0"
metrics_thresholds_json: '{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}'
load_profile_json: '{"prompts":1,"browser":true,"path":"Pipeline Debug Chat","metric":"send-to-visible-completion"}'
setup_automation:
- "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
setup_provides_env:
- LANGBOT_PIPELINE_URL
- LANGBOT_PIPELINE_NAME
preconditions:
- "LANGBOT_PIPELINE_URL or LANGBOT_PIPELINE_NAME points to the pipeline intended for this Debug Chat performance run."
- "The target pipeline is safe to reset Debug Chat history for this run."
- "The target pipeline has a known-good runner/model; provider latency should be interpreted separately from LangBot overhead."
steps:
- "Open LANGBOT_FRONTEND_URL with the prepared browser profile."
- "Open the target pipeline and select Debug Chat."
- "Reset Debug Chat history through the backend API when configured."
- "Send the deterministic prompt and wait for the expected assistant response."
checks:
- "automation-result.json status is pass when the expected assistant response appears."
- "metrics_summary includes response_p50_ms, response_p95_ms, error_rate, and total_duration_ms."
- "thresholds_summary shows response_p95_ms and error_rate pass."
evidence_required:
- ui
- screenshot
- console
- network
- metrics
diagnostics:
- "This case measures browser-visible send-to-completion latency; it does not split provider latency from LangBot overhead."
- "Use backend logs and provider diagnostics to explain slow runs before calling them LangBot regressions."
success_patterns:
- "Processing request from person_websocket"
- "Streaming completed"
failure_patterns:
- "Action invoke_llm_stream call timed out"
- "Task exception was never retrieved"
- "All models failed during streaming setup"
troubleshooting:
- debug-chat-history-contaminates-automation
- local-agent-model-route-unavailable
- plugin-runtime-timeout
- proxy-env-mismatch
@@ -1 +1,3 @@
dist/*
!dist/
!dist/qa-plugin-smoke-0.1.0.lbpkg
@@ -0,0 +1,837 @@
#!/usr/bin/env node
import crypto from "node:crypto";
import net from "node:net";
import tls from "node:tls";
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
import {
apiJson,
appendLine,
ensureEvidence,
evidencePaths,
loadEnvFiles,
localIsoWithOffset,
redact,
resetAndAuthLocalUser,
writeResult,
} from "../../../scripts/e2e/lib/langbot-e2e.mjs";
import {
buildProviderTimingMetrics,
summarizeFakeProviderState,
} from "./lib/fake-provider-timing.mjs";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
await loadEnvFiles();
const caseId = env.LBS_CASE_ID || "langbot-debug-chat-concurrency";
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const startedAt = new Date();
const metricsPath = resolve(paths.evidenceDir, "metrics.json");
const samplesPath = resolve(paths.evidenceDir, "samples.json");
const fakeProviderStatePath = resolve(paths.evidenceDir, "fake-provider-state.json");
const resetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const fakeProviderUrl = env.LANGBOT_FAKE_PROVIDER_URL || "";
const pipelineUrl = env.LANGBOT_E2E_PIPELINE_URL || env.LANGBOT_PIPELINE_URL || "";
const pipelineName = env.LANGBOT_E2E_PIPELINE_NAME || env.LANGBOT_PIPELINE_NAME || "";
const sessionType = env.LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE || env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person";
const totalRequests = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_REQUESTS, defaultRequests(caseId));
const concurrency = Math.min(totalRequests, positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY, defaultConcurrency(caseId)));
const timeoutMs = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS, defaultTimeout(caseId));
const expectedPrefix = env.LANGBOT_DEBUG_CHAT_LOAD_EXPECTED_PREFIX || "LBQA";
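// Default prompt (English): Please reply only with "{expected}"; do not explain or add any other characters.
const promptTemplate = env.LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE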
|| "请只回复 \"{expected}\",不要解释,不要添加其他字符。";
const stream = bool(env.LANGBOT_DEBUG_CHAT_LOAD_STREAM, true);
const resetBeforeRun = bool(env.LANGBOT_DEBUG_CHAT_LOAD_RESET, true);
const responseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS, defaultP95Budget(caseId));
const firstResponseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_FIRST_RESPONSE_P95_MS, 0);
const maxErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE, 0);
const minErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_RATE, 0);
const minErrorCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_COUNT, 0);
const minOkCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_OK_COUNT, 0);
const minProviderFaultCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_PROVIDER_FAULT_COUNT, 0);
const failOnFinalMismatch = bool(env.LANGBOT_DEBUG_CHAT_LOAD_FAIL_ON_FINAL_MISMATCH, false);
const failureSignals = textList(env.LANGBOT_E2E_FAILURE_SIGNALS || env.LANGBOT_DEBUG_CHAT_LOAD_FAILURE_SIGNALS || "");
const result = {
source: "automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: "",
finished_at_local: "",
duration_ms: 0,
backend_url: backendUrl,
pipeline_url: pipelineUrl,
pipeline_name: pipelineName,
pipeline_id: "",
session_type: sessionType,
load_profile: {
requests: totalRequests,
concurrency,
timeout_ms: timeoutMs,
stream,
reset_before_run: resetBeforeRun,
fail_on_final_mismatch: failOnFinalMismatch,
},
evidence: {
network_log: paths.networkLog,
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderStatePath,
debug_chat_reset_diagnostic_json: resetDiagnosticPath,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
try {
if (!backendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_BACKEND_URL is not configured.");
}
if (!["person", "group"].includes(sessionType)) {
throw new Error(`LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE must be person or group, got ${sessionType}.`);
}
const backendReady = await backendReachable(backendUrl);
if (!backendReady) {
result.status = "env_issue";
throw new Error(`Backend did not respond at ${backendUrl}.`);
}
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this probe can resolve/reset the Debug Chat session.");
}
const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
const pipeline = await resolvePipeline({ backendUrl, token: auth.token, pipelineUrl, pipelineName });
result.pipeline_id = pipeline.id;
result.pipeline_name = pipeline.name || pipelineName;
if (!result.pipeline_url && env.LANGBOT_FRONTEND_URL) {
result.pipeline_url = `${env.LANGBOT_FRONTEND_URL.replace(/\/$/, "")}/home/pipelines?id=${encodeURIComponent(pipeline.id)}`;
}
if (resetBeforeRun) {
const reset = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.id)}/ws/reset/${encodeURIComponent(sessionType)}`, {
method: "POST",
token: auth.token,
});
const resetDiagnostic = {
status: isApiFailure(reset) ? "fail" : "ready",
http_status: reset.status,
code: reset.json.code ?? null,
reason: isApiFailure(reset) ? reset.json.msg || "Debug Chat reset failed." : "Debug Chat session reset.",
};
await writeFile(resetDiagnosticPath, `${JSON.stringify(resetDiagnostic, null, 2)}\n`, "utf8");
if (resetDiagnostic.status === "fail") {
throw new Error(resetDiagnostic.reason);
}
}
const wsUrl = websocketUrl(backendUrl, pipeline.id, sessionType);
const loadStartedAt = performance.now();
const samples = await runLoad({
wsUrl,
totalRequests,
concurrency,
timeoutMs,
promptTemplate,
expectedPrefix,
stream,
failOnFinalMismatch,
failureSignals,
});
const loadDurationMs = performance.now() - loadStartedAt;
const fakeProviderState = await readFakeProviderState(fakeProviderUrl);
if (fakeProviderState) {
await writeFile(fakeProviderStatePath, `${JSON.stringify(fakeProviderState, null, 2)}\n`, "utf8");
}
const metrics = buildMetrics({
samples,
totalRequests,
concurrency,
timeoutMs,
loadDurationMs,
backendUrl,
pipelineId: pipeline.id,
sessionType,
fakeProviderState,
});
const thresholds = buildThresholds(metrics);
const passed = Object.values(thresholds).every((item) => item.pass);
result.status = passed ? "pass" : "fail";
result.reason = passed
? "Debug Chat WebSocket concurrency probe passed all thresholds."
: "Debug Chat WebSocket concurrency probe breached one or more thresholds (latency, error rate, or configured minimum counts).";
result.metrics_summary = {
requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_count: metrics.error_count,
timeout_count: metrics.timeout_count,
error_rate: metrics.error_rate,
response_p50_ms: metrics.response_duration_ms.p50,
response_p95_ms: metrics.response_duration_ms.p95,
first_assistant_event_p95_ms: metrics.first_assistant_event_ms.p95,
first_assistant_content_p95_ms: metrics.first_assistant_content_ms.p95,
first_response_p95_ms: metrics.first_response_ms.p95,
throughput_rps: metrics.throughput_rps,
status_counts: metrics.status_counts,
fake_provider_request_count: metrics.fake_provider?.request_count ?? null,
fake_provider_fault_count: metrics.fake_provider?.fault_count ?? null,
fake_provider_duration_p95_ms: metrics.provider_timing?.provider_duration_ms.p95 ?? null,
langbot_overhead_estimate_p95_ms: metrics.provider_timing?.langbot_overhead_estimate_ms.p95 ?? null,
send_to_provider_start_p95_ms: metrics.provider_timing?.send_to_provider_start_ms.p95 ?? null,
provider_finish_to_ws_final_p95_ms: metrics.provider_timing?.provider_finish_to_ws_final_ms.p95 ?? null,
provider_timing_matched_request_count: metrics.provider_timing?.matched_request_count ?? null,
};
result.thresholds_summary = thresholds;
result.artifacts = {
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderState ? fakeProviderStatePath : "",
network_log: paths.networkLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, thresholds }, null, 2)}\n`, "utf8");
await writeFile(samplesPath, `${JSON.stringify(samples, null, 2)}\n`, "utf8");
} catch (error) {
if (!["env_issue", "blocked"].includes(result.status)) {
result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
}
result.reason = result.reason || safeReason(error.message);
} finally {
const finishedAt = new Date();
result.finished_at = finishedAt.toISOString();
result.finished_at_local = localIsoWithOffset(finishedAt);
result.duration_ms = finishedAt.getTime() - startedAt.getTime();
await mkdir(paths.evidenceDir, { recursive: true });
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
exit(result.status === "pass" ? 0 : result.status === "env_issue" || result.status === "blocked" ? 2 : 1);
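// Sketch (hypothetical helper, not called by this probe): the exit-code contract
// of the line above, written as a pure function. 0 means every threshold passed.
// 2 means the environment could not support the run (env_issue or blocked).
// 1 covers everything else, including plain threshold failures.
function sketchExitCode(status) {
  if (status === "pass") return 0;
  if (status === "env_issue" || status === "blocked") return 2;
  return 1;
}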
function defaultRequests(id) {
return id.includes("space") ? 3 : 12;
}
function defaultConcurrency(id) {
return id.includes("space") ? 1 : 4;
}
function defaultTimeout(id) {
return id.includes("space") ? 120_000 : 30_000;
}
function defaultP95Budget(id) {
return id.includes("space") ? 120_000 : 5_000;
}
function positiveInteger(value, fallback) {
const parsed = Number.parseInt(String(value || ""), 10);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
function nonNegativeInteger(value, fallback) {
const parsed = Number.parseInt(String(value ?? ""), 10);
return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}
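// Parses a non-negative number. Zero is allowed, for example a zero error-rate budget.
// Blank values must fall back explicitly because Number("") is 0, not NaN.
function positiveNumber(value, fallback) {
if (value === undefined || value === null || String(value).trim() === "") return fallback;
const parsed = Number(value);
return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}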
function bool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function textList(value) {
return String(value || "")
.split(/\r?\n|,/)
.map((item) => item.trim())
.filter(Boolean);
}
async function backendReachable(baseUrl) {
try {
const response = await fetch(`${baseUrl.replace(/\/$/, "")}/healthz`, {
signal: AbortSignal.timeout(3000),
});
return response.status < 500;
} catch {
return false;
}
}
async function readFakeProviderState(rootUrl) {
if (!rootUrl) return null;
try {
const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
signal: AbortSignal.timeout(3000),
});
const json = await response.json().catch(() => ({}));
return {
status: response.ok && json.ok === true ? "loaded" : "unavailable",
url: normalizeProviderRootUrl(rootUrl),
http_status: response.status,
model: json.model || "",
config: json.config || {},
request_count: Number.isFinite(json.request_count) ? json.request_count : null,
recent_requests: Array.isArray(json.recent_requests) ? json.recent_requests : [],
};
} catch (error) {
return {
status: "unavailable",
url: normalizeProviderRootUrl(rootUrl),
reason: safeReason(error.message),
request_count: null,
recent_requests: [],
};
}
}
function normalizeProviderRootUrl(value) {
const trimmed = String(value || "").trim().replace(/\/$/, "");
return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
}
function pipelineIdFromUrl(url) {
if (!url) return "";
try {
const parsed = new URL(url);
return parsed.searchParams.get("id") || "";
} catch {
return "";
}
}
async function resolvePipeline({ backendUrl, token, pipelineUrl, pipelineName }) {
const idFromUrl = pipelineIdFromUrl(pipelineUrl);
if (idFromUrl) {
const response = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(idFromUrl)}`, { token });
const pipeline = response.json.data?.pipeline;
if (isApiFailure(response) || !pipeline?.uuid) {
throw new Error(response.json.msg || `Could not load pipeline ${idFromUrl}.`);
}
return { id: pipeline.uuid, name: pipeline.name || "" };
}
if (!pipelineName) {
throw new Error("Set LANGBOT_E2E_PIPELINE_URL or LANGBOT_E2E_PIPELINE_NAME before running this probe.");
}
const response = await apiJson(backendUrl, "/api/v1/pipelines", { token });
if (isApiFailure(response)) {
throw new Error(response.json.msg || "Failed to list pipelines.");
}
const pipeline = (response.json.data?.pipelines || []).find((item) => item.name === pipelineName);
if (!pipeline?.uuid) {
throw new Error(`Could not find pipeline named ${pipelineName}.`);
}
return { id: pipeline.uuid, name: pipeline.name || pipelineName };
}
function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
function websocketUrl(baseUrl, pipelineId, sessionType) {
const parsed = new URL(baseUrl);
parsed.protocol = parsed.protocol === "https:" ? "wss:" : "ws:";
parsed.pathname = `/api/v1/pipelines/${encodeURIComponent(pipelineId)}/ws/connect`;
parsed.search = `?session_type=${encodeURIComponent(sessionType)}`;
return parsed.toString();
}
async function runLoad(options) {
const samples = [];
let nextIndex = 0;
const workers = Array.from({ length: options.concurrency }, async () => {
while (nextIndex < options.totalRequests) {
const index = nextIndex;
nextIndex += 1;
const sample = await runSingleRequest({ ...options, index });
samples.push(sample);
}
});
await Promise.all(workers);
return samples.sort((left, right) => left.index - right.index);
}
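// Sketch (hypothetical helper, not called by this probe): the shared-cursor
// worker pool that runLoad uses, reduced to its essentials. Each worker reads and
// bumps `next` synchronously before its first await in a loop turn. Because of
// that, no two workers can claim the same index even though they share the cursor.
// Results are stored by index, so the output order matches the input order.
async function sketchRunPool(items, concurrency, task) {
  const results = new Array(items.length);
  let next = 0;
  const workers = Array.from({ length: Math.min(concurrency, items.length) }, async () => {
    while (next < items.length) {
      const index = next; // claim an index before yielding to the event loop
      next += 1;
      results[index] = await task(items[index], index);
    }
  });
  await Promise.all(workers);
  return results;
}
// Usage: await sketchRunPool([1, 2, 3], 2, async (n) => n * 10) resolves to [10, 20, 30].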
function expectedForIndex(prefix, index) {
return `${prefix}-${String(index + 1).padStart(4, "0")}`;
}
function promptForIndex(template, expected) {
return template.replaceAll("{expected}", expected);
}
function runSingleRequest({
wsUrl,
index,
timeoutMs,
promptTemplate,
expectedPrefix,
stream,
failOnFinalMismatch,
failureSignals,
}) {
return new Promise((resolve) => {
const expected = expectedForIndex(expectedPrefix, index);
const prompt = promptForIndex(promptTemplate, expected);
const sample = {
index,
status: "running",
ok: false,
expected_text: expected,
prompt,
response_text: "",
started_at: new Date().toISOString(),
started_epoch_ms: Date.now(),
connected_at: null,
connected_epoch_ms: null,
sent_at: null,
sent_epoch_ms: null,
first_assistant_event_at: null,
first_assistant_event_epoch_ms: null,
first_assistant_event_ms: null,
first_assistant_content_at: null,
first_assistant_content_epoch_ms: null,
first_assistant_content_ms: null,
first_response_at: null,
first_response_epoch_ms: null,
connected_ms: null,
first_response_ms: null,
response_duration_ms: null,
finished_at: null,
finished_epoch_ms: null,
event_count: 0,
foreign_response_count: 0,
last_foreign_response_text: "",
error: "",
close_code: null,
close_reason: "",
};
let closed = false;
let connectedAt = 0;
let sentAt = 0;
const startedAt = performance.now();
let client = null;
const timer = setTimeout(() => {
finish("timeout", `Timed out after ${timeoutMs} ms.`);
}, timeoutMs);
client = openRawWebSocket(wsUrl, {
onOpen() {
connectedAt = performance.now();
const now = Date.now();
sample.connected_at = new Date(now).toISOString();
sample.connected_epoch_ms = now;
sample.connected_ms = rounded(connectedAt - startedAt);
},
onMessage(text) {
sample.event_count += 1;
let data;
try {
data = JSON.parse(String(text || ""));
} catch (error) {
finish("error", `Invalid WebSocket JSON: ${error.message}`);
return;
}
appendLine(paths.networkLog, JSON.stringify({
request_index: index,
type: data.type,
session_type: data.session_type || "",
role: data.data?.role || "",
is_final: data.data?.is_final ?? null,
content_preview: redact(String(data.data?.content || data.message || "").slice(0, 200)),
})).catch(() => {});
if (data.type === "connected") {
sentAt = performance.now();
const now = Date.now();
sample.sent_at = new Date(now).toISOString();
sample.sent_epoch_ms = now;
client.send(JSON.stringify({
type: "message",
message: [{ type: "Plain", text: prompt }],
stream,
}));
return;
}
if (data.type === "error") {
finish("error", data.message || "WebSocket error message.");
return;
}
if (data.type !== "response" || data.data?.role !== "assistant") return;
const content = String(data.data.content || "");
markFirstAssistantEvent(sample, sentAt);
if (content) sample.response_text = content;
if (content) markFirstAssistantContent(sample, sentAt);
if (content.includes(expected) && sample.first_response_ms === null && sentAt > 0) {
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}
if (data.data.is_final === true) {
const ok = sample.response_text.includes(expected);
if (ok) {
if (sample.first_response_ms === null && sentAt > 0) {
sample.first_response_ms = rounded(performance.now() - sentAt);
}
finish("pass", "");
} else if (matchesFailureSignal(sample.response_text, failureSignals)) {
finish("app_error", `Assistant final response matched a failure signal: ${sample.response_text}`);
} else if (failOnFinalMismatch && !containsLoadToken(sample.response_text, expectedPrefix)) {
finish("mismatch", `Final assistant response did not include ${expected}: ${sample.response_text}`);
} else {
sample.foreign_response_count += 1;
sample.last_foreign_response_text = sample.response_text;
}
}
},
onError(error) {
finish("connection_error", `WebSocket connection error: ${error.message}`);
},
onClose(event) {
sample.close_code = event.code;
sample.close_reason = event.reason || "";
if (!closed) finish("closed", `WebSocket closed before final assistant response: ${event.code}`);
},
});
function finish(status, reason) {
if (closed) return;
closed = true;
clearTimeout(timer);
sample.status = status;
sample.ok = status === "pass";
sample.error = status === "timeout" && sample.foreign_response_count > 0
? `${reason || ""} Saw ${sample.foreign_response_count} foreign assistant response(s); last=${sample.last_foreign_response_text}`
: reason || "";
if (sentAt > 0) sample.response_duration_ms = rounded(performance.now() - sentAt);
else sample.response_duration_ms = rounded(performance.now() - startedAt);
const now = Date.now();
sample.finished_at = new Date(now).toISOString();
sample.finished_epoch_ms = now;
try {
client?.close();
} catch {
// Closing a failed socket should not hide the sample result.
}
resolve(sample);
}
});
}
function markFirstAssistantEvent(sample, sentAt) {
if (sample.first_assistant_event_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_event_at = new Date(now).toISOString();
sample.first_assistant_event_epoch_ms = now;
sample.first_assistant_event_ms = rounded(performance.now() - sentAt);
}
function markFirstAssistantContent(sample, sentAt) {
if (sample.first_assistant_content_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_content_at = new Date(now).toISOString();
sample.first_assistant_content_epoch_ms = now;
sample.first_assistant_content_ms = rounded(performance.now() - sentAt);
}
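// Sketch (hypothetical helper, not called by this probe): how the three latency
// marks recorded by runSingleRequest relate to each other, computed offline from
// a list of assistant "response" events. The event shape { atMs, content } is an
// assumption, with atMs measured from the moment the prompt was sent.
// - firstEvent:    first assistant event, even if its content is empty
// - firstContent:  first assistant event with non-empty content
// - firstResponse: first event whose content contains the expected token
function sketchLatencyMarks(events, expected) {
  const marks = { firstEvent: null, firstContent: null, firstResponse: null };
  for (const { atMs, content } of events) {
    const text = String(content || "");
    if (marks.firstEvent === null) marks.firstEvent = atMs;
    if (text && marks.firstContent === null) marks.firstContent = atMs;
    if (text.includes(expected) && marks.firstResponse === null) marks.firstResponse = atMs;
  }
  return marks;
}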
function containsLoadToken(text, prefix) {
const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
}
function matchesFailureSignal(text, signals) {
const lower = String(text || "").toLowerCase();
return signals.some((signal) => lower.includes(signal.toLowerCase()));
}
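// Sketch (hypothetical helper, not called by this probe): the decision that
// runSingleRequest makes when an assistant event arrives with is_final === true,
// written as a pure function so the precedence is easy to read. The checks run in
// this order:
// 1. The expected token wins ("pass").
// 2. A configured failure signal comes next ("app_error").
// 3. In strict mode, a reply with no PREFIX-NNNN token at all is a "mismatch".
// Anything else, such as another request's token, is counted as a foreign
// response, and the sample keeps waiting until it times out ("keep_waiting").
function sketchClassifyFinal(text, expected, prefix, failureSignals, failOnFinalMismatch) {
  const body = String(text || "");
  if (body.includes(expected)) return "pass";
  const lower = body.toLowerCase();
  if (failureSignals.some((signal) => lower.includes(signal.toLowerCase()))) return "app_error";
  const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const hasLoadToken = new RegExp(`${escaped}-\\d{4}`).test(body);
  if (failOnFinalMismatch && !hasLoadToken) return "mismatch";
  return "keep_waiting";
}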
function openRawWebSocket(wsUrl, handlers) {
const parsed = new URL(wsUrl);
const secure = parsed.protocol === "wss:";
const port = Number(parsed.port || (secure ? 443 : 80));
const host = parsed.hostname;
const path = `${parsed.pathname}${parsed.search}`;
const key = crypto.randomBytes(16).toString("base64");
const socket = secure
? tls.connect({ host, port, servername: host })
: net.connect({ host, port });
let opened = false;
let closed = false;
let buffer = Buffer.alloc(0);
socket.setNoDelay(true);
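// Write the HTTP upgrade only once the transport is ready: after the TLS handshake
// (secureConnect) for wss:, or after the TCP connect for ws:.
socket.on(secure ? "secureConnect" : "connect", () => {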
const originProtocol = secure ? "https" : "http";
const request = [
`GET ${path} HTTP/1.1`,
`Host: ${parsed.host}`,
"Upgrade: websocket",
"Connection: Upgrade",
`Sec-WebSocket-Key: ${key}`,
"Sec-WebSocket-Version: 13",
`Origin: ${originProtocol}://${parsed.host}`,
"",
"",
].join("\r\n");
socket.write(request);
});
socket.on("data", (chunk) => {
buffer = Buffer.concat([buffer, chunk]);
if (!opened) {
const headerEnd = buffer.indexOf("\r\n\r\n");
if (headerEnd === -1) return;
const headerText = buffer.slice(0, headerEnd).toString("utf8");
buffer = buffer.slice(headerEnd + 4);
if (!/^HTTP\/1\.1 101\b/i.test(headerText)) {
handlers.onError(new Error(`Handshake failed: ${headerText.split("\r\n")[0] || "missing status"}`));
socket.destroy();
return;
}
opened = true;
handlers.onOpen();
}
processFrames();
});
socket.on("error", (error) => {
if (!closed) handlers.onError(error);
});
socket.on("close", () => {
if (closed) return;
closed = true;
handlers.onClose({ code: null, reason: "" });
});
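// Chunks of a fragmented text message, or null when no message is open (RFC 6455
// section 5.4). Control frames such as ping may arrive between fragments.
let textFragments = null;
function processFrames() {
while (true) {
const frame = readFrame(buffer);
if (!frame) return;
// FIN is the high bit of the frame's first byte; read it before slicing.
const fin = (buffer[0] & 0x80) !== 0;
buffer = buffer.slice(frame.consumed);
if (frame.opcode === 0x1 || frame.opcode === 0x0) {
// A text frame opens a message and continuation frames extend it.
// Continuations of other message types (e.g. binary) are ignored.
if (frame.opcode === 0x1) textFragments = [];
if (!textFragments) continue;
textFragments.push(frame.payload);
if (fin) {
const text = Buffer.concat(textFragments).toString("utf8");
textFragments = null;
handlers.onMessage(text);
}
} else if (frame.opcode === 0x8) {
const code = frame.payload.length >= 2 ? frame.payload.readUInt16BE(0) : null;
const reason = frame.payload.length > 2 ? frame.payload.slice(2).toString("utf8") : "";
closed = true;
handlers.onClose({ code, reason });
socket.end();
return;
} else if (frame.opcode === 0x9) {
writeFrame(socket, 0xA, frame.payload);
}
}
}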
return {
send(text) {
if (closed || !opened) return;
writeFrame(socket, 0x1, Buffer.from(text, "utf8"));
},
close() {
if (closed) return;
closed = true;
if (!socket.destroyed) {
if (opened) writeFrame(socket, 0x8, Buffer.alloc(0));
setTimeout(() => socket.end(), 50).unref();
}
},
};
}
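// Sketch (hypothetical helper, not called by this probe): openRawWebSocket accepts
// any "HTTP/1.1 101" reply without checking Sec-WebSocket-Accept. RFC 6455
// defines that header as base64(SHA-1(Sec-WebSocket-Key + fixed GUID)). A stricter
// client could compare the server's header against this value. The helper uses a
// dynamic import so that it does not rely on the module's own crypto import.
async function sketchExpectedWebSocketAccept(key) {
  const { createHash } = await import("node:crypto");
  const guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"; // fixed by RFC 6455
  return createHash("sha1").update(key + guid).digest("base64");
}
// Usage (RFC 6455 section 1.3 example): the key "dGhlIHNhbXBsZSBub25jZQ==" yields
// "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".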
function readFrame(buffer) {
if (buffer.length < 2) return null;
const first = buffer[0];
const second = buffer[1];
const opcode = first & 0x0f;
const masked = Boolean(second & 0x80);
let length = second & 0x7f;
let offset = 2;
if (length === 126) {
if (buffer.length < offset + 2) return null;
length = buffer.readUInt16BE(offset);
offset += 2;
} else if (length === 127) {
if (buffer.length < offset + 8) return null;
const high = buffer.readUInt32BE(offset);
const low = buffer.readUInt32BE(offset + 4);
length = high * 2 ** 32 + low;
offset += 8;
}
let mask = null;
if (masked) {
if (buffer.length < offset + 4) return null;
mask = buffer.slice(offset, offset + 4);
offset += 4;
}
if (buffer.length < offset + length) return null;
let payload = buffer.slice(offset, offset + length);
if (mask) {
payload = Buffer.from(payload);
for (let index = 0; index < payload.length; index += 1) {
payload[index] ^= mask[index % 4];
}
}
return {
opcode,
payload,
consumed: offset + length,
};
}
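// Sketch (hypothetical helper, not called by this probe): the XOR masking that
// writeFrame applies and readFrame undoes. Client-to-server frames must be masked
// with a 4-byte key (RFC 6455 section 5.3). XOR is its own inverse, so applying
// the same key twice restores the payload. The RFC's masked "Hello" example uses
// the key 37 fa 21 3d and produces the bytes 7f 9f 4d 51 58.
function sketchApplyMask(payload, mask) {
  const out = Buffer.from(payload); // copy so the caller's buffer is untouched
  for (let index = 0; index < out.length; index += 1) {
    out[index] ^= mask[index % 4];
  }
  return out;
}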
function writeFrame(socket, opcode, payload) {
const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload || "");
const mask = crypto.randomBytes(4);
const headerLength = body.length < 126 ? 2 : body.length <= 0xffff ? 4 : 10;
const header = Buffer.alloc(headerLength);
header[0] = 0x80 | opcode;
if (body.length < 126) {
header[1] = 0x80 | body.length;
} else if (body.length <= 0xffff) {
header[1] = 0x80 | 126;
header.writeUInt16BE(body.length, 2);
} else {
header[1] = 0x80 | 127;
header.writeUInt32BE(Math.floor(body.length / 2 ** 32), 2);
header.writeUInt32BE(body.length >>> 0, 6);
}
const masked = Buffer.from(body);
for (let index = 0; index < masked.length; index += 1) {
masked[index] ^= mask[index % 4];
}
socket.write(Buffer.concat([header, mask, masked]));
}
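// Sketch (hypothetical helper, not called by this probe): the three payload-length
// encodings that writeFrame emits and readFrame parses (RFC 6455 section 5.2).
// The helper returns the bytes that follow the FIN/opcode byte, without the
// 4-byte mask key:
// - 0-125:      7 bits
// - 126-65535:  marker 126 followed by a 16-bit length
// - larger:     marker 127 followed by a 64-bit length
function sketchEncodePayloadLength(length, masked = true) {
  const maskBit = masked ? 0x80 : 0;
  if (length < 126) return Buffer.from([maskBit | length]);
  if (length <= 0xffff) {
    const out = Buffer.alloc(3);
    out[0] = maskBit | 126;
    out.writeUInt16BE(length, 1);
    return out;
  }
  const out = Buffer.alloc(9);
  out[0] = maskBit | 127;
  out.writeUInt32BE(Math.floor(length / 2 ** 32), 1); // high 32 bits
  out.writeUInt32BE(length >>> 0, 5); // low 32 bits
  return out;
}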
function rounded(value) {
return Number(value.toFixed(3));
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return rounded(sorted[index]);
}
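// Sketch (hypothetical helper, not called by this probe): percentile() above uses
// the nearest-rank method, index = ceil(p / 100 * n) - 1 on the sorted values.
// With n = 12 samples (defaultRequests for non-space cases), p95 picks index
// ceil(11.4) - 1 = 11, which is the slowest sample. At that size, a single
// request over budget is enough to fail the p95 threshold.
function sketchNearestRankIndex(count, percentileValue) {
  return Math.min(count - 1, Math.ceil((percentileValue / 100) * count) - 1);
}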
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: rounded(Math.min(...values)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: rounded(Math.max(...values)),
};
}
function buildMetrics({ samples, totalRequests, concurrency, timeoutMs, loadDurationMs, backendUrl, pipelineId, sessionType, fakeProviderState }) {
const okSamples = samples.filter((sample) => sample.ok);
const statusCounts = {};
for (const sample of samples) {
statusCounts[sample.status] = (statusCounts[sample.status] || 0) + 1;
}
const errorCount = samples.length - okSamples.length;
return {
probe: caseId,
backend_url: backendUrl,
pipeline_id: pipelineId,
session_type: sessionType,
total_requests: totalRequests,
completed_requests: samples.length,
concurrency,
timeout_ms: timeoutMs,
ok_count: okSamples.length,
error_count: errorCount,
timeout_count: samples.filter((sample) => sample.status === "timeout").length,
error_rate: samples.length === 0 ? 1 : rounded(errorCount / samples.length),
load_duration_ms: rounded(loadDurationMs),
throughput_rps: loadDurationMs <= 0 ? 0 : rounded(okSamples.length / (loadDurationMs / 1000)),
status_counts: statusCounts,
connected_ms: stats(samples.map((sample) => sample.connected_ms).filter(Number.isFinite)),
first_assistant_event_ms: stats(samples.map((sample) => sample.first_assistant_event_ms).filter(Number.isFinite)),
first_assistant_content_ms: stats(samples.map((sample) => sample.first_assistant_content_ms).filter(Number.isFinite)),
first_response_ms: stats(okSamples.map((sample) => sample.first_response_ms).filter(Number.isFinite)),
response_duration_ms: stats(okSamples.map((sample) => sample.response_duration_ms).filter(Number.isFinite)),
fake_provider: summarizeFakeProviderState(fakeProviderState),
provider_timing: buildProviderTimingMetrics(samples, fakeProviderState),
samples,
};
}
function buildThresholds(metrics) {
const thresholds = {
error_rate: { actual: metrics.error_rate, max: maxErrorRate, pass: metrics.error_rate <= maxErrorRate },
response_p95_ms: {
actual: metrics.response_duration_ms.p95,
max: responseP95BudgetMs,
pass: metrics.ok_count > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
},
};
if (minErrorRate > 0) {
thresholds.error_rate_min = {
actual: metrics.error_rate,
min: minErrorRate,
pass: metrics.error_rate >= minErrorRate,
};
}
if (minErrorCount > 0) {
thresholds.error_count_min = {
actual: metrics.error_count,
min: minErrorCount,
pass: metrics.error_count >= minErrorCount,
};
}
if (minOkCount > 0) {
thresholds.ok_count_min = {
actual: metrics.ok_count,
min: minOkCount,
pass: metrics.ok_count >= minOkCount,
};
}
if (minProviderFaultCount > 0) {
const actual = metrics.fake_provider?.fault_count ?? 0;
thresholds.fake_provider_fault_count_min = {
actual,
min: minProviderFaultCount,
pass: actual >= minProviderFaultCount,
};
}
if (firstResponseP95BudgetMs > 0) {
thresholds.first_response_p95_ms = {
actual: metrics.first_response_ms.p95,
max: firstResponseP95BudgetMs,
pass: metrics.ok_count > 0 && metrics.first_response_ms.p95 <= firstResponseP95BudgetMs,
};
}
return thresholds;
}
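// Sketch (hypothetical helper, not called by this probe): buildThresholds returns
// entries shaped either { actual, max, pass } or { actual, min, pass }. The run
// passes only when every entry passes. The minimum entries serve fault-injection
// runs, where a passing run must see at least N errors or provider faults.
// Listing the failing keys makes a breach readable at a glance.
function sketchFailedThresholds(thresholds) {
  return Object.entries(thresholds)
    .filter(([, item]) => !item.pass)
    .map(([name, item]) => `${name}: actual=${item.actual} ${"max" in item ? `max=${item.max}` : `min=${item.min}`}`);
}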
function looksLikeEnvIssue(error) {
const message = String(error?.message || error || "");
return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
@@ -0,0 +1,861 @@
#!/usr/bin/env node
import crypto from "node:crypto";
import net from "node:net";
import tls from "node:tls";
import { mkdir, writeFile } from "node:fs/promises";
import { resolve } from "node:path";
import { env, exit } from "node:process";
import {
apiJson,
appendLine,
ensureEvidence,
evidencePaths,
loadEnvFiles,
localIsoWithOffset,
redact,
resetAndAuthLocalUser,
writeResult,
} from "../../../scripts/e2e/lib/langbot-e2e.mjs";
import {
buildProviderTimingMetrics,
summarizeFakeProviderState,
} from "./lib/fake-provider-timing.mjs";
const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
await loadEnvFiles();
const caseId = env.LBS_CASE_ID || "langbot-debug-chat-cross-pipeline-isolation";
const paths = evidencePaths(caseId);
await ensureEvidence(paths);
const startedAt = new Date();
const metricsPath = resolve(paths.evidenceDir, "metrics.json");
const samplesPath = resolve(paths.evidenceDir, "samples.json");
const fakeProviderStatePath = resolve(paths.evidenceDir, "fake-provider-state.json");
const resetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const fakeProviderUrl = env.LANGBOT_FAKE_PROVIDER_URL || "";
const sessionType = env.LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE || env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person";
const requestsPerPipeline = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_REQUESTS, 6);
const concurrency = Math.min(requestsPerPipeline * 2, positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY, 4));
const timeoutMs = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS, 30_000);
const stream = bool(env.LANGBOT_DEBUG_CHAT_LOAD_STREAM, true);
const resetBeforeRun = bool(env.LANGBOT_DEBUG_CHAT_LOAD_RESET, true);
const responseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS, 5_000);
const maxErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE, 0);
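// Default prompt (Chinese): 'Reply only with "{expected}"; do not explain or add any other characters.'
const promptTemplate = env.LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE
|| "请只回复 \"{expected}\",不要解释,不要添加其他字符。";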
const failureSignals = textList(env.LANGBOT_E2E_FAILURE_SIGNALS || env.LANGBOT_DEBUG_CHAT_LOAD_FAILURE_SIGNALS || "");
const pipelineTargets = [
{
label: "A",
expectedPrefix: "PIPEA",
otherPrefix: "PIPEB",
url: env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL || "",
name: env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME || "",
},
{
label: "B",
expectedPrefix: "PIPEB",
otherPrefix: "PIPEA",
url: env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL || "",
name: env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME || "",
},
];
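// Sketch (hypothetical helper, not called by this probe): each target above pairs
// its own token prefix with the other pipeline's prefix. One plausible leak test
// assumes tokens use the PREFIX-NNNN format and that a leak means a reply on one
// pipeline's session carries the other pipeline's token. Under that assumption,
// the check is a token match on otherPrefix. The probe's real leak accounting
// lives in its load code and may differ.
function sketchIsCrossPipelineLeak(text, otherPrefix) {
  const escaped = String(otherPrefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
}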
const result = {
source: "automation",
case_id: caseId,
run_id: paths.runId,
status: "fail",
reason: "",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: "",
finished_at_local: "",
duration_ms: 0,
backend_url: backendUrl,
session_type: sessionType,
pipelines: [],
load_profile: {
requests_per_pipeline: requestsPerPipeline,
total_requests: requestsPerPipeline * 2,
concurrency,
timeout_ms: timeoutMs,
stream,
reset_before_run: resetBeforeRun,
},
evidence: {
network_log: paths.networkLog,
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderStatePath,
debug_chat_reset_diagnostic_json: resetDiagnosticPath,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
try {
if (!backendUrl) {
result.status = "env_issue";
throw new Error("LANGBOT_BACKEND_URL is not configured.");
}
if (!["person", "group"].includes(sessionType)) {
throw new Error(`LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE must be person or group, got ${sessionType}.`);
}
for (const target of pipelineTargets) {
if (!target.url && !target.name) {
result.status = "env_issue";
throw new Error(`Set LANGBOT_FAKE_PROVIDER_PIPELINE_${target.label}_URL or LANGBOT_FAKE_PROVIDER_PIPELINE_${target.label}_NAME.`);
}
}
const backendReady = await backendReachable(backendUrl);
if (!backendReady) {
result.status = "env_issue";
throw new Error(`Backend did not respond at ${backendUrl}.`);
}
const user = env.LANGBOT_E2E_LOGIN_USER || "";
const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
if (!user) {
result.status = "env_issue";
throw new Error("LANGBOT_E2E_LOGIN_USER is required so this probe can resolve/reset Debug Chat sessions.");
}
const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
const pipelines = [];
for (const target of pipelineTargets) {
const pipeline = await resolvePipeline({
backendUrl,
token: auth.token,
pipelineUrl: target.url,
pipelineName: target.name,
});
pipelines.push({
...target,
id: pipeline.id,
name: pipeline.name || target.name,
wsUrl: websocketUrl(backendUrl, pipeline.id, sessionType),
});
}
result.pipelines = pipelines.map((pipeline) => ({
label: pipeline.label,
id: pipeline.id,
name: pipeline.name,
url: pipeline.url,
}));
if (resetBeforeRun) {
const resetDiagnostics = [];
for (const pipeline of pipelines) {
const reset = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.id)}/ws/reset/${encodeURIComponent(sessionType)}`, {
method: "POST",
token: auth.token,
});
resetDiagnostics.push({
pipeline_label: pipeline.label,
pipeline_id: pipeline.id,
status: isApiFailure(reset) ? "fail" : "ready",
http_status: reset.status,
code: reset.json.code ?? null,
reason: isApiFailure(reset) ? reset.json.msg || "Debug Chat reset failed." : "Debug Chat session reset.",
});
}
await writeFile(resetDiagnosticPath, `${JSON.stringify(resetDiagnostics, null, 2)}\n`, "utf8");
const failedReset = resetDiagnostics.find((item) => item.status === "fail");
if (failedReset) throw new Error(failedReset.reason);
}
await resetFakeProvider(fakeProviderUrl);
const jobs = [];
for (let index = 0; index < requestsPerPipeline; index += 1) {
for (const pipeline of pipelines) {
jobs.push({ ...pipeline, index });
}
}
const loadStartedAt = performance.now();
const samples = await runLoad({
jobs,
concurrency,
timeoutMs,
promptTemplate,
stream,
failureSignals,
});
const loadDurationMs = performance.now() - loadStartedAt;
const fakeProviderState = await readFakeProviderState(fakeProviderUrl);
if (fakeProviderState) {
await writeFile(fakeProviderStatePath, `${JSON.stringify(fakeProviderState, null, 2)}\n`, "utf8");
}
const metrics = buildMetrics({
samples,
requestsPerPipeline,
concurrency,
timeoutMs,
loadDurationMs,
backendUrl,
sessionType,
fakeProviderState,
});
const thresholds = buildThresholds(metrics);
const passed = Object.values(thresholds).every((item) => item.pass);
result.status = passed ? "pass" : "fail";
result.reason = passed
? "Debug Chat cross-pipeline isolation probe passed all thresholds."
: "Debug Chat cross-pipeline isolation probe found leaks, errors, or latency threshold breaches.";
result.metrics_summary = {
requests_per_pipeline: metrics.requests_per_pipeline,
total_requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_count: metrics.error_count,
cross_pipeline_leak_count: metrics.cross_pipeline_leak_count,
timeout_count: metrics.timeout_count,
error_rate: metrics.error_rate,
response_p95_ms: metrics.response_duration_ms.p95,
first_response_p95_ms: metrics.first_response_ms.p95,
throughput_rps: metrics.throughput_rps,
status_counts: metrics.status_counts,
by_pipeline: metrics.by_pipeline,
fake_provider_request_count: metrics.fake_provider?.request_count ?? null,
fake_provider_duration_p95_ms: metrics.provider_timing?.provider_duration_ms.p95 ?? null,
langbot_overhead_estimate_p95_ms: metrics.provider_timing?.langbot_overhead_estimate_ms.p95 ?? null,
send_to_provider_start_p95_ms: metrics.provider_timing?.send_to_provider_start_ms.p95 ?? null,
provider_finish_to_ws_final_p95_ms: metrics.provider_timing?.provider_finish_to_ws_final_ms.p95 ?? null,
};
result.thresholds_summary = thresholds;
result.artifacts = {
metrics_json: metricsPath,
samples_json: samplesPath,
fake_provider_state_json: fakeProviderState ? fakeProviderStatePath : "",
network_log: paths.networkLog,
automation_result_json: paths.automationResultJson,
result_json: paths.resultJson,
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, thresholds }, null, 2)}\n`, "utf8");
await writeFile(samplesPath, `${JSON.stringify(samples, null, 2)}\n`, "utf8");
} catch (error) {
if (!["env_issue", "blocked"].includes(result.status)) {
result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
}
result.reason = result.reason || safeReason(error?.message || String(error));
} finally {
const finishedAt = new Date();
result.finished_at = finishedAt.toISOString();
result.finished_at_local = localIsoWithOffset(finishedAt);
result.duration_ms = finishedAt.getTime() - startedAt.getTime();
await mkdir(paths.evidenceDir, { recursive: true });
await writeResult(paths, result);
console.log(JSON.stringify(result, null, 2));
}
exit(result.status === "pass" ? 0 : result.status === "env_issue" || result.status === "blocked" ? 2 : 1);
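// Sketch (hypothetical helper, not called by this probe): the exit-code convention
// from the ternary above as a pure function, so the mapping can be unit-tested.
// "pass" exits 0; "env_issue" and "blocked" exit 2 because they mark environment or
// operator problems rather than product failures; every other status exits 1.
function exitCodeForStatusSketch(status) {
if (status === "pass") return 0;
if (status === "env_issue" || status === "blocked") return 2;
return 1;
}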
async function backendReachable(baseUrl) {
try {
const response = await fetch(`${baseUrl.replace(/\/$/, "")}/healthz`, {
signal: AbortSignal.timeout(3000),
});
return response.status < 500;
} catch {
return false;
}
}
async function resetFakeProvider(rootUrl) {
if (!rootUrl) return;
try {
await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/reset`, {
method: "POST",
signal: AbortSignal.timeout(3000),
});
} catch {
// Missing fake-provider diagnostics should not hide the isolation result.
}
}
async function readFakeProviderState(rootUrl) {
if (!rootUrl) return null;
try {
const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
signal: AbortSignal.timeout(3000),
});
const json = await response.json().catch(() => ({}));
return {
status: response.ok && json.ok === true ? "loaded" : "unavailable",
url: normalizeProviderRootUrl(rootUrl),
http_status: response.status,
model: json.model || "",
config: json.config || {},
request_count: Number.isFinite(json.request_count) ? json.request_count : null,
recent_requests: Array.isArray(json.recent_requests) ? json.recent_requests : [],
};
} catch (error) {
return {
status: "unavailable",
url: normalizeProviderRootUrl(rootUrl),
reason: safeReason(error.message),
request_count: null,
recent_requests: [],
};
}
}
function normalizeProviderRootUrl(value) {
const trimmed = String(value || "").trim().replace(/\/$/, "");
return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
}
function pipelineIdFromUrl(url) {
if (!url) return "";
try {
const parsed = new URL(url);
return parsed.searchParams.get("id") || "";
} catch {
return "";
}
}
async function resolvePipeline({ backendUrl, token, pipelineUrl, pipelineName }) {
const idFromUrl = pipelineIdFromUrl(pipelineUrl);
if (idFromUrl) {
const response = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(idFromUrl)}`, { token });
const pipeline = response.json.data?.pipeline;
if (isApiFailure(response) || !pipeline?.uuid) {
throw new Error(response.json.msg || `Could not load pipeline ${idFromUrl}.`);
}
return { id: pipeline.uuid, name: pipeline.name || "" };
}
if (!pipelineName) {
throw new Error("Set pipeline URL or name before running this probe.");
}
const response = await apiJson(backendUrl, "/api/v1/pipelines", { token });
if (isApiFailure(response)) {
throw new Error(response.json.msg || "Failed to list pipelines.");
}
const pipeline = (response.json.data?.pipelines || []).find((item) => item.name === pipelineName);
if (!pipeline?.uuid) {
throw new Error(`Could not find pipeline named ${pipelineName}.`);
}
return { id: pipeline.uuid, name: pipeline.name || pipelineName };
}
function isApiFailure(response) {
return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
}
function websocketUrl(baseUrl, pipelineId, sessionTypeValue) {
const parsed = new URL(baseUrl);
parsed.protocol = parsed.protocol === "https:" ? "wss:" : "ws:";
parsed.pathname = `/api/v1/pipelines/${encodeURIComponent(pipelineId)}/ws/connect`;
parsed.search = `?session_type=${encodeURIComponent(sessionTypeValue)}`;
return parsed.toString();
}
async function runLoad(options) {
const samples = [];
const queue = [...options.jobs];
const workers = Array.from({ length: options.concurrency }, async () => {
while (queue.length > 0) {
const job = queue.shift();
if (!job) continue;
const sample = await runSingleRequest({ ...options, job });
samples.push(sample);
}
});
await Promise.all(workers);
return samples.sort((left, right) => (
left.pipeline_label.localeCompare(right.pipeline_label) || left.index - right.index
));
}
function expectedForIndex(prefix, index) {
return `${prefix}-${String(index + 1).padStart(4, "0")}`;
}
function promptForIndex(template, expected) {
return template.replaceAll("{expected}", expected);
}
function runSingleRequest({
job,
timeoutMs,
promptTemplate,
stream,
failureSignals,
}) {
return new Promise((resolvePromise) => {
const expected = expectedForIndex(job.expectedPrefix, job.index);
const prompt = promptForIndex(promptTemplate, expected);
const sample = {
index: job.index,
pipeline_label: job.label,
pipeline_id: job.id,
pipeline_name: job.name,
status: "running",
ok: false,
expected_text: expected,
expected_prefix: job.expectedPrefix,
other_prefix: job.otherPrefix,
prompt,
response_text: "",
started_at: new Date().toISOString(),
started_epoch_ms: Date.now(),
connected_at: null,
connected_epoch_ms: null,
sent_at: null,
sent_epoch_ms: null,
first_assistant_event_at: null,
first_assistant_event_epoch_ms: null,
first_assistant_event_ms: null,
first_assistant_content_at: null,
first_assistant_content_epoch_ms: null,
first_assistant_content_ms: null,
first_response_at: null,
first_response_epoch_ms: null,
connected_ms: null,
first_response_ms: null,
response_duration_ms: null,
finished_at: null,
finished_epoch_ms: null,
event_count: 0,
same_pipeline_foreign_response_count: 0,
cross_pipeline_leak_count: 0,
last_foreign_response_text: "",
error: "",
close_code: null,
close_reason: "",
};
let closed = false;
let connectedAt = 0;
let sentAt = 0;
const startedPerf = performance.now();
let client = null;
const timer = setTimeout(() => {
finish("timeout", `Timed out after ${timeoutMs} ms.`);
}, timeoutMs);
client = openRawWebSocket(job.wsUrl, {
onOpen() {
connectedAt = performance.now();
const now = Date.now();
sample.connected_at = new Date(now).toISOString();
sample.connected_epoch_ms = now;
sample.connected_ms = rounded(connectedAt - startedPerf);
},
onMessage(text) {
sample.event_count += 1;
let data;
try {
data = JSON.parse(String(text || ""));
} catch (error) {
finish("error", `Invalid WebSocket JSON: ${error.message}`);
return;
}
appendLine(paths.networkLog, JSON.stringify({
pipeline_label: job.label,
request_index: job.index,
type: data.type,
session_type: data.session_type || "",
role: data.data?.role || "",
is_final: data.data?.is_final ?? null,
content_preview: redact(String(data.data?.content || data.message || "").slice(0, 200)),
})).catch(() => {});
if (data.type === "connected") {
sentAt = performance.now();
const now = Date.now();
sample.sent_at = new Date(now).toISOString();
sample.sent_epoch_ms = now;
client.send(JSON.stringify({
type: "message",
message: [{ type: "Plain", text: prompt }],
stream,
}));
return;
}
if (data.type === "error") {
finish("error", data.message || "WebSocket error message.");
return;
}
if (data.type !== "response" || data.data?.role !== "assistant") return;
const content = String(data.data.content || "");
markFirstAssistantEvent(sample, sentAt);
if (content) sample.response_text = content;
if (content) markFirstAssistantContent(sample, sentAt);
if (containsPipelineToken(content, job.otherPrefix)) {
sample.cross_pipeline_leak_count += 1;
finish("cross_pipeline_leak", `Pipeline ${job.label} received response from ${job.otherPrefix}: ${content}`);
return;
}
if (content.includes(expected) && sample.first_response_ms === null && sentAt > 0) {
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}
if (data.data.is_final === true) {
const ok = sample.response_text.includes(expected);
if (ok) {
if (sample.first_response_ms === null && sentAt > 0) {
const now = Date.now();
sample.first_response_at = new Date(now).toISOString();
sample.first_response_epoch_ms = now;
sample.first_response_ms = rounded(performance.now() - sentAt);
}
finish("pass", "");
} else if (matchesFailureSignal(sample.response_text, failureSignals)) {
finish("app_error", `Assistant final response matched a failure signal: ${sample.response_text}`);
} else if (containsPipelineToken(sample.response_text, job.expectedPrefix)) {
sample.same_pipeline_foreign_response_count += 1;
sample.last_foreign_response_text = sample.response_text;
} else {
finish("mismatch", `Final assistant response did not include ${expected}: ${sample.response_text}`);
}
}
},
onError(error) {
finish("connection_error", `WebSocket connection error: ${error.message}`);
},
onClose(event) {
sample.close_code = event.code;
sample.close_reason = event.reason || "";
if (!closed) finish("closed", `WebSocket closed before final assistant response: ${event.code}`);
},
});
function finish(status, reason) {
if (closed) return;
closed = true;
clearTimeout(timer);
sample.status = status;
sample.ok = status === "pass";
sample.error = status === "timeout" && sample.same_pipeline_foreign_response_count > 0
? `${reason || ""} Saw ${sample.same_pipeline_foreign_response_count} same-pipeline foreign assistant response(s); last=${sample.last_foreign_response_text}`
: reason || "";
if (sentAt > 0) sample.response_duration_ms = rounded(performance.now() - sentAt);
else sample.response_duration_ms = rounded(performance.now() - startedPerf);
const now = Date.now();
sample.finished_at = new Date(now).toISOString();
sample.finished_epoch_ms = now;
try {
client?.close();
} catch {
// Closing a failed socket should not hide the sample result.
}
resolvePromise(sample);
}
});
}
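// Sketch (hypothetical, not wired into runSingleRequest): the order in which
// onMessage classifies an assistant response, as a pure function so each branch can
// be tested without a live WebSocket. It assumes tokens look like `<prefix>-NNNN`
// (as produced by expectedForIndex) and that `content` stands for the latest
// non-empty assistant text. Returns "cross_pipeline_leak", "pass", "app_error",
// "mismatch", or "wait" (a non-final chunk, or a final reply meant for another
// request on the same pipeline, so the caller keeps listening).
function classifyAssistantResponseSketch({ content, isFinal, expected, expectedPrefix, otherPrefix, failureSignals }) {
const escape = (prefix) => String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const hasToken = (text, prefix) => new RegExp(`${escape(prefix)}-\\d{4}`).test(text);
const text = String(content || "");
// A token from the other pipeline is a leak whether or not the chunk is final.
if (hasToken(text, otherPrefix)) return "cross_pipeline_leak";
if (!isFinal) return "wait";
if (text.includes(expected)) return "pass";
const lower = text.toLowerCase();
if (failureSignals.some((signal) => lower.includes(signal.toLowerCase()))) return "app_error";
if (hasToken(text, expectedPrefix)) return "wait";
return "mismatch";
}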
function markFirstAssistantEvent(sample, sentAt) {
if (sample.first_assistant_event_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_event_at = new Date(now).toISOString();
sample.first_assistant_event_epoch_ms = now;
sample.first_assistant_event_ms = rounded(performance.now() - sentAt);
}
function markFirstAssistantContent(sample, sentAt) {
if (sample.first_assistant_content_ms !== null || sentAt <= 0) return;
const now = Date.now();
sample.first_assistant_content_at = new Date(now).toISOString();
sample.first_assistant_content_epoch_ms = now;
sample.first_assistant_content_ms = rounded(performance.now() - sentAt);
}
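function containsPipelineToken(text, prefix) {
// Matches `<prefix>-NNNN` anywhere in the text. The pattern is unanchored, so this
// assumes neither pipeline prefix ends with the other (prefix "A" would also match "QA-0001").
const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
}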
function matchesFailureSignal(text, signals) {
const lower = String(text || "").toLowerCase();
return signals.some((signal) => lower.includes(signal.toLowerCase()));
}
function openRawWebSocket(wsUrl, handlers) {
const parsed = new URL(wsUrl);
const secure = parsed.protocol === "wss:";
const port = Number(parsed.port || (secure ? 443 : 80));
const host = parsed.hostname;
const path = `${parsed.pathname}${parsed.search}`;
const key = crypto.randomBytes(16).toString("base64");
const socket = secure
? tls.connect({ host, port, servername: host })
: net.connect({ host, port });
let opened = false;
let closed = false;
let buffer = Buffer.alloc(0);
socket.setNoDelay(true);
socket.on("connect", () => {
const originProtocol = secure ? "https" : "http";
const request = [
`GET ${path} HTTP/1.1`,
`Host: ${parsed.host}`,
"Upgrade: websocket",
"Connection: Upgrade",
`Sec-WebSocket-Key: ${key}`,
"Sec-WebSocket-Version: 13",
`Origin: ${originProtocol}://${parsed.host}`,
"",
"",
].join("\r\n");
socket.write(request);
});
socket.on("data", (chunk) => {
buffer = Buffer.concat([buffer, chunk]);
if (!opened) {
const headerEnd = buffer.indexOf("\r\n\r\n");
if (headerEnd === -1) return;
const headerText = buffer.slice(0, headerEnd).toString("utf8");
buffer = buffer.slice(headerEnd + 4);
if (!/^HTTP\/1\.1 101\b/i.test(headerText)) {
handlers.onError(new Error(`Handshake failed: ${headerText.split("\r\n")[0] || "missing status"}`));
socket.destroy();
return;
}
opened = true;
handlers.onOpen();
}
processFrames();
});
socket.on("error", (error) => {
if (!closed) handlers.onError(error);
});
socket.on("close", () => {
if (closed) return;
closed = true;
handlers.onClose({ code: null, reason: "" });
});
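function processFrames() {
while (true) {
const frame = readFrame(buffer);
if (!frame) return;
buffer = buffer.slice(frame.consumed);
// Assumes the server sends each event as a single unfragmented text frame;
// continuation (0x0), binary (0x2), and pong (0xA) frames are ignored.
if (frame.opcode === 0x1) {
handlers.onMessage(frame.payload.toString("utf8"));
} else if (frame.opcode === 0x8) {
const code = frame.payload.length >= 2 ? frame.payload.readUInt16BE(0) : null;
const reason = frame.payload.length > 2 ? frame.payload.slice(2).toString("utf8") : "";
closed = true;
// Echo the close status code back, as RFC 6455 expects, before ending the TCP stream.
writeFrame(socket, 0x8, frame.payload.slice(0, 2));
handlers.onClose({ code, reason });
socket.end();
return;
} else if (frame.opcode === 0x9) {
// Answer server pings with a pong carrying the same payload.
writeFrame(socket, 0xA, frame.payload);
}
}
}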
return {
send(text) {
if (closed || !opened) return;
writeFrame(socket, 0x1, Buffer.from(text, "utf8"));
},
close() {
if (closed) return;
closed = true;
if (!socket.destroyed) {
if (opened) writeFrame(socket, 0x8, Buffer.alloc(0));
setTimeout(() => socket.end(), 50).unref();
}
},
};
}
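// Sketch (hypothetical, not called above): openRawWebSocket only checks for a
// "101" status line. RFC 6455 also defines the Sec-WebSocket-Accept value a server
// must return: base64(SHA-1(key + fixed GUID)). A stricter handshake could compare
// the response header against this value.
async function expectedWebSocketAcceptSketch(key) {
// Dynamic import keeps the sketch self-contained without redeclaring `crypto`.
const { createHash } = await import("node:crypto");
return createHash("sha1")
.update(`${key}258EAFA5-E914-47DA-95CA-C5AB0DC85B11`)
.digest("base64");
}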
function readFrame(buffer) {
if (buffer.length < 2) return null;
const first = buffer[0];
const second = buffer[1];
const opcode = first & 0x0f;
const masked = Boolean(second & 0x80);
let length = second & 0x7f;
let offset = 2;
if (length === 126) {
if (buffer.length < offset + 2) return null;
length = buffer.readUInt16BE(offset);
offset += 2;
} else if (length === 127) {
if (buffer.length < offset + 8) return null;
const high = buffer.readUInt32BE(offset);
const low = buffer.readUInt32BE(offset + 4);
length = high * 2 ** 32 + low;
offset += 8;
}
let mask = null;
if (masked) {
if (buffer.length < offset + 4) return null;
mask = buffer.slice(offset, offset + 4);
offset += 4;
}
if (buffer.length < offset + length) return null;
let payload = buffer.slice(offset, offset + length);
if (mask) {
payload = Buffer.from(payload);
for (let index = 0; index < payload.length; index += 1) {
payload[index] ^= mask[index % 4];
}
}
return {
opcode,
payload,
consumed: offset + length,
};
}
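// Sketch (hypothetical, not used by processFrames): readFrame() does not report the
// FIN bit, so a message split into continuation frames (opcode 0x0) is not
// reassembled. Under RFC 6455, a fragmented message starts with a data frame
// (opcode 0x1 text or 0x2 binary) with FIN=0, continues with opcode 0x0 frames, and
// ends at the first frame with FIN=1; control frames may be interleaved.
// Input is assumed to be frames already decoded into { fin, opcode, payload }.
// Payloads are joined before UTF-8 decoding so multi-byte characters split across
// fragments survive. Only text messages are returned.
function reassembleTextMessagesSketch(frames) {
const messages = [];
let current = null;
for (const { fin, opcode, payload } of frames) {
if (opcode === 0x1 || opcode === 0x2) {
if (current) throw new Error("Data frame started before the previous message finished.");
current = { opcode, parts: [payload] };
} else if (opcode === 0x0) {
if (!current) throw new Error("Continuation frame without a started message.");
current.parts.push(payload);
} else {
// Control frames (close 0x8, ping 0x9, pong 0xA) are handled elsewhere.
continue;
}
if (fin) {
if (current.opcode === 0x1) messages.push(Buffer.concat(current.parts).toString("utf8"));
current = null;
}
}
return messages;
}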
function writeFrame(socket, opcode, payload) {
const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload || "");
const mask = crypto.randomBytes(4);
const headerLength = body.length < 126 ? 2 : body.length <= 0xffff ? 4 : 10;
const header = Buffer.alloc(headerLength);
header[0] = 0x80 | opcode;
if (body.length < 126) {
header[1] = 0x80 | body.length;
} else if (body.length <= 0xffff) {
header[1] = 0x80 | 126;
header.writeUInt16BE(body.length, 2);
} else {
header[1] = 0x80 | 127;
header.writeUInt32BE(Math.floor(body.length / 2 ** 32), 2);
header.writeUInt32BE(body.length >>> 0, 6);
}
const masked = Buffer.from(body);
for (let index = 0; index < masked.length; index += 1) {
masked[index] ^= mask[index % 4];
}
socket.write(Buffer.concat([header, mask, masked]));
}
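// Sketch (hypothetical helper): the same client-frame layout writeFrame() sends,
// returned as a Buffer with a caller-supplied mask so the bytes are deterministic in
// a unit test. Layout per RFC 6455: FIN + opcode byte, MASK bit + 7-bit length (or
// 126/127 followed by a 16/64-bit length), 4-byte masking key, then the payload
// XORed with the key.
function encodeClientFrameSketch(opcode, payload, mask) {
const body = Buffer.from(payload);
let header;
if (body.length < 126) {
header = Buffer.from([0x80 | opcode, 0x80 | body.length]);
} else if (body.length <= 0xffff) {
header = Buffer.alloc(4);
header[0] = 0x80 | opcode;
header[1] = 0x80 | 126;
header.writeUInt16BE(body.length, 2);
} else {
header = Buffer.alloc(10);
header[0] = 0x80 | opcode;
header[1] = 0x80 | 127;
header.writeBigUInt64BE(BigInt(body.length), 2);
}
const maskKey = Buffer.from(mask);
const masked = Buffer.alloc(body.length);
for (let index = 0; index < body.length; index += 1) {
masked[index] = body[index] ^ maskKey[index % 4];
}
return Buffer.concat([header, maskKey, masked]);
}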
function buildMetrics({ samples, requestsPerPipeline, concurrency, timeoutMs, loadDurationMs, backendUrl, sessionType, fakeProviderState }) {
const okSamples = samples.filter((sample) => sample.ok);
const statusCounts = {};
const byPipeline = {};
for (const sample of samples) {
statusCounts[sample.status] = (statusCounts[sample.status] || 0) + 1;
if (!byPipeline[sample.pipeline_label]) {
byPipeline[sample.pipeline_label] = {
ok_count: 0,
error_count: 0,
cross_pipeline_leak_count: 0,
timeout_count: 0,
};
}
if (sample.ok) byPipeline[sample.pipeline_label].ok_count += 1;
else byPipeline[sample.pipeline_label].error_count += 1;
byPipeline[sample.pipeline_label].cross_pipeline_leak_count += sample.cross_pipeline_leak_count || 0;
if (sample.status === "timeout") byPipeline[sample.pipeline_label].timeout_count += 1;
}
const errorCount = samples.length - okSamples.length;
return {
probe: caseId,
backend_url: backendUrl,
session_type: sessionType,
requests_per_pipeline: requestsPerPipeline,
total_requests: requestsPerPipeline * 2,
completed_requests: samples.length,
concurrency,
timeout_ms: timeoutMs,
ok_count: okSamples.length,
error_count: errorCount,
timeout_count: samples.filter((sample) => sample.status === "timeout").length,
cross_pipeline_leak_count: samples.reduce((count, sample) => count + (sample.cross_pipeline_leak_count || 0), 0),
error_rate: samples.length === 0 ? 1 : rounded(errorCount / samples.length),
load_duration_ms: rounded(loadDurationMs),
throughput_rps: loadDurationMs <= 0 ? 0 : rounded(okSamples.length / (loadDurationMs / 1000)),
status_counts: statusCounts,
by_pipeline: byPipeline,
connected_ms: stats(samples.map((sample) => sample.connected_ms).filter(Number.isFinite)),
first_assistant_event_ms: stats(samples.map((sample) => sample.first_assistant_event_ms).filter(Number.isFinite)),
first_assistant_content_ms: stats(samples.map((sample) => sample.first_assistant_content_ms).filter(Number.isFinite)),
first_response_ms: stats(okSamples.map((sample) => sample.first_response_ms).filter(Number.isFinite)),
response_duration_ms: stats(okSamples.map((sample) => sample.response_duration_ms).filter(Number.isFinite)),
fake_provider: summarizeFakeProviderState(fakeProviderState),
provider_timing: buildProviderTimingMetrics(samples, fakeProviderState),
samples,
};
}
function buildThresholds(metrics) {
return {
cross_pipeline_leak_count: {
actual: metrics.cross_pipeline_leak_count,
max: 0,
pass: metrics.cross_pipeline_leak_count === 0,
},
error_rate: {
actual: metrics.error_rate,
max: maxErrorRate,
pass: metrics.error_rate <= maxErrorRate,
},
response_p95_ms: {
actual: metrics.response_duration_ms.p95,
max: responseP95BudgetMs,
pass: metrics.ok_count > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
},
};
}
function positiveInteger(value, fallback) {
const parsed = Number.parseInt(String(value || ""), 10);
return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
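function positiveNumber(value, fallback) {
// Accepts zero. Unset or blank values use the fallback instead of silently becoming 0,
// which is what Number("") returns.
if (value === undefined || value === null || String(value).trim() === "") return fallback;
const parsed = Number(value);
return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}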
function bool(value, fallback) {
if (value === undefined || value === "") return fallback;
if (/^(1|true|yes|on)$/i.test(String(value))) return true;
if (/^(0|false|no|off)$/i.test(String(value))) return false;
return fallback;
}
function textList(value) {
return String(value || "")
.split(/\r?\n|,/)
.map((item) => item.trim())
.filter(Boolean);
}
function rounded(value) {
return Number(value.toFixed(3));
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return rounded(sorted[index]);
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: rounded(Math.min(...values)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: rounded(Math.max(...values)),
};
}
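// Sketch (hypothetical helper): percentile() above uses the nearest-rank method,
// index = ceil(p / 100 * n) - 1 on the sorted samples, with no interpolation.
// One consequence when reading reports: with fewer than 20 samples, ceil(0.95 * n)
// equals n, so p95 is simply the slowest sample.
function nearestRankIndexSketch(sampleCount, percentileValue) {
if (sampleCount <= 0) return -1;
return Math.min(sampleCount - 1, Math.ceil((percentileValue / 100) * sampleCount) - 1);
}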
function looksLikeEnvIssue(error) {
const message = String(error?.message || error || "");
return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
}
function safeReason(value) {
return redact(String(value || "")).slice(0, 1000);
}
@@ -0,0 +1,159 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
const scenarios = [
{
id: "provider-timeout",
target: "provider",
injected_fault: "fake provider request exceeds the configured timeout",
expected_status: "env_issue",
recovery_check: "provider route is reachable or the case remains outside product pass/fail",
cleanup: "stop fake provider or reset proxy route",
},
{
id: "plugin-runtime-disconnect",
target: "plugin-runtime",
injected_fault: "runtime control channel disconnects during an action",
expected_status: "fail",
recovery_check: "runtime reconnects and a deterministic plugin action succeeds",
cleanup: "restart the local plugin runtime process",
},
{
id: "mcp-stdio-server-exit",
target: "mcp",
injected_fault: "stdio server exits mid-call",
expected_status: "fail",
recovery_check: "server can be registered again and exposes the expected tool",
cleanup: "remove temporary MCP server registration",
},
{
id: "operator-missing-login",
target: "webui",
injected_fault: "browser profile is not authenticated",
expected_status: "blocked",
recovery_check: "authenticated profile can open the same WebUI origin",
cleanup: "no product cleanup; refresh local login state",
},
{
id: "transient-marketplace-timeout",
target: "marketplace",
injected_fault: "marketplace request times out once and then succeeds",
expected_status: "flaky",
recovery_check: "rerun passes with the same product revision and no code change",
cleanup: "clear retry-only evidence and keep the run classified as flaky",
},
];
function validateScenario(scenario) {
const missing = ["id", "target", "injected_fault", "expected_status", "recovery_check", "cleanup"]
.filter((key) => !scenario[key]);
const allowedStatuses = new Set(["pass", "fail", "blocked", "env_issue", "flaky"]);
return {
id: scenario.id,
pass: missing.length === 0 && allowedStatuses.has(scenario.expected_status),
missing,
expected_status: scenario.expected_status,
};
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-fault-taxonomy-contract";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const validations = scenarios.map(validateScenario);
const statusCounts = {};
for (const scenario of scenarios) {
statusCounts[scenario.expected_status] = (statusCounts[scenario.expected_status] || 0) + 1;
}
const metrics = {
probe: caseId,
scenario_count: scenarios.length,
status_counts: statusCounts,
scenarios,
validations,
};
const thresholds = {
scenario_count: { actual: scenarios.length, min: 5, pass: scenarios.length >= 5 },
invalid_scenario_count: {
actual: validations.filter((item) => !item.pass).length,
max: 0,
pass: validations.every((item) => item.pass),
},
cleanup_declared_count: {
actual: scenarios.filter((item) => item.cleanup).length,
min: scenarios.length,
pass: scenarios.every((item) => item.cleanup),
},
};
const status = Object.values(thresholds).every((item) => item.pass) ? "pass" : "fail";
const metricsPath = join(evidenceDir, "metrics.json");
const faultModelPath = join(evidenceDir, "fault-model.json");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
await writeFile(faultModelPath, `${JSON.stringify({ scenarios }, null, 2)}\n`, "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason: status === "pass"
? "Fault taxonomy contract declares status, recovery, and cleanup for every scenario."
: "Fault taxonomy contract has too few scenarios, missing scenario fields, or an unsupported expected status.",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
metrics_summary: {
scenario_count: metrics.scenario_count,
status_counts: metrics.status_counts,
invalid_scenario_count: thresholds.invalid_scenario_count.actual,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
fault_model_json: faultModelPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : 1);
}
await main();
@@ -0,0 +1,212 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function parseJsonList(value, fallback) {
if (!value) return fallback;
try {
const parsed = JSON.parse(value);
// An empty list would leave the probe with no endpoints to request, so treat it as unset.
return Array.isArray(parsed) && parsed.length > 0 && parsed.every((item) => typeof item === "string") ? parsed : fallback;
} catch {
return fallback;
}
}
function joinUrl(baseUrl, path) {
const base = baseUrl.replace(/\/+$/, "");
const suffix = path.startsWith("/") ? path : `/${path}`;
return `${base}${suffix}`;
}
async function fetchOnce(url, timeoutMs) {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
const started = performance.now();
try {
const response = await fetch(url, { method: "GET", signal: controller.signal });
await response.arrayBuffer();
const latencyMs = performance.now() - started;
return {
url,
ok: response.status < 500,
status: response.status,
latency_ms: Number(latencyMs.toFixed(3)),
error: "",
};
} catch (error) {
const latencyMs = performance.now() - started;
return {
url,
ok: false,
status: 0,
latency_ms: Number(latencyMs.toFixed(3)),
error: error instanceof Error ? error.message : String(error),
};
} finally {
clearTimeout(timeout);
}
}
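async function runBatches(urls, totalRequests, concurrency, timeoutMs) {
if (urls.length === 0) return [];
// splice(0, NaN) or splice(0, 0) removes nothing and would loop forever, so fall back
// to one request per batch when concurrency is not a number >= 1.
const batchSize = Number.isFinite(concurrency) && concurrency >= 1 ? Math.floor(concurrency) : 1;
const queue = Array.from({ length: totalRequests }, (_, index) => urls[index % urls.length]);
const results = [];
while (queue.length > 0) {
const batch = queue.splice(0, batchSize);
results.push(...await Promise.all(batch.map((url) => fetchOnce(url, timeoutMs))));
}
return results;
}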
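// Sketch (hypothetical alternative to runBatches): runBatches waits for the slowest
// request in each batch before starting the next, so fewer than `concurrency`
// requests are in flight at the tail of every batch. A worker pool, similar to
// runLoad in the Debug Chat isolation probe, starts a new task as soon as any slot
// frees up. Results keep input order; `worker` is any async function.
// Usage (hypothetical): await runPooledSketch(urls, concurrency, (url) => fetchOnce(url, timeoutMs));
async function runPooledSketch(items, limit, worker) {
const slots = Number.isFinite(limit) && limit >= 1 ? Math.floor(limit) : 1;
const results = new Array(items.length);
let nextIndex = 0;
const runners = Array.from({ length: Math.min(slots, items.length) }, async () => {
// Claiming the index synchronously before awaiting keeps workers from sharing an item.
while (nextIndex < items.length) {
const index = nextIndex;
nextIndex += 1;
results[index] = await worker(items[index], index);
}
});
await Promise.all(runners);
return results;
}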
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-live-backend-latency";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const endpoints = parseJsonList(env.LANGBOT_PERF_ENDPOINTS_JSON, ["/healthz"]);
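// Blank or non-numeric env values fall back to the defaults instead of becoming NaN.
const envNumber = (name, fallback) => {
const raw = env[name];
const parsed = Number(raw);
return raw !== undefined && raw.trim() !== "" && Number.isFinite(parsed) ? parsed : fallback;
};
const totalRequests = envNumber("LANGBOT_PERF_REQUESTS", 12);
const concurrency = envNumber("LANGBOT_PERF_CONCURRENCY", 2);
const timeoutMs = envNumber("LANGBOT_PERF_TIMEOUT_MS", 5000);
const p95BudgetMs = envNumber("LANGBOT_PERF_BACKEND_P95_MS", 1000);
const maxErrorRate = envNumber("LANGBOT_PERF_MAX_ERROR_RATE", 0);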
const metricsPath = join(evidenceDir, "metrics.json");
const networkLogPath = join(evidenceDir, "network.log");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
let status = "fail";
let reason = "";
let results = [];
if (!backendUrl) {
status = "env_issue";
reason = "LANGBOT_BACKEND_URL is not configured.";
} else {
const urls = endpoints.map((path) => joinUrl(backendUrl, path));
results = await runBatches(urls, totalRequests, concurrency, timeoutMs);
const okCount = results.filter((item) => item.ok).length;
const errorCount = results.length - okCount;
const errorRate = results.length === 0 ? 1 : errorCount / results.length;
const latencies = results.filter((item) => item.ok).map((item) => item.latency_ms);
const latencyStats = stats(latencies);
const allConnectionFailures = results.length > 0 && results.every((item) => item.status === 0);
if (allConnectionFailures) {
status = "env_issue";
reason = `Backend did not respond at ${backendUrl}.`;
} else if (latencyStats.p95 <= p95BudgetMs && errorRate <= maxErrorRate) {
status = "pass";
reason = "Live backend latency probe passed all thresholds.";
} else {
status = "fail";
reason = "Live backend latency probe breached latency or error-rate thresholds.";
}
}
const statusCounts = {};
for (const item of results) {
const key = item.status === 0 ? "network_error" : String(item.status);
statusCounts[key] = (statusCounts[key] || 0) + 1;
}
const okResults = results.filter((item) => item.ok);
const metrics = {
probe: caseId,
backend_url: backendUrl,
endpoints,
total_requests: totalRequests,
concurrency,
timeout_ms: timeoutMs,
ok_count: okResults.length,
error_count: results.length - okResults.length,
error_rate: results.length === 0 ? 1 : Number(((results.length - okResults.length) / results.length).toFixed(4)),
latency_ms: stats(okResults.map((item) => item.latency_ms)),
status_counts: statusCounts,
};
const thresholds = {
backend_p95_ms: { actual: metrics.latency_ms.p95, max: p95BudgetMs, pass: metrics.latency_ms.p95 <= p95BudgetMs },
error_rate: { actual: metrics.error_rate, max: maxErrorRate, pass: metrics.error_rate <= maxErrorRate },
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, samples: results }, null, 2)}\n`, "utf8");
await writeFile(networkLogPath, results.map((item) => JSON.stringify(item)).join("\n") + (results.length > 0 ? "\n" : ""), "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason,
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
url: backendUrl,
metrics_summary: {
requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_rate: metrics.error_rate,
latency_p50_ms: metrics.latency_ms.p50,
latency_p95_ms: metrics.latency_ms.p95,
status_counts: metrics.status_counts,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
network_log: networkLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
}
await main();
@@ -0,0 +1,205 @@
#!/usr/bin/env node
import { existsSync, readdirSync, statSync } from "node:fs";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function repoRootFromEnv(root) {
return env.LANGBOT_REPO ? resolve(env.LANGBOT_REPO) : resolve(root, "..");
}
function latestBackendLog(root) {
const explicit = env.LANGBOT_BACKEND_LOG;
if (explicit) return resolve(explicit);
const logsDir = join(repoRootFromEnv(root), "data", "logs");
if (!existsSync(logsDir)) return "";
const candidates = readdirSync(logsDir)
.filter((name) => /^langbot-.*\.log$/.test(name))
.map((name) => join(logsDir, name))
.filter((path) => {
try {
return statSync(path).isFile();
} catch {
return false;
}
})
.sort((left, right) => statSync(right).mtimeMs - statSync(left).mtimeMs);
return candidates[0] || "";
}
function parseSince(startedAt) {
if (env.LANGBOT_BACKEND_LOG_SINCE) return new Date(env.LANGBOT_BACKEND_LOG_SINCE);
const lookbackSeconds = Number(env.LANGBOT_BACKEND_LOG_LOOKBACK_SECONDS || "300");
return new Date(startedAt.getTime() - lookbackSeconds * 1000);
}
function parseTimestamp(line, year) {
const localMatch = line.match(/^\[(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]/);
if (localMatch) {
const [, month, day, hour, minute, second, millisecond] = localMatch;
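// Bracketed [MM-DD ...] lines carry no UTC offset; read them as this machine's local time (assumes the backend logs on the same host).
return new Date(Number(year), Number(month) - 1, Number(day), Number(hour), Number(minute), Number(second), Number(millisecond));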
}
const accessMatch = line.match(/^\[(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]/);
if (accessMatch) {
const [, fullYear, month, day, hour, minute, second, offset] = accessMatch;
const normalizedOffset = `${offset.slice(0, 3)}:${offset.slice(3)}`;
return new Date(`${fullYear}-${month}-${day}T${hour}:${minute}:${second}${normalizedOffset}`);
}
return null;
}
function findingForLine(line, number) {
const rules = [
{ severity: "fail", kind: "python_traceback", pattern: /\bTraceback(?: \(most recent call last\))?/i },
{ severity: "fail", kind: "unretrieved_task_exception", pattern: /Task exception was never retrieved/i },
{ severity: "fail", kind: "unawaited_coroutine", pattern: /RuntimeWarning:\s+coroutine .* was never awaited/i },
{ severity: "fail", kind: "unclosed_client_session", pattern: /Unclosed client session/i },
{ severity: "fail", kind: "unclosed_connector", pattern: /Unclosed connector/i },
{ severity: "fail", kind: "import_error", pattern: /\bImportError\b/i },
{ severity: "fail", kind: "error_log", pattern: /\b(?:ERROR|CRITICAL)\b/ },
{ severity: "warning", kind: "warning_log", pattern: /\bWARNING\b/ },
];
for (const rule of rules) {
if (rule.pattern.test(line)) {
return {
severity: rule.severity,
kind: rule.kind,
line: number,
excerpt: line,
};
}
}
return null;
}
function scanLines(text, since, year) {
const findings = [];
const scanned = [];
let includeContinuation = false;
const lines = text.split(/\r?\n/);
for (const [index, line] of lines.entries()) {
const number = index + 1;
const timestamp = parseTimestamp(line, year);
if (timestamp) includeContinuation = timestamp >= since;
if (!includeContinuation) continue;
scanned.push({ number, text: line });
const finding = findingForLine(line, number);
if (finding) findings.push(finding);
}
return { findings, scanned, total_lines: lines.length };
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-live-backend-log-health";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const since = parseSince(startedAt);
const logPath = latestBackendLog(root);
const metricsPath = join(evidenceDir, "metrics.json");
const findingsPath = join(evidenceDir, "findings.json");
const scannedLogPath = join(evidenceDir, "scanned-backend.log");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
let status = "fail";
let reason = "";
let scan = { findings: [], scanned: [], total_lines: 0 };
if (!logPath || !existsSync(logPath)) {
status = "env_issue";
reason = "No LangBot backend log file was found. Set LANGBOT_BACKEND_LOG or LANGBOT_REPO.";
} else {
const text = await readFile(logPath, "utf8");
scan = scanLines(text, since, startedAt.getFullYear());
const failCount = scan.findings.filter((item) => item.severity === "fail").length;
status = failCount === 0 ? "pass" : "fail";
reason = status === "pass"
? "Live backend log health passed; no fail-severity findings in the scanned window."
: "Live backend log health found fail-severity backend log findings.";
}
const warningCount = scan.findings.filter((item) => item.severity === "warning").length;
const failCount = scan.findings.filter((item) => item.severity === "fail").length;
const metrics = {
probe: caseId,
backend_log: logPath,
since: since.toISOString(),
scanned_line_count: scan.scanned.length,
total_line_count: scan.total_lines,
fail_count: failCount,
warning_count: warningCount,
finding_count: scan.findings.length,
};
const thresholds = {
fail_count: { actual: failCount, max: 0, pass: failCount === 0 },
};
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
await writeFile(findingsPath, `${JSON.stringify(scan.findings, null, 2)}\n`, "utf8");
await writeFile(scannedLogPath, scan.scanned.map((item) => `${item.number}: ${item.text}`).join("\n") + (scan.scanned.length > 0 ? "\n" : ""), "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason,
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
url: logPath,
metrics_summary: {
scanned_line_count: metrics.scanned_line_count,
fail_count: metrics.fail_count,
warning_count: metrics.warning_count,
finding_count: metrics.finding_count,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
findings_json: findingsPath,
scanned_backend_log: scannedLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "backend_log", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
}
await main();
@@ -0,0 +1,311 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function joinUrl(baseUrl, path) {
const base = baseUrl.replace(/\/+$/, "");
const suffix = path.startsWith("/") ? path : `/${path}`;
return `${base}${suffix}`;
}
function parseJsonObject(value, fallback) {
if (!value) return fallback;
try {
const parsed = JSON.parse(value);
return parsed && typeof parsed === "object" && !Array.isArray(parsed) ? parsed : fallback;
} catch {
return fallback;
}
}
function controlPlaneEndpoints() {
return [
{
id: "healthz",
path: "/healthz",
expected_status: 200,
expected_code: 0,
p95_budget_ms: Number(env.LANGBOT_PERF_HEALTHZ_P95_MS || "500"),
required_data_fields: [],
},
{
id: "system_info",
path: "/api/v1/system/info",
expected_status: 200,
expected_code: 0,
p95_budget_ms: Number(env.LANGBOT_PERF_SYSTEM_INFO_P95_MS || "1000"),
required_data_fields: ["version", "edition", "enable_marketplace"],
},
];
}
async function fetchEndpoint(backendUrl, endpoint, timeoutMs) {
const url = joinUrl(backendUrl, endpoint.path);
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
const started = performance.now();
let bodyText = "";
let json = null;
let jsonValid = false;
let error = "";
try {
const response = await fetch(url, {
method: "GET",
headers: { "accept": "application/json" },
signal: controller.signal,
});
bodyText = await response.text();
try {
json = bodyText ? JSON.parse(bodyText) : null;
jsonValid = json !== null;
} catch (parseError) {
error = parseError instanceof Error ? parseError.message : String(parseError);
}
const data = json && typeof json === "object" && json.data && typeof json.data === "object" ? json.data : {};
const missingFields = endpoint.required_data_fields.filter((field) => !(field in data));
const statusOk = response.status === endpoint.expected_status;
const codeOk = json !== null && typeof json === "object" && json.code === endpoint.expected_code;
const shapeOk = jsonValid && missingFields.length === 0;
const latencyMs = performance.now() - started;
return {
endpoint_id: endpoint.id,
path: endpoint.path,
url,
status: response.status,
ok: statusOk && codeOk && shapeOk,
status_ok: statusOk,
code_ok: codeOk,
json_valid: jsonValid,
missing_fields: missingFields,
response_code: json && typeof json === "object" ? json.code : null,
latency_ms: Number(latencyMs.toFixed(3)),
error,
};
} catch (fetchError) {
const latencyMs = performance.now() - started;
return {
endpoint_id: endpoint.id,
path: endpoint.path,
url,
status: 0,
ok: false,
status_ok: false,
code_ok: false,
json_valid: false,
missing_fields: endpoint.required_data_fields,
response_code: null,
latency_ms: Number(latencyMs.toFixed(3)),
error: fetchError instanceof Error ? fetchError.message : String(fetchError),
};
} finally {
clearTimeout(timeout);
}
}
async function runBatches(backendUrl, endpoints, totalRequests, concurrency, timeoutMs) {
const queue = Array.from({ length: totalRequests }, (_, index) => endpoints[index % endpoints.length]);
const results = [];
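// Guard against 0/NaN concurrency, which would make splice() return an empty batch forever.
const batchSize = Math.max(1, Math.floor(concurrency) || 1);
while (queue.length > 0) {
const batch = queue.splice(0, batchSize);
results.push(...await Promise.all(batch.map((endpoint) => fetchEndpoint(backendUrl, endpoint, timeoutMs))));
}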
return results;
}
function endpointMetrics(endpoints, results) {
return Object.fromEntries(endpoints.map((endpoint) => {
const samples = results.filter((item) => item.endpoint_id === endpoint.id);
const okSamples = samples.filter((item) => item.ok);
return [
endpoint.id,
{
path: endpoint.path,
requests: samples.length,
ok_count: okSamples.length,
error_rate: samples.length === 0 ? 1 : Number(((samples.length - okSamples.length) / samples.length).toFixed(4)),
latency_ms: stats(okSamples.map((item) => item.latency_ms)),
p95_budget_ms: endpoint.p95_budget_ms,
},
];
}));
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-live-control-plane-api";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const backendUrl = env.LANGBOT_BACKEND_URL || "";
const endpoints = controlPlaneEndpoints();
const configuredBudgets = parseJsonObject(env.LANGBOT_CONTROL_PLANE_P95_BUDGETS_JSON, {});
for (const endpoint of endpoints) {
const budget = configuredBudgets[endpoint.id];
if (typeof budget === "number" && Number.isFinite(budget)) endpoint.p95_budget_ms = budget;
}
const totalRequests = Number(env.LANGBOT_CONTROL_PLANE_REQUESTS || "20");
const concurrency = Number(env.LANGBOT_CONTROL_PLANE_CONCURRENCY || "4");
const timeoutMs = Number(env.LANGBOT_CONTROL_PLANE_TIMEOUT_MS || "5000");
const maxErrorRate = Number(env.LANGBOT_CONTROL_PLANE_MAX_ERROR_RATE || "0");
const metricsPath = join(evidenceDir, "metrics.json");
const endpointsPath = join(evidenceDir, "endpoints.json");
const networkLogPath = join(evidenceDir, "network.log");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
let status = "fail";
let reason = "";
let results = [];
if (!backendUrl) {
status = "env_issue";
reason = "LANGBOT_BACKEND_URL is not configured.";
} else {
results = await runBatches(backendUrl, endpoints, totalRequests, concurrency, timeoutMs);
const allConnectionFailures = results.length > 0 && results.every((item) => item.status === 0);
if (allConnectionFailures) {
status = "env_issue";
reason = `Backend did not respond at ${backendUrl}.`;
}
}
const okResults = results.filter((item) => item.ok);
const statusCounts = {};
for (const item of results) {
const key = item.status === 0 ? "network_error" : String(item.status);
statusCounts[key] = (statusCounts[key] || 0) + 1;
}
const perEndpoint = endpointMetrics(endpoints, results);
const responseShapeFailures = results.filter((item) => !item.json_valid || item.missing_fields.length > 0 || !item.code_ok).length;
const errorRate = results.length === 0 ? 1 : Number(((results.length - okResults.length) / results.length).toFixed(4));
const thresholds = {
error_rate: { actual: errorRate, max: maxErrorRate, pass: errorRate <= maxErrorRate },
response_shape_failures: { actual: responseShapeFailures, max: 0, pass: responseShapeFailures === 0 },
};
for (const endpoint of endpoints) {
const actual = perEndpoint[endpoint.id].latency_ms.p95;
thresholds[`${endpoint.id}_p95_ms`] = {
actual,
max: endpoint.p95_budget_ms,
pass: actual <= endpoint.p95_budget_ms,
};
}
if (status !== "env_issue") {
const passed = Object.values(thresholds).every((item) => item.pass);
status = passed ? "pass" : "fail";
reason = passed
? "Live control-plane API probe passed all thresholds."
: "Live control-plane API probe breached shape, latency, or error-rate thresholds.";
}
const metrics = {
probe: caseId,
backend_url: backendUrl,
total_requests: totalRequests,
concurrency,
timeout_ms: timeoutMs,
ok_count: okResults.length,
error_count: results.length - okResults.length,
error_rate: errorRate,
status_counts: statusCounts,
response_shape_failures: responseShapeFailures,
endpoints: perEndpoint,
};
await writeFile(metricsPath, `${JSON.stringify({ ...metrics, samples: results }, null, 2)}\n`, "utf8");
await writeFile(endpointsPath, `${JSON.stringify(endpoints, null, 2)}\n`, "utf8");
await writeFile(networkLogPath, results.map((item) => JSON.stringify(item)).join("\n") + (results.length > 0 ? "\n" : ""), "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason,
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
url: backendUrl,
metrics_summary: {
requests: metrics.total_requests,
concurrency: metrics.concurrency,
ok_count: metrics.ok_count,
error_rate: metrics.error_rate,
response_shape_failures: metrics.response_shape_failures,
endpoints: Object.fromEntries(Object.entries(metrics.endpoints).map(([id, value]) => [
id,
{
path: value.path,
ok_count: value.ok_count,
error_rate: value.error_rate,
latency_p50_ms: value.latency_ms.p50,
latency_p95_ms: value.latency_ms.p95,
},
])),
status_counts: metrics.status_counts,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
endpoints_json: endpointsPath,
network_log: networkLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
}
await main();
@@ -0,0 +1,162 @@
#!/usr/bin/env node
import { mkdir, writeFile } from "node:fs/promises";
import { join, resolve } from "node:path";
import { env, exit } from "node:process";
function pad(value, size = 2) {
return String(value).padStart(size, "0");
}
function localIsoWithOffset(date = new Date()) {
const offsetMinutes = -date.getTimezoneOffset();
const sign = offsetMinutes >= 0 ? "+" : "-";
const absolute = Math.abs(offsetMinutes);
return [
`${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
`T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
`${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
].join("");
}
function timestampSlug(date = new Date()) {
return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
}
function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return Number(sorted[index].toFixed(3));
}
function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: Number(Math.min(...values).toFixed(3)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: Number(Math.max(...values).toFixed(3)),
};
}
function threshold(actual, limit, operator) {
const pass = operator === "<=" ? actual <= limit : actual >= limit;
return { actual, [operator === "<=" ? "max" : "min"]: limit, pass };
}
function makeSample(index) {
const ingress = 1 + (index % 5) * 0.22;
const pipeline = 2.8 + (index % 7) * 0.31;
const persistence = 1.1 + (index % 4) * 0.2;
const pluginIpc = 1.9 + (index % 6) * 0.27;
const rag = index % 3 === 0 ? 4.4 : 0.8 + (index % 5) * 0.18;
const streaming = 1.5 + (index % 8) * 0.24;
const provider = 80 + (index % 13) * 11;
const externalTool = index % 4 === 0 ? 25 + (index % 9) * 3 : 0;
const network = 8 + (index % 10) * 1.7;
const overhead = ingress + pipeline + persistence + pluginIpc + rag + streaming;
const external = provider + externalTool + network;
const total = overhead + external;
return {
index,
segments_ms: {
ingress,
pipeline,
persistence,
plugin_ipc: pluginIpc,
rag,
streaming,
provider,
external_tool: externalTool,
network,
},
langbot_overhead_ms: Number(overhead.toFixed(3)),
external_latency_ms: Number(external.toFixed(3)),
e2e_latency_ms: Number(total.toFixed(3)),
accounting_gap_ms: Number((total - external - overhead).toFixed(6)),
};
}
async function main() {
const root = resolve(env.LBS_ROOT || process.cwd());
const caseId = "langbot-overhead-accounting-contract";
const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
await mkdir(evidenceDir, { recursive: true });
const startedAt = new Date();
const sampleCount = Number(env.LANGBOT_PERF_CONTRACT_SAMPLES || "80");
const overheadP95BudgetMs = Number(env.LANGBOT_PERF_OVERHEAD_P95_MS || "25");
const samples = Array.from({ length: sampleCount }, (_, index) => makeSample(index));
const overheads = samples.map((sample) => sample.langbot_overhead_ms);
const e2e = samples.map((sample) => sample.e2e_latency_ms);
const external = samples.map((sample) => sample.external_latency_ms);
const gaps = samples.map((sample) => Math.abs(sample.accounting_gap_ms));
const memory = process.memoryUsage();
const metrics = {
probe: caseId,
sample_count: sampleCount,
langbot_overhead_ms: stats(overheads),
e2e_latency_ms: stats(e2e),
external_latency_ms: stats(external),
accounting_gap_max_ms: Number(Math.max(...gaps).toFixed(6)),
samples,
};
const thresholds = {
sample_count: threshold(sampleCount, 50, ">="),
langbot_overhead_p95_ms: threshold(metrics.langbot_overhead_ms.p95, overheadP95BudgetMs, "<="),
accounting_gap_max_ms: threshold(metrics.accounting_gap_max_ms, 0.001, "<="),
};
const status = Object.values(thresholds).every((item) => item.pass) ? "pass" : "fail";
const metricsPath = join(evidenceDir, "metrics.json");
const thresholdsPath = join(evidenceDir, "thresholds.json");
const resourceLogPath = join(evidenceDir, "resource-log.json");
const automationResultPath = join(evidenceDir, "automation-result.json");
const resultPath = join(evidenceDir, "result.json");
await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
await writeFile(thresholdsPath, `${JSON.stringify(thresholds, null, 2)}\n`, "utf8");
await writeFile(resourceLogPath, `${JSON.stringify({ memory, pid: process.pid }, null, 2)}\n`, "utf8");
const finishedAt = new Date();
const result = {
source: "automation",
case_id: caseId,
run_id: runId,
status,
reason: status === "pass"
? "Overhead accounting contract passed all thresholds."
: "Overhead accounting contract breached one or more thresholds.",
started_at: startedAt.toISOString(),
started_at_local: localIsoWithOffset(startedAt),
finished_at: finishedAt.toISOString(),
finished_at_local: localIsoWithOffset(finishedAt),
duration_ms: finishedAt.getTime() - startedAt.getTime(),
metrics_summary: {
sample_count: metrics.sample_count,
langbot_overhead_p95_ms: metrics.langbot_overhead_ms.p95,
e2e_latency_p95_ms: metrics.e2e_latency_ms.p95,
external_latency_p95_ms: metrics.external_latency_ms.p95,
accounting_gap_max_ms: metrics.accounting_gap_max_ms,
},
thresholds_summary: thresholds,
artifacts: {
metrics_json: metricsPath,
thresholds_json: thresholdsPath,
resource_log_json: resourceLogPath,
automation_result_json: automationResultPath,
result_json: resultPath,
},
evidence_collected: ["metrics", "resource_log", "filesystem"],
};
const resultText = `${JSON.stringify(result, null, 2)}\n`;
await writeFile(automationResultPath, resultText, "utf8");
await writeFile(resultPath, resultText, "utf8");
console.log(JSON.stringify(result, null, 2));
exit(status === "pass" ? 0 : 1);
}
await main();
@@ -0,0 +1,134 @@
export function summarizeFakeProviderState(state) {
if (!state) return null;
const recentRequests = Array.isArray(state.recent_requests) ? state.recent_requests : [];
const chatRequests = recentRequests.filter((request) => String(request?.path || "").includes("/chat/completions"));
const successfulRequests = chatRequests.filter((request) => request?.status === "ok");
const faultRequests = chatRequests.filter((request) => (
request?.should_fail === true
|| request?.status === "http_fault"
|| (Number.isFinite(request?.http_status) && request.http_status >= 400)
));
return {
status: state.status || "unknown",
url: state.url || "",
request_count: Number.isFinite(state.request_count) ? state.request_count : recentRequests.length,
recent_request_count: recentRequests.length,
chat_request_count: chatRequests.length,
fault_count: faultRequests.length,
streamed_request_count: chatRequests.filter((request) => request?.stream === true).length,
duration_ms: stats(chatRequests.map((request) => numberOrNull(request?.duration_ms)).filter(Number.isFinite)),
successful_duration_ms: stats(successfulRequests.map((request) => numberOrNull(request?.duration_ms)).filter(Number.isFinite)),
first_chunk_ms: stats(successfulRequests.map((request) => numberOrNull(request?.first_chunk_ms)).filter(Number.isFinite)),
first_content_chunk_ms: stats(successfulRequests.map((request) => numberOrNull(request?.first_content_chunk_ms)).filter(Number.isFinite)),
content_chunk_count: stats(successfulRequests.map((request) => numberOrNull(request?.content_chunk_count)).filter(Number.isFinite)),
config: state.config || {},
};
}
export function buildProviderTimingMetrics(samples, state) {
const recentRequests = Array.isArray(state?.recent_requests) ? state.recent_requests : [];
const byExpectedText = new Map();
for (const request of recentRequests) {
const expected = String(request?.expected_text || "");
if (!expected) continue;
if (!byExpectedText.has(expected)) byExpectedText.set(expected, []);
byExpectedText.get(expected).push(request);
}
const segments = [];
const missingExpectedText = [];
for (const sample of samples) {
const expected = String(sample?.expected_text || "");
if (!expected) continue;
const request = (byExpectedText.get(expected) || []).shift();
if (!request) {
missingExpectedText.push(expected);
continue;
}
const segment = buildTimingSegment(sample, request);
if (segment) segments.push(segment);
}
const values = (key) => segments.map((segment) => numberOrNull(segment[key])).filter(Number.isFinite);
return {
matched_request_count: segments.length,
missing_provider_match_count: missingExpectedText.length,
missing_expected_text: missingExpectedText.slice(0, 20),
send_to_provider_start_ms: stats(values("send_to_provider_start_ms")),
provider_duration_ms: stats(values("provider_duration_ms")),
provider_finish_to_ws_final_ms: stats(values("provider_finish_to_ws_final_ms")),
langbot_overhead_estimate_ms: stats(values("langbot_overhead_estimate_ms")),
e2e_minus_provider_ms: stats(values("e2e_minus_provider_ms")),
provider_first_content_to_ws_first_content_ms: stats(values("provider_first_content_to_ws_first_content_ms")),
segments,
};
}
function buildTimingSegment(sample, request) {
const sentEpochMs = numberOrNull(sample.sent_epoch_ms);
const finishedEpochMs = numberOrNull(sample.finished_epoch_ms);
const providerStartedEpochMs = numberOrNull(request.started_epoch_ms);
const providerFinishedEpochMs = numberOrNull(request.finished_epoch_ms);
const providerFirstContentEpochMs = numberOrNull(request.first_content_chunk_epoch_ms);
const wsFirstContentEpochMs = numberOrNull(sample.first_assistant_content_epoch_ms);
const responseDurationMs = numberOrNull(sample.response_duration_ms);
const providerDurationMs = numberOrNull(request.duration_ms);
const sendToProviderStartMs = finiteDelta(providerStartedEpochMs, sentEpochMs);
const providerFinishToWsFinalMs = finiteDelta(finishedEpochMs, providerFinishedEpochMs);
const e2eMinusProviderMs = Number.isFinite(responseDurationMs) && Number.isFinite(providerDurationMs)
? rounded(responseDurationMs - providerDurationMs)
: null;
const overheadEstimateMs = Number.isFinite(sendToProviderStartMs) && Number.isFinite(providerFinishToWsFinalMs)
? rounded(sendToProviderStartMs + providerFinishToWsFinalMs)
: e2eMinusProviderMs;
return {
sample_index: sample.index,
pipeline_label: sample.pipeline_label || "",
expected_text: sample.expected_text || "",
provider_request_id: request.id || "",
provider_request_number: request.request_number ?? null,
response_duration_ms: responseDurationMs,
provider_duration_ms: providerDurationMs,
send_to_provider_start_ms: sendToProviderStartMs,
provider_finish_to_ws_final_ms: providerFinishToWsFinalMs,
langbot_overhead_estimate_ms: overheadEstimateMs,
e2e_minus_provider_ms: e2eMinusProviderMs,
provider_first_content_to_ws_first_content_ms: finiteDelta(wsFirstContentEpochMs, providerFirstContentEpochMs),
provider_status: request.status || "",
provider_http_status: request.http_status ?? null,
};
}
function finiteDelta(left, right) {
return Number.isFinite(left) && Number.isFinite(right) ? rounded(left - right) : null;
}
export function stats(values) {
if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
return {
min: rounded(Math.min(...values)),
p50: percentile(values, 50),
p95: percentile(values, 95),
p99: percentile(values, 99),
max: rounded(Math.max(...values)),
};
}
export function percentile(values, percentileValue) {
if (values.length === 0) return 0;
const sorted = [...values].sort((a, b) => a - b);
const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
return rounded(sorted[index]);
}
export function rounded(value) {
return Number(value.toFixed(3));
}
function numberOrNull(value) {
const number = Number(value);
return Number.isFinite(number) ? number : null;
}
@@ -0,0 +1,268 @@
# Performance And Reliability Testing
Use this reference when a QA request asks whether LangBot is fast enough,
stable under load, or resilient to controlled faults.
## Scope
Treat `skills/` as the QA control plane:
- Cases define intent, readiness, thresholds, and required evidence.
- Probe scripts collect metrics, traces, resource logs, and artifacts.
- Reports classify each run as `pass`, `fail`, `blocked`,
`env_issue`, or `flaky`.
Do not turn `skills/` into a load generator or chaos engine. When a test needs
load generation, tracing, or fault injection, call a focused tool from a
`mode: probe` case instead, for example k6, Locust, pytest-benchmark,
Playwright trace collection, Toxiproxy, Docker, or a Kubernetes disruption tool.
## LangBot Performance Model
For LangBot, performance means the latency LangBot itself adds on top of the
external systems it calls:
```text
LangBot overhead = end-to-end latency - provider latency - external tool latency - network/fault injection latency
```
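As a minimal sketch of the formula above, the function below subtracts the external components from one request's end-to-end time. The parameter names (`e2eMs`, `providerMs`, `toolMs`, `networkMs`) are hypothetical; map them from whatever timings your probe actually records.

```javascript
// Hypothetical input shape: per-request timings in milliseconds.
function langbotOverheadMs({ e2eMs, providerMs, toolMs = 0, networkMs = 0 }) {
  const overhead = e2eMs - providerMs - toolMs - networkMs;
  // A negative result means external time was over-attributed (for example,
  // overlapping tool and provider spans), so surface it instead of clamping to 0.
  if (overhead < 0) {
    throw new RangeError(
      `external latency (${providerMs + toolMs + networkMs} ms) exceeds end-to-end latency (${e2eMs} ms)`,
    );
  }
  return Number(overhead.toFixed(3));
}

// 412 ms end to end: 350 ms provider, 30 ms tool, 15 ms network.
console.log(langbotOverheadMs({ e2eMs: 412, providerMs: 350, toolMs: 30, networkMs: 15 })); // 17
```

Throwing on a negative value keeps an accounting error visible in the report rather than silently reporting zero overhead.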
Measure user-facing experience and internal latency composition separately:
- WebUI load and interaction latency.
- Debug Chat send-to-first-visible-token and send-to-completion latency.
- Pipeline, RAG, plugin runtime, MCP, AgentRunner, and persistence segment
latency.
- Queue wait time, concurrency, throughput, timeout rate, and p95/p99 latency.
- Startup, plugin install, knowledge-base ingestion, migration, and recovery
time.
Do not report a single message round-trip time as "LangBot performance" unless
the report also breaks out the external provider, tool, and network time.
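For the queue, timeout, throughput, and p95/p99 items listed above, a sketch of one load-window summary follows. It uses the nearest-rank percentile rule that the starter probe scripts in this change also use; `summarizeWindow` and its inputs are hypothetical.

```javascript
// Nearest-rank percentile: sort ascending and take the ceil(p/100 * n)-th value.
// Clamping the rank to at least 1 also makes p = 0 safe.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Hypothetical window summary: successful latencies, the number of requests
// that timed out, and the wall-clock length of the window in milliseconds.
function summarizeWindow(latenciesMs, timedOutCount, windowMs) {
  const attempted = latenciesMs.length + timedOutCount;
  return {
    p50_ms: percentile(latenciesMs, 50),
    p95_ms: percentile(latenciesMs, 95),
    p99_ms: percentile(latenciesMs, 99),
    timeout_rate: attempted === 0 ? 0 : timedOutCount / attempted,
    throughput_rps: windowMs > 0 ? (latenciesMs.length * 1000) / windowMs : 0,
  };
}

console.log(summarizeWindow([120, 95, 310, 140], 1, 2000));
// p50_ms 120, p95_ms 310, p99_ms 310, timeout_rate 0.2, throughput_rps 2
```

With only four samples, p95 and p99 both collapse to the maximum, which is why the starter cases require a minimum sample count before trusting tail latencies.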
## Evidence Contract
Performance and reliability cases should declare the evidence they need:
- `metrics`: machine-readable latency, throughput, error-rate, or recovery
metrics, usually `metrics.json`.
- `resource_log`: CPU, memory, process, connection, queue, or file descriptor
samples.
- `trace`: browser, HTTP, database, or runtime trace artifacts.
- `profile`: CPU, memory, or flamegraph profile artifacts.
- `backend_log`, `network`, `api_diagnostic`, and `filesystem` as supporting
evidence when relevant.
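A sketch of checking a run against its declared evidence: compare the kinds a case declares with the `evidence_collected` array a probe reports, and flag kinds outside the list above. How a case stores its declared list is an assumption here.

```javascript
// Evidence kinds named in this section.
const KNOWN_EVIDENCE = new Set([
  "metrics", "resource_log", "trace", "profile",
  "backend_log", "network", "api_diagnostic", "filesystem",
]);

// declared: kinds the case asks for; collected: the run's evidence_collected.
function checkEvidence(declared, collected) {
  const have = new Set(collected);
  return {
    unknown: declared.filter((kind) => !KNOWN_EVIDENCE.has(kind)),
    missing: declared.filter((kind) => !have.has(kind)),
  };
}

// The live control-plane probe reports metrics, network, api_diagnostic, filesystem.
console.log(JSON.stringify(checkEvidence(
  ["metrics", "network", "trace"],
  ["metrics", "network", "api_diagnostic", "filesystem"],
))); // {"unknown":[],"missing":["trace"]}
```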
Automation should write `automation-result.json` with these fields when
available:
```json
{
"status": "pass",
"reason": "Probe passed all thresholds.",
"metrics_summary": {
"langbot_overhead_p95_ms": 12.4,
"error_rate": 0
},
"thresholds_summary": {
"langbot_overhead_p95_ms": { "actual": 12.4, "max": 50, "pass": true }
},
"artifacts": {
"metrics_json": "/path/to/metrics.json"
},
"evidence_collected": ["metrics", "filesystem"]
}
```
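The `thresholds_summary` entries in the example above can be derived mechanically from measured metrics and their maxima. Here is a sketch of that step. It assumes every threshold is an upper bound. `summarizeThresholds` is a hypothetical helper, not the harness's actual code.

```typescript
// Hypothetical shape of one thresholds_summary entry, mirroring the JSON example above.
interface ThresholdResult {
  actual: number | null;
  max: number;
  pass: boolean;
}

// Compare each metric with its maximum. A metric the run did not produce fails its
// threshold rather than passing silently.
function summarizeThresholds(
  metrics: Record<string, number | undefined>,
  maxima: Record<string, number>,
): { thresholds: Record<string, ThresholdResult>; status: "pass" | "fail" } {
  const thresholds: Record<string, ThresholdResult> = {};
  for (const [name, max] of Object.entries(maxima)) {
    const actual = metrics[name];
    if (typeof actual === "number" && Number.isFinite(actual)) {
      thresholds[name] = { actual, max, pass: actual <= max };
    } else {
      thresholds[name] = { actual: null, max, pass: false };
    }
  }
  const status = Object.values(thresholds).every((t) => t.pass) ? "pass" : "fail";
  return { thresholds, status };
}

const summary = summarizeThresholds(
  { langbot_overhead_p95_ms: 12.4, error_rate: 0 },
  { langbot_overhead_p95_ms: 50 },
);
console.log(summary.status, JSON.stringify(summary.thresholds));
// pass {"langbot_overhead_p95_ms":{"actual":12.4,"max":50,"pass":true}}
```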
Synthetic contract probes are useful for checking the QA harness, but they are
not live product performance results. Label them as contract probes in the case
title, checks, and report.
## Chaos And Reliability Rules
Chaos tests must be narrow and reversible:
- Declare the fault model in `fault_model_json`.
- Record blast radius, target component, injection method, duration, and abort
conditions.
- Capture recovery checks and cleanup steps in the case.
- Classify unavailable dependencies as `env_issue` unless the target behavior
is LangBot's handling of that dependency failure.
- Do not run destructive fault injection against a shared or production-like
instance without explicit operator approval.
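This guide does not define a schema for `fault_model_json`. The sketch below therefore invents illustrative field names for the items the rules list: target component, injection method, blast radius, duration, abort conditions, recovery checks, and cleanup. It shows one way to refuse a chaos case that is incomplete, or that targets a shared instance without approval.

```typescript
// Hypothetical fault_model_json structure; the field names are illustrative, not a fixed schema.
interface FaultModel {
  target_component: string;
  injection_method: string;
  blast_radius: "single-request" | "local-instance" | "shared-instance";
  duration_ms: number;
  abort_conditions: string[];
  recovery_checks: string[];
  cleanup_steps: string[];
}

const requiredFaultKeys: Array<keyof FaultModel> = [
  "target_component",
  "injection_method",
  "blast_radius",
  "duration_ms",
  "abort_conditions",
  "recovery_checks",
  "cleanup_steps",
];

// Returns the reasons a chaos case should not run yet; an empty array means it may proceed.
function faultModelProblems(raw: string, operatorApprovedShared: boolean): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["fault_model_json is not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["fault_model_json must be a JSON object"];
  }
  const model = parsed as Partial<FaultModel>;
  const problems: string[] = [];
  for (const key of requiredFaultKeys) {
    const value = model[key];
    if (value === undefined || value === "" || (Array.isArray(value) && value.length === 0)) {
      problems.push(`missing ${key}`);
    }
  }
  // Destructive injection against a shared instance needs explicit operator approval.
  if (model.blast_radius === "shared-instance" && !operatorApprovedShared) {
    problems.push("shared-instance blast radius requires explicit operator approval");
  }
  return problems;
}
```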
Recommended first fault models:
- Provider timeout or HTTP 429 from a fake provider endpoint.
- Plugin runtime disconnect/reconnect in a local instance.
- MCP stdio server exits mid-call.
- RAG parser fixture fails once and recovers on retry.
- Backend API endpoint returns 5xx from a controlled local proxy.
## Starter Live Probes
The starter gate separates QA-harness contracts from live product checks:
- `langbot-overhead-accounting-contract` verifies that reports can carry
overhead accounting metrics. It uses deterministic synthetic samples and is
not live product performance.
- `langbot-fault-taxonomy-contract` verifies that fault scenarios declare
expected status, recovery, and cleanup before destructive chaos tests are
added.
- `langbot-live-backend-latency` checks the unauthenticated `/healthz`
endpoint for basic backend responsiveness.
- `langbot-live-control-plane-api` checks `/healthz` and
`/api/v1/system/info` for HTTP 200, JSON `code: 0`, response shape, and
per-endpoint p95 latency.
- `langbot-live-backend-log-health` scans the recent backend log window for
fail-severity runtime findings. It is the reliability guard that should fail
the gate when the HTTP probes pass but the backend logs contain a Traceback,
an ImportError, an ERROR line, an unclosed session, or an unawaited coroutine.
Do not treat these starter live probes as Debug Chat or model-provider
performance. They are control-plane readiness checks; user-facing performance
needs browser/WebSocket/message-path measurements.
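For reference, here is a sketch of how a control-plane probe could collect per-endpoint p95 latency and check the `/api/v1/system/info` envelope for HTTP 200 and `code: 0`. The percentile method the real probe uses is not stated here; nearest-rank is an assumption. `probeEndpoint` is a hypothetical helper that relies on the global `fetch` and `performance` available in Node 18+.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("percentile needs at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// /api/v1/system/info is expected to answer HTTP 200 with a JSON envelope whose code is 0.
function systemInfoOk(httpStatus: number, body: unknown): boolean {
  return (
    httpStatus === 200 &&
    typeof body === "object" &&
    body !== null &&
    (body as { code?: unknown }).code === 0
  );
}

// Time `attempts` sequential GETs against one endpoint and report its p95 latency.
async function probeEndpoint(
  url: string,
  attempts: number,
  requireCodeZero: boolean,
): Promise<{ ok: boolean; p95Ms: number }> {
  const durations: number[] = [];
  let ok = true;
  for (let i = 0; i < attempts; i++) {
    const started = performance.now();
    const response = await fetch(url);
    const body: unknown = requireCodeZero ? await response.json().catch(() => null) : await response.text();
    durations.push(performance.now() - started);
    ok = ok && (requireCodeZero ? systemInfoOk(response.status, body) : response.status === 200);
  }
  return { ok, p95Ms: percentile(durations, 95) };
}

const samples = [12, 15, 11, 40, 13, 14, 12, 16, 13, 90];
console.log(percentile(samples, 50), percentile(samples, 95)); // 13 90
```

Point it at `${LANGBOT_BACKEND_URL}/healthz` with `requireCodeZero` set to `false`, and at `/api/v1/system/info` with it set to `true`. With only ten samples, p95 equals the single slowest request, so one outlier decides the result.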
## Debug Chat Load And Fake Provider Baseline
Use `langbot-fake-provider-debug-chat-load` before real-provider load checks.
The setup automation starts a local OpenAI-compatible fake provider, registers
it as a normal LangBot provider/model, configures a local-agent pipeline, resets
Debug Chat, and then drives concurrent WebSocket messages through the live
backend.
This is not a mocked backend test. It still exercises:
- provider/model persistence and runtime reload;
- LiteLLM OpenAI-compatible requester path;
- local-agent runner selection and pipeline execution;
- Debug Chat WebSocket adapter and broadcast behavior;
- backend concurrency, timeout, and error-rate accounting.
The fake provider is deterministic and can inject controlled latency or faults
with `LANGBOT_FAKE_PROVIDER_*` variables, so it is the baseline for LangBot
message-path overhead. The probe uses unique expected response tokens per
request because Debug Chat broadcasts messages to every connection in the same
session; unique tokens prevent one connection from counting another
connection's response as its own.
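Here is a sketch of the token-attribution rule described above. It assumes the fake provider echoes the per-request token in its reply. The `lbqa-` prefix and both helper names are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// A per-request token the fake provider is assumed to echo back; the prefix is illustrative.
function newExpectedToken(): string {
  return `lbqa-${randomUUID()}`;
}

// Every connection in a Debug Chat session sees every broadcast message, so a connection
// only counts a reply as its own when the reply contains its own token.
function attributeReplies(
  received: string[],
  ownToken: string,
): { own: string | undefined; otherReplies: number } {
  const own = received.find((text) => text.includes(ownToken));
  return { own, otherReplies: received.filter((text) => !text.includes(ownToken)).length };
}
```

Inside one session, `otherReplies` is expected broadcast traffic, not an error. Only a missing `own` reply counts against the connection.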
When the fake provider is used, reports also include provider-side timing in
`metrics.json`:
- `fake_provider.duration_ms` and `fake_provider.first_content_chunk_ms`
measure the controlled provider itself.
- `provider_timing.send_to_provider_start_ms` estimates WebSocket ingress,
pipeline dispatch, runner setup, and requester time before the provider
receives the request.
- `provider_timing.provider_finish_to_ws_final_ms` estimates the path from
provider completion back to the final Debug Chat WebSocket response.
- `provider_timing.langbot_overhead_estimate_ms` is the sum of those two
LangBot-side segments when wall-clock timestamps can be matched by the
unique expected response token.
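The three `provider_timing` values are plain differences between matched timestamps. A minimal sketch follows. The `MatchedTimestamps` field names are illustrative, and the sketch assumes both sides read the same wall clock, which holds when the fake provider runs on the same host.

```typescript
// Wall-clock timestamps (ms since epoch) for one request, matched by its unique token.
// Field names are illustrative; metrics.json may name the raw timestamps differently.
interface MatchedTimestamps {
  wsSendAt: number; // Debug Chat message written to the WebSocket
  providerStartAt: number; // fake provider received the completion request
  providerFinishAt: number; // fake provider sent its last chunk
  wsFinalAt: number; // final Debug Chat WebSocket response arrived
}

function providerTiming(t: MatchedTimestamps) {
  const sendToProviderStart = t.providerStartAt - t.wsSendAt;
  const providerFinishToWsFinal = t.wsFinalAt - t.providerFinishAt;
  return {
    send_to_provider_start_ms: sendToProviderStart,
    provider_finish_to_ws_final_ms: providerFinishToWsFinal,
    langbot_overhead_estimate_ms: sendToProviderStart + providerFinishToWsFinal,
  };
}

const timing = providerTiming({ wsSendAt: 1000, providerStartAt: 1018, providerFinishAt: 1518, wsFinalAt: 1530 });
// 18 ms before the provider plus 12 ms after it: timing.langbot_overhead_estimate_ms is 30
```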
After the baseline passes, run `langbot-fake-provider-debug-chat-slow-load` to
keep the same live backend path while injecting deterministic streaming latency.
Run `langbot-fake-provider-debug-chat-cross-pipeline-isolation` to open
concurrent Debug Chat connections against two fake-provider pipelines and fail
if one pipeline receives the other pipeline's response token. This targets
global pipeline-state regressions in the WebSocket Debug Chat path.
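The isolation rule can be stated as a check over received messages. Here is a sketch under the assumption that each pipeline's expected tokens and received messages are collected per pipeline. The pipeline ids and `isolationViolations` are hypothetical.

```typescript
// Tokens each pipeline's requests expect back, keyed by a hypothetical pipeline id.
type TokensByPipeline = Record<string, string[]>;
// Messages each pipeline's Debug Chat connections actually received.
type ReceivedByPipeline = Record<string, string[]>;

// Returns one violation per foreign token that shows up in another pipeline's messages.
function isolationViolations(expected: TokensByPipeline, received: ReceivedByPipeline): string[] {
  const violations: string[] = [];
  for (const [pipeline, messages] of Object.entries(received)) {
    for (const [other, tokens] of Object.entries(expected)) {
      if (other === pipeline) continue;
      for (const token of tokens) {
        if (messages.some((message) => message.includes(token))) {
          violations.push(`${pipeline} received ${other} token ${token}`);
        }
      }
    }
  }
  return violations;
}

const violations = isolationViolations(
  { "pipeline-a": ["tok-a1"], "pipeline-b": ["tok-b1"] },
  { "pipeline-a": ["tok-a1 done"], "pipeline-b": ["tok-b1 done", "tok-a1 leaked"] },
);
// violations: ["pipeline-b received pipeline-a token tok-a1"]
```

Substring matching is only safe when no token is a prefix of another, which is another reason to use long random tokens.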
Run `langbot-fake-provider-debug-chat-fault-recovery` to inject bounded HTTP
provider failures and require both observed failures and later successful
requests. The fault-recovery case is deliberately sequential because failed
Debug Chat responses do not carry a unique success token that can be attributed
to one concurrent connection.
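Because the fault-recovery requests run sequentially, the pass rule can be evaluated over an ordered list of outcomes. This sketch is loosely modeled on the `automation_debug_chat_load_min_provider_fault_count` and `automation_debug_chat_load_min_ok_count` case fields that the validator accepts. The real probe's counting rules may differ.

```typescript
type Outcome = "ok" | "error";

// Passes only if enough provider faults were observed and enough successes followed the
// first fault, so recovery was actually exercised rather than assumed.
function faultRecoveryPassed(outcomes: Outcome[], minFaults: number, minOkAfterFault: number): boolean {
  const firstFault = outcomes.indexOf("error");
  if (firstFault === -1) return false; // no fault observed, so recovery was never tested
  const faults = outcomes.filter((o) => o === "error").length;
  const okAfterFault = outcomes.slice(firstFault + 1).filter((o) => o === "ok").length;
  return faults >= minFaults && okAfterFault >= minOkAfterFault;
}
```

Successes are counted after the first fault, not the last. This way an interleaved pattern, such as one produced by a fail-every-N profile, can still pass.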
Use `langbot-space-debug-chat-concurrency-smoke` after the fake-provider
baseline. It runs a deliberately small real Space-provider batch and reports
user-visible latency, not pure LangBot overhead. Space/model/network failures
are dependency findings until the fake baseline shows the same symptom.
If a Space smoke passes but the log guard finds Tracebacks from telemetry
posting, classify them separately as `telemetry-proxy-noise` instead of clearing
the proxy or treating the Debug Chat path as failed.
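A sketch of that separate log-health classification follows, using the patterns from the `telemetry-proxy-noise` troubleshooting entry. It assumes each log-guard finding is the full text of one hit, such as a whole Traceback block. `classifyLogHealth` is hypothetical.

```typescript
// Patterns from the telemetry-proxy-noise troubleshooting entry. httpx.ConnectError alone is
// not telemetry-specific, so a finding must also mention telemetry posting or its endpoint.
const telemetryMarkers = ["Failed to post telemetry", "https://space.langbot.app/api/v1/telemetry"];

type LogHealth = "clean" | "telemetry-proxy-noise" | "fail";

// Classifies only the log-health side; the target flow's pass/fail is reported separately.
function classifyLogHealth(findings: string[]): LogHealth {
  if (findings.length === 0) return "clean";
  const isTelemetry = (finding: string) => telemetryMarkers.some((marker) => finding.includes(marker));
  return findings.every(isTelemetry) ? "telemetry-proxy-noise" : "fail";
}
```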
Useful commands:
```bash
rtk bin/lbs test run langbot-fake-provider-debug-chat-load --run-id langbot-fake-load-local
rtk bin/lbs test run langbot-fake-provider-debug-chat-slow-load --run-id langbot-fake-slow-local
rtk bin/lbs test run langbot-fake-provider-debug-chat-cross-pipeline-isolation --run-id langbot-fake-cross-pipeline-local
rtk bin/lbs test run langbot-fake-provider-debug-chat-fault-recovery --run-id langbot-fake-fault-local
rtk bin/lbs test run langbot-space-debug-chat-concurrency-smoke --run-id langbot-space-smoke-local
rtk bin/lbs suite run langbot-debug-chat-load-gate --run-id langbot-debug-chat-load-local --include-manual-check
```
## Gate Layers
Use the smallest gate that answers the quality question:
- `langbot-performance-contract-gate`: fast synthetic checks for report shape,
threshold accounting, and fault taxonomy. Good for PR feedback when no live
service is running.
- `langbot-live-backend-gate`: live backend `/healthz`,
`/api/v1/system/info`, and backend log health. Good after starting a local
LangBot backend.
- `langbot-user-path-performance-gate`: browser-visible user path performance,
starting with Pipeline Debug Chat send-to-visible-completion latency. Run it
only when the browser profile and target pipeline are ready.
- `langbot-debug-chat-load-gate`: WebSocket Debug Chat load checks, starting
with controlled fake-provider baseline, slow-provider, cross-pipeline
isolation, and fault-recovery profiles, plus an optional low-volume real
Space-provider smoke.
- `langbot-performance-reliability-gate`: combined starter gate for synthetic
contracts plus live backend checks.
Keep environment diagnostics separate from product regressions. For example,
when a SOCKS proxy is configured but LangBot's Python environment cannot import
`socksio`, fix the environment or let `bin/lbs env doctor` classify it clearly;
do not hide the resulting backend Traceback in reports.
## Debug Chat Performance
`pipeline-debug-chat-performance` reuses the browser Debug Chat automation and
adds `metrics.json`, `metrics_summary`, and `thresholds_summary` to
`automation-result.json`.
Current metric:
```text
response_duration_ms = prompt send -> expected assistant response visible and stable
```
This is a user-path metric, not pure LangBot overhead. If it regresses, inspect
provider latency, model route health, plugin/runtime logs, WebSocket behavior,
and browser console/network evidence before attributing the whole duration to
LangBot.
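The "visible and stable" condition can be computed from polled samples of the rendered assistant text. This sketch assumes a polling loop records elapsed time since send along with the visible text. It returns the moment the final text first appeared, not the moment stability was confirmed; that interpretation is an assumption. `responseDurationMs` is hypothetical.

```typescript
// One poll of the assistant message as rendered in the Debug Chat panel.
interface VisibleSample {
  atMs: number; // milliseconds since the prompt was sent
  text: string; // assistant text visible at that moment
}

// Returns when text containing `expected` first appeared, provided it then stays unchanged
// for at least `stableForMs`; null if that never happens within the samples.
function responseDurationMs(samples: VisibleSample[], expected: string, stableForMs: number): number | null {
  for (let i = 0; i < samples.length; i++) {
    const first = samples[i];
    if (!first.text.includes(expected)) continue;
    // Walk forward while the text is unchanged; any change restarts the search at the next sample.
    for (let j = i + 1; j < samples.length && samples[j].text === first.text; j++) {
      if (samples[j].atMs - first.atMs >= stableForMs) return first.atMs;
    }
  }
  return null;
}

const polls: VisibleSample[] = [
  { atMs: 0, text: "" },
  { atMs: 400, text: "Hel" },
  { atMs: 900, text: "Hello TOKEN" },
  { atMs: 1100, text: "Hello TOKEN!" },
  { atMs: 1300, text: "Hello TOKEN!" },
  { atMs: 1600, text: "Hello TOKEN!" },
];
console.log(responseDurationMs(polls, "TOKEN", 500)); // 1100
```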
### User-Path Gate Runbook
1. Start the backend and frontend. The frontend must be launched with
`VITE_API_BASE_URL="$LANGBOT_BACKEND_URL"` so browser API calls reach the
backend.
2. Run `node scripts/e2e/ensure-local-agent-pipeline.mjs --write-env`. The
setup refreshes the local QA login, skips the wizard, prepares a Debug Chat
pipeline, scans Space models, tests candidates, writes tested fallback
models, and writes the selected pipeline/model env values to
`skills/.env.local`.
3. If setup returns `env_issue`, read `model_tests` and provider errors first.
A missing Space key, failed Space scan, or unavailable model route is not a
LangBot performance regression.
4. Run
`bin/lbs suite run langbot-user-path-performance-gate --include-manual-check`.
5. Interpret `response_p95_ms` as browser-visible send-to-completion time. It
includes provider latency; use backend logs and model test evidence to
separate LangBot overhead from the external model route.
The setup keeps a `max-round` value in the generated pipeline config only
because the current backend truncator still reads that field directly. Do not
use it as a quality requirement for future local-agent behavior.
## Running The First Gate
Start with the reusable suite:
```bash
rtk bin/lbs suite plan langbot-performance-reliability-gate
rtk bin/lbs suite start langbot-performance-reliability-gate --run-id langbot-perf-rel-local
```
Run synthetic contract probes first. Run live probes only after the selected
backend/frontend instance is reachable and the run owner accepts any fault
scope.
@@ -0,0 +1,16 @@
id: langbot-debug-chat-load-gate
title: "LangBot Debug Chat load gate"
description: "Message-path load checks for Pipeline Debug Chat: controlled fake-provider baseline, slow-provider, cross-pipeline isolation, and fault-recovery profiles, plus an optional real Space-provider smoke."
type: performance
priority: p1
tags:
- performance
- debug-chat
- websocket
- load
cases:
- langbot-fake-provider-debug-chat-load
- langbot-fake-provider-debug-chat-slow-load
- langbot-fake-provider-debug-chat-cross-pipeline-isolation
- langbot-fake-provider-debug-chat-fault-recovery
- langbot-space-debug-chat-concurrency-smoke
@@ -0,0 +1,14 @@
id: langbot-live-backend-gate
title: "LangBot live backend reliability gate"
description: "Live backend control-plane responsiveness and runtime log health checks for a locally running LangBot instance."
type: reliability
priority: p1
tags:
- performance
- reliability
- live-backend
- metrics
cases:
- langbot-live-backend-latency
- langbot-live-control-plane-api
- langbot-live-backend-log-health
@@ -0,0 +1,13 @@
id: langbot-performance-contract-gate
title: "LangBot performance contract gate"
description: "Fast synthetic contract checks for performance metric accounting and non-destructive reliability fault taxonomy."
type: contract
priority: p1
tags:
- performance
- reliability
- contract
- metrics
cases:
- langbot-overhead-accounting-contract
- langbot-fault-taxonomy-contract
@@ -0,0 +1,16 @@
id: langbot-performance-reliability-gate
title: "LangBot performance and reliability starter gate"
description: "Starter gate for LangBot performance accounting, live backend control-plane latency and log health, and non-destructive fault taxonomy checks."
type: reliability
priority: p1
tags:
- performance
- reliability
- metrics
- chaos
cases:
- langbot-overhead-accounting-contract
- langbot-fault-taxonomy-contract
- langbot-live-backend-latency
- langbot-live-control-plane-api
- langbot-live-backend-log-health
@@ -0,0 +1,12 @@
id: langbot-user-path-performance-gate
title: "LangBot user-path performance gate"
description: "Browser-visible performance checks for user-facing LangBot paths such as Pipeline Debug Chat."
type: performance
priority: p1
tags:
- performance
- browser
- debug-chat
- user-path
cases:
- pipeline-debug-chat-performance
@@ -0,0 +1,23 @@
id: telemetry-proxy-noise
title: "Telemetry posting fails through the proxy while the target flow succeeds"
date: 2026-06-25
category: env_issue
symptoms:
- "The target Debug Chat or provider smoke request completes successfully."
- "The same log window contains a Traceback for telemetry posting."
- "The traceback references the Space telemetry endpoint."
patterns:
- "Failed to post telemetry"
- "https://space.langbot.app/api/v1/telemetry"
- "httpx.ConnectError"
likely_causes:
- "The backend process inherited proxy settings that are required for model/provider access but unreliable for telemetry posting."
- "The telemetry endpoint is temporarily unreachable through the local proxy route."
- "TLS or proxy negotiation failed for the non-critical telemetry request."
fix_steps:
- "Keep the proxy configuration needed for model/provider access; do not clear it only to hide telemetry noise."
- "Check that uppercase and lowercase proxy variables are consistent before rerunning a live Space smoke."
- "Classify the target flow and log-health result separately: a successful Debug Chat run can still have an environment log-health finding."
verification: "A rerun shows the target case success patterns and no telemetry Traceback in the scanned log window, or the report explicitly records the telemetry issue as environment noise."
related_cases:
- langbot-space-debug-chat-concurrency-smoke
+35
View File
@@ -1,5 +1,7 @@
import { existsSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { Socket } from "node:net";
import { join } from "node:path";
import type { CommandContext } from "../types.ts";
import { parseOptions } from "../cli.ts";
import { loadEnv } from "../fs.ts";
@@ -88,6 +90,37 @@ function compareProxyPair(env: Record<string, string>, upper: string, lower: str
return null;
}
function envValue(env: Record<string, string>, key: string): string {
return process.env[key] ?? env[key] ?? "";
}
function activeSocksProxy(env: Record<string, string>): { key: string; value: string } | null {
for (const key of ["ALL_PROXY", "all_proxy", "HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"]) {
const value = envValue(env, key);
if (/^socks/i.test(value)) return { key, value };
}
return null;
}
function checkSocksio(env: Record<string, string>): string | null {
const proxy = activeSocksProxy(env);
if (!proxy) return null;
const repo = env.LANGBOT_REPO;
const python = repo ? join(repo, ".venv", "bin", "python") : "";
if (!python || !existsSync(python)) {
return `SOCKS proxy ${proxy.key} is configured (${redactEnvValue(proxy.key, proxy.value)}), but LangBot venv python was not found; after creating the venv, verify it can import socksio.`;
}
const result = spawnSync(python, ["-c", "import socksio"], {
encoding: "utf8",
timeout: 5000,
});
if (result.status === 0) return null;
return `SOCKS proxy ${proxy.key} is configured (${redactEnvValue(proxy.key, proxy.value)}), but ${python} cannot import socksio; run \`${python} -m pip install socksio\` or start LangBot without SOCKS proxy env.`;
}
export async function commandEnvDoctor(ctx: CommandContext): Promise<number> {
const env = loadEnv(ctx.root);
const failures: string[] = [];
@@ -117,6 +150,8 @@ export async function commandEnvDoctor(ctx: CommandContext): Promise<number> {
]) {
if (mismatch) failures.push(mismatch);
}
const socksioFailure = checkSocksio(env);
if (socksioFailure) failures.push(socksioFailure);
for (const [label, result] of await Promise.all([
checkUrl("LANGBOT_BACKEND_URL", env.LANGBOT_BACKEND_URL).then((result) => ["LANGBOT_BACKEND_URL", result] as const),
+44 -3
@@ -465,6 +465,41 @@ function outputTail(value: string | Buffer | null | undefined): string {
return String(value ?? "").trim().slice(-4000);
}
function exitStatusFromResultStatus(status: string): number {
if (status === "pass") return 0;
if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
return 1;
}
function executionStatusFromExitStatus(status: number): string {
if (status === 0) return "ok";
if (status === 2) return "classified";
return "nonzero";
}
function executionFromCaseResultFile(caseItem: Record<string, unknown>): Record<string, unknown> | null {
const resultPath = join(String(caseItem.evidence_dir), "result.json");
if (!existsSync(resultPath)) return null;
try {
const parsed = JSON.parse(readFileSync(resultPath, "utf8")) as Record<string, unknown>;
if (
parsed.case_id !== caseItem.id ||
parsed.run_id !== caseItem.run_id ||
typeof parsed.status !== "string"
) return null;
const exitStatus = exitStatusFromResultStatus(parsed.status);
return {
status: executionStatusFromExitStatus(exitStatus),
exit_status: exitStatus,
reason: typeof parsed.reason === "string" ? parsed.reason : "result.json completed",
result_status: parsed.status,
result_json: resultPath,
};
} catch {
return null;
}
}
function executionProblemStatus(executions: Array<Record<string, unknown>>): string {
const statuses = executions.map((item) => String(item.status));
if (statuses.includes("nonzero")) return "fail";
@@ -523,12 +558,18 @@ export function commandSuiteRun(ctx: CommandContext): number {
encoding: "utf8",
stdio: options.json === true ? "pipe" : "inherit",
});
const status = result.error ? 1 : result.status ?? 1; const fileExecution = result.error ? executionFromCaseResultFile(caseItem) : null;
const status = typeof fileExecution?.exit_status === "number"
? fileExecution.exit_status
: result.error ? 1 : result.status ?? 1;
executions.push({
id: caseItem.id,
status: status === 0 ? "ok" : "nonzero", status: fileExecution?.status ?? executionStatusFromExitStatus(status),
exit_status: status,
reason: result.error?.message || "", reason: fileExecution?.reason ?? result.error?.message ?? "",
result_status: fileExecution?.result_status,
result_json: fileExecution?.result_json,
spawn_error: fileExecution && result.error ? result.error.message : undefined,
stdout: outputTail(result.stdout),
stderr: outputTail(result.stderr),
});
+95 -14
@@ -271,7 +271,7 @@ function reportTemplate(mode: string): Record<string, string> {
target_tested: "Probe target, endpoint, file, command, or service actually checked",
execution_path: "automation script | shell command | direct API | other",
probe_result: "What the probe observed",
logs_or_artifacts: "Log, filesystem, API, or other artifact paths collected", metrics_or_artifacts: "Metrics, logs, filesystem artifacts, traces, or profiles collected",
diagnostics: "Extra diagnostics used, if any",
matched_troubleshooting: "Troubleshooting ids matched, if any",
assets_to_update: "New case/reference/troubleshooting entries to add",
@@ -320,7 +320,7 @@ function manualEvidenceTemplate(mode: string): ManualEvidenceTemplate {
target_tested: "TODO: probe target, endpoint, file, command, or service actually checked",
execution_path: "TODO: automation script | shell command | direct API | other",
probe_result: "TODO: observed probe result",
logs_or_artifacts: "TODO: evidence paths or skipped reason", metrics_or_artifacts: "TODO: metrics, logs, filesystem artifacts, traces, or profiles collected",
diagnostics: "TODO: additional diagnostics used, if any",
matched_troubleshooting: "TODO: troubleshooting ids matched, if any",
assets_to_update: "TODO: case/reference/troubleshooting updates to make",
@@ -1099,6 +1099,41 @@ function executionTail(value: string | Buffer | null | undefined): string {
return String(value ?? "").trim().slice(-4000);
}
function exitStatusFromResultStatus(status: string): number {
if (status === "pass") return 0;
if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
return 1;
}
function executionStatusFromExitStatus(status: number): string {
if (status === 0) return "ok";
if (status === 2) return "classified";
return "nonzero";
}
function executionFromAutomationResultFile(
evidenceDir: string,
caseId: string,
runId: string,
): { status: string; exit_status: number; reason: string; result_status: string; path: string } | null {
const resultPath = join(evidenceDir, "automation-result.json");
if (!existsSync(resultPath)) return null;
try {
const parsed = JSON.parse(readFileSync(resultPath, "utf8")) as Record<string, unknown>;
if (parsed.case_id !== caseId || parsed.run_id !== runId || typeof parsed.status !== "string") return null;
const exitStatus = exitStatusFromResultStatus(parsed.status);
return {
status: executionStatusFromExitStatus(exitStatus),
exit_status: exitStatus,
reason: typeof parsed.reason === "string" ? parsed.reason : "automation-result.json completed",
result_status: parsed.status,
path: resultPath,
};
} catch {
return null;
}
}
function runSetupAutomation(
ctx: CommandContext,
item: StructuredItem,
@@ -1224,6 +1259,30 @@ export function commandTestRun(ctx: CommandContext): number {
});
if (result.error) {
const fileExecution = executionFromAutomationResultFile(
run.automation.evidence_dir,
String(run.case.id),
run.run_id,
);
if (fileExecution) {
if (options.json !== true) {
console.error(`WARN: automation spawn reported an error, but ${fileExecution.path} completed: ${result.error.message}`);
}
if (options.json === true) {
console.log(JSON.stringify({
run,
setup_executions: setupExecutions,
automation_execution: {
...fileExecution,
spawn_error: result.error.message,
stdout: executionTail(result.stdout),
stderr: executionTail(result.stderr),
},
exit_status: fileExecution.exit_status,
}, null, 2));
}
return fileExecution.exit_status;
}
if (options.json !== true) console.error(`ERROR: failed to run automation: ${result.error.message}`);
if (options.json === true) {
console.log(JSON.stringify({
@@ -1247,7 +1306,7 @@ export function commandTestRun(ctx: CommandContext): number {
run,
setup_executions: setupExecutions,
automation_execution: {
status: status === 0 ? "ok" : "nonzero", status: executionStatusFromExitStatus(status),
exit_status: status,
stdout: executionTail(result.stdout),
stderr: executionTail(result.stderr),
@@ -1311,6 +1370,7 @@ function renderMarkdownReport(report: TestReport): string {
const environment = report.environment;
const logGuard = report.log_guard;
const troubleshooting = report.troubleshooting;
const automation = report.automation_result;
const lines: string[] = [];
lines.push(`# Test Report: ${reportCase.id}`);
@@ -1323,20 +1383,41 @@ function renderMarkdownReport(report: TestReport): string {
lines.push(`Type: ${reportCase.type}`);
lines.push("");
lines.push("## Result");
lines.push(`- result: ${evidence.result}`); if (automation.status === "loaded" && automation.result) {
for (const [key, value] of Object.entries(evidence)) { lines.push(`- result: ${automation.result}`);
if (key !== "result") lines.push(`- ${key}: ${value}`); if (automation.reason) lines.push(`- reason: ${automation.reason}`);
if (automation.url) lines.push(`- target_tested: ${automation.url}`);
if (automation.path) lines.push(`- automation_result: ${automation.path}`);
if (automation.artifacts) lines.push(`- artifacts: ${JSON.stringify(automation.artifacts)}`);
} else {
lines.push(`- result: ${evidence.result}`);
for (const [key, value] of Object.entries(evidence)) {
if (key !== "result") lines.push(`- ${key}: ${value}`);
}
}
lines.push("");
lines.push("## Automation Result");
lines.push(`- status: ${report.automation_result.status}`); lines.push(`- status: ${automation.status}`);
if (report.automation_result.path) lines.push(`- path: ${report.automation_result.path}`); if (automation.path) lines.push(`- path: ${automation.path}`);
if (report.automation_result.result) lines.push(`- result: ${report.automation_result.result}`); if (automation.result) lines.push(`- result: ${automation.result}`);
if (report.automation_result.reason) lines.push(`- reason: ${report.automation_result.reason}`); if (automation.reason) lines.push(`- reason: ${automation.reason}`);
if (report.automation_result.started_at_local) lines.push(`- started_at_local: ${report.automation_result.started_at_local}`); if (automation.duration_ms !== undefined) lines.push(`- duration_ms: ${automation.duration_ms}`);
if (report.automation_result.finished_at_local) lines.push(`- finished_at_local: ${report.automation_result.finished_at_local}`); if (automation.started_at_local) lines.push(`- started_at_local: ${automation.started_at_local}`);
if (report.automation_result.url) lines.push(`- url: ${report.automation_result.url}`); if (automation.finished_at_local) lines.push(`- finished_at_local: ${automation.finished_at_local}`);
if (report.automation_result.expected_text) lines.push(`- expected_text: ${report.automation_result.expected_text}`); if (automation.url) lines.push(`- url: ${automation.url}`);
if (automation.expected_text) lines.push(`- expected_text: ${automation.expected_text}`);
if (automation.metrics_summary) {
lines.push("- metrics_summary:");
lines.push(` ${JSON.stringify(automation.metrics_summary)}`);
}
if (automation.thresholds_summary) {
lines.push("- thresholds_summary:");
lines.push(` ${JSON.stringify(automation.thresholds_summary)}`);
}
if (automation.artifacts) {
lines.push("- artifacts:");
lines.push(` ${JSON.stringify(automation.artifacts)}`);
}
lines.push("");
lines.push("## Environment");
for (const [key, value] of Object.entries(environment)) lines.push(`- ${key}=${value}`);
+55
@@ -126,6 +126,9 @@ function validateCaseItem(root: string, item: StructuredItem, skillNames: Set<st
...validateEnvKeyScalar(item, "automation_pipeline_url_env"),
...validateEnvKeyScalar(item, "automation_pipeline_name_env"),
...validateJsonScalar(item, "automation_filesystem_checks_json"),
...validateJsonScalar(item, "metrics_thresholds_json"),
...validateJsonScalar(item, "load_profile_json"),
...validateJsonScalar(item, "fault_model_json"),
...listValue(item.fields, "setup_automation").flatMap((entry) => (
validateSetupAutomationEntry(root, entry, caseIds).map((error) => `${item.path}: ${error}`)
)),
@@ -183,10 +186,62 @@ function validateCaseItem(root: string, item: StructuredItem, skillNames: Set<st
if (timeout && (!/^\d+$/.test(timeout) || Number.parseInt(timeout, 10) <= 0)) {
errors.push(`${item.path}: 'automation_response_timeout_ms' must be a positive integer string`);
}
for (const key of [
"automation_debug_chat_load_requests",
"automation_debug_chat_load_concurrency",
"automation_debug_chat_load_timeout_ms",
"automation_debug_chat_load_response_p95_ms",
"automation_debug_chat_load_first_response_p95_ms",
]) {
const value = scalar(item.fields, key);
if (value && (!/^\d+$/.test(value) || Number.parseInt(value, 10) <= 0)) {
errors.push(`${item.path}: '${key}' must be a positive integer string`);
}
}
for (const key of [
"automation_debug_chat_load_min_error_count",
"automation_debug_chat_load_min_ok_count",
"automation_debug_chat_load_min_provider_fault_count",
"automation_fake_provider_first_token_delay_ms",
"automation_fake_provider_chunk_delay_ms",
"automation_fake_provider_chunk_count",
"automation_fake_provider_fail_first_n",
"automation_fake_provider_fail_every_n",
]) {
const value = scalar(item.fields, key);
if (value && (!/^\d+$/.test(value) || Number.parseInt(value, 10) < 0)) {
errors.push(`${item.path}: '${key}' must be a non-negative integer string`);
}
}
for (const key of ["automation_debug_chat_load_max_error_rate", "automation_debug_chat_load_min_error_rate"]) {
const value = scalar(item.fields, key);
if (value && (!/^(?:0(?:\.\d+)?|1(?:\.0+)?)$/.test(value))) {
errors.push(`${item.path}: '${key}' must be a number string between 0 and 1`);
}
}
const fakeProviderFaultStatus = scalar(item.fields, "automation_fake_provider_fault_status");
if (fakeProviderFaultStatus) {
const parsed = Number.parseInt(fakeProviderFaultStatus, 10);
if (!/^\d+$/.test(fakeProviderFaultStatus) || parsed < 400 || parsed > 599) {
errors.push(`${item.path}: 'automation_fake_provider_fault_status' must be an HTTP 4xx or 5xx status string`);
}
}
const streamOutput = scalar(item.fields, "automation_stream_output");
if (streamOutput && !["0", "1", "false", "true"].includes(streamOutput)) {
errors.push(`${item.path}: 'automation_stream_output' must be one of 0, 1, false, or true`);
}
for (const key of [
"automation_debug_chat_load_stream",
"automation_debug_chat_load_reset",
"automation_debug_chat_load_fail_on_final_mismatch",
"automation_fake_provider_fail_after_first_chunk",
"automation_fake_provider_dynamic_response",
]) {
const value = scalar(item.fields, key);
if (value && !["0", "1", "false", "true"].includes(value)) {
errors.push(`${item.path}: '${key}' must be one of 0, 1, false, or true`);
}
}
const imageBase64Fixture = scalar(item.fields, "automation_image_base64_fixture");
if (imageBase64Fixture && !existsSync(join(root, imageBase64Fixture))) {
errors.push(`${item.path}: automation image fixture does not exist: ${imageBase64Fixture}`);
+27 -2
@@ -9,7 +9,18 @@ export const requiredEnvKeys = [
];
export const caseModeValues = ["agent-browser", "probe"];
export const caseTypeValues = ["smoke", "regression", "feature", "provider", "exploratory"]; export const caseTypeValues = [
"smoke",
"regression",
"feature",
"provider",
"exploratory",
"contract",
"performance",
"reliability",
"chaos",
"security",
];
export const casePriorityValues = ["p0", "p1", "p2"];
export const caseRiskValues = ["low", "medium", "high"];
export const caseEvidenceValues = [
@@ -21,10 +32,24 @@ export const caseEvidenceValues = [
  "frontend_log",
  "api_diagnostic",
  "filesystem",
"metrics",
"trace",
"profile",
"resource_log",
];
export const testResultStatusValues = ["pass", "fail", "blocked", "env_issue", "flaky"];
export const troubleshootingCategoryValues = ["product", "env_issue", "external_dependency", "blocked", "flaky"];
export const suiteTypeValues = ["smoke", "regression", "release_gate", "exploratory"]; export const suiteTypeValues = [
"smoke",
"regression",
"release_gate",
"exploratory",
"contract",
"performance",
"reliability",
"chaos",
"security",
];
export const suiteRequiredStrings = ["id", "title", "description", "type", "priority"];
export const suiteRequiredLists = ["tags", "cases"];
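After this change, the five new categories (`contract`, `performance`, `reliability`, `chaos`, `security`) are valid for both cases and suites. This lets a `performance` suite be built from `performance` cases. `feature` and `provider` remain case-only, and `release_gate` remains suite-only. The sketch below copies the two lists and computes that asymmetry. `checkEnum` and its error message are hypothetical and are not the validator's actual wording.

```typescript
// Copies of the extended lists from the diff above.
const caseTypeValues = ["smoke", "regression", "feature", "provider", "exploratory", "contract", "performance", "reliability", "chaos", "security"];
const suiteTypeValues = ["smoke", "regression", "release_gate", "exploratory", "contract", "performance", "reliability", "chaos", "security"];

// Hypothetical validator step: report a value that is not in the allowed list.
function checkEnum(path: string, key: string, value: string, allowed: readonly string[]): string | undefined {
  return allowed.includes(value) ? undefined : `${path}: '${key}' must be one of ${allowed.join(", ")}`;
}

// Types valid for cases but not for suites, and vice versa.
const caseOnly = caseTypeValues.filter((t) => !suiteTypeValues.includes(t));
const suiteOnly = suiteTypeValues.filter((t) => !caseTypeValues.includes(t));
console.log(caseOnly, suiteOnly); // [ 'feature', 'provider' ] [ 'release_gate' ]
```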
+20
@@ -91,6 +91,7 @@ export type AutomationResultEvidence = {
  path?: string;
  result?: string;
  reason?: string;
duration_ms?: number;
  started_at?: string;
  started_at_local?: string;
  finished_at?: string;
@@ -98,6 +99,9 @@ export type AutomationResultEvidence = {
  url?: string;
  prompt?: string;
  expected_text?: string;
metrics_summary?: Record<string, unknown>;
thresholds_summary?: Record<string, unknown>;
artifacts?: Record<string, unknown>;
};
type MutableScanState = {
@@ -594,6 +598,18 @@ function stringField(data: Record<string, unknown>, key: string): string | undef
  return typeof value === "string" && value.trim() ? value : undefined;
}
function numberField(data: Record<string, unknown>, key: string): number | undefined {
const value = data[key];
return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}
function objectField(data: Record<string, unknown>, key: string): Record<string, unknown> | undefined {
const value = data[key];
return value && typeof value === "object" && !Array.isArray(value)
? value as Record<string, unknown>
: undefined;
}
function evidenceDirFromOptions(options: Record<string, string | boolean>): string | undefined {
  const explicit = typeof options["evidence-dir"] === "string" ? options["evidence-dir"] : undefined;
  if (explicit) return resolve(explicit);
@@ -628,6 +644,7 @@ export function readAutomationResultEvidence(options: Record<string, string | bo
  path: resultPath,
  result: stringField(result, "status"),
  reason: stringField(result, "reason"),
duration_ms: numberField(result, "duration_ms"),
  started_at: stringField(result, "started_at"),
  started_at_local: stringField(result, "started_at_local"),
  finished_at: stringField(result, "finished_at"),
@@ -635,6 +652,9 @@ export function readAutomationResultEvidence(options: Record<string, string | bo
  url: stringField(result, "url"),
  prompt: redactSecrets(stringField(result, "prompt") ?? ""),
  expected_text: stringField(result, "expected_text"),
metrics_summary: objectField(result, "metrics_summary"),
thresholds_summary: objectField(result, "thresholds_summary"),
artifacts: objectField(result, "artifacts"),
};
} catch (error) {
  return { status: "invalid", path: resultPath, reason: String(error) };
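The two new guards decide which optional fields survive into the evidence record. A `duration_ms` written as the string `"1834"`, a `null` `thresholds_summary`, or an `artifacts` array are dropped as `undefined`; they do not make the whole read fail. The sketch below reuses the guards from the diff on a sample payload. The payload and its `p95_ms` metric name are illustrative assumptions.

```typescript
// Same guards as the diff: numbers must be finite, objects must be plain (non-null, non-array).
function numberField(data: Record<string, unknown>, key: string): number | undefined {
  const value = data[key];
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

function objectField(data: Record<string, unknown>, key: string): Record<string, unknown> | undefined {
  const value = data[key];
  return value && typeof value === "object" && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Hypothetical automation-result.json payload; the metric name is illustrative only.
const result = JSON.parse(
  '{"status":"pass","duration_ms":1834,"metrics_summary":{"p95_ms":910},"thresholds_summary":null,"artifacts":["metrics.json"]}',
) as Record<string, unknown>;

console.log(numberField(result, "duration_ms")); // 1834
console.log(objectField(result, "metrics_summary")); // { p95_ms: 910 }
console.log(objectField(result, "thresholds_summary")); // undefined (null is dropped)
console.log(objectField(result, "artifacts")); // undefined (arrays are dropped)
```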
+26
@@ -114,6 +114,32 @@ export function automationEnvDefaults(item: StructuredItem, env: EnvSource = pro
  ["automation_expected_runner_id", "LANGBOT_E2E_EXPECTED_RUNNER_ID"],
  ["automation_reset_debug_chat", "LANGBOT_E2E_RESET_DEBUG_CHAT"],
  ["automation_debug_chat_session_type", "LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE"],
["automation_debug_chat_response_p95_ms", "LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS"],
["automation_debug_chat_max_error_rate", "LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE"],
["automation_debug_chat_load_requests", "LANGBOT_DEBUG_CHAT_LOAD_REQUESTS"],
["automation_debug_chat_load_concurrency", "LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY"],
["automation_debug_chat_load_timeout_ms", "LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS"],
["automation_debug_chat_load_response_p95_ms", "LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS"],
["automation_debug_chat_load_first_response_p95_ms", "LANGBOT_DEBUG_CHAT_LOAD_FIRST_RESPONSE_P95_MS"],
["automation_debug_chat_load_max_error_rate", "LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE"],
["automation_debug_chat_load_min_error_rate", "LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_RATE"],
["automation_debug_chat_load_min_error_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_COUNT"],
["automation_debug_chat_load_min_ok_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_OK_COUNT"],
["automation_debug_chat_load_min_provider_fault_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_PROVIDER_FAULT_COUNT"],
["automation_debug_chat_load_expected_prefix", "LANGBOT_DEBUG_CHAT_LOAD_EXPECTED_PREFIX"],
["automation_debug_chat_load_prompt_template", "LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE"],
["automation_debug_chat_load_stream", "LANGBOT_DEBUG_CHAT_LOAD_STREAM"],
["automation_debug_chat_load_reset", "LANGBOT_DEBUG_CHAT_LOAD_RESET"],
["automation_debug_chat_load_fail_on_final_mismatch", "LANGBOT_DEBUG_CHAT_LOAD_FAIL_ON_FINAL_MISMATCH"],
["automation_fake_provider_response_text", "LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT"],
["automation_fake_provider_first_token_delay_ms", "LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS"],
["automation_fake_provider_chunk_delay_ms", "LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS"],
["automation_fake_provider_chunk_count", "LANGBOT_FAKE_PROVIDER_CHUNK_COUNT"],
["automation_fake_provider_fail_first_n", "LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N"],
["automation_fake_provider_fail_every_n", "LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N"],
["automation_fake_provider_fault_status", "LANGBOT_FAKE_PROVIDER_FAULT_STATUS"],
["automation_fake_provider_fail_after_first_chunk", "LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK"],
["automation_fake_provider_dynamic_response", "LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE"],
  ["automation_filesystem_checks_json", "LANGBOT_E2E_FILESYSTEM_CHECKS_JSON"],
  ["automation_plugin_package", "LANGBOT_E2E_PLUGIN_PACKAGE"],
  ["automation_expected_plugin_id", "LANGBOT_E2E_EXPECTED_PLUGIN_ID"],
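Each pair links a case-file field to an environment variable, so load-profile and fake-provider fault knobs can be set either in the case or from the shell. This hunk does not show which source wins. The sketch below assumes, hypothetically, that a variable already exported in the environment takes precedence over the case field, and it uses a three-pair subset of the table. `envDefaultsSketch` is not the real `automationEnvDefaults`.

```typescript
type EnvSource = Record<string, string | undefined>;

// Three pairs copied from the table above; the real table is much longer.
const FIELD_TO_ENV: ReadonlyArray<readonly [string, string]> = [
  ["automation_debug_chat_load_requests", "LANGBOT_DEBUG_CHAT_LOAD_REQUESTS"],
  ["automation_debug_chat_load_concurrency", "LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY"],
  ["automation_fake_provider_fault_status", "LANGBOT_FAKE_PROVIDER_FAULT_STATUS"],
];

// Hypothetical: returns only the variables the case file should supply.
// Assumption: an already-exported variable takes precedence over the case field.
function envDefaultsSketch(fields: Record<string, string | undefined>, env: EnvSource): Record<string, string> {
  const defaults: Record<string, string> = {};
  for (const [field, envKey] of FIELD_TO_ENV) {
    const value = fields[field];
    if (value && env[envKey] === undefined) defaults[envKey] = value;
  }
  return defaults;
}

console.log(envDefaultsSketch(
  { automation_debug_chat_load_requests: "200", automation_debug_chat_load_concurrency: "8" },
  { LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY: "2" },
));
```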
+159 -1
@@ -1,6 +1,6 @@
import assert from "node:assert/strict";
import { test } from "node:test";
import { appendFileSync, existsSync, mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs"; import { appendFileSync, chmodSync, existsSync, mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { tmpdir } from "node:os";
import { join } from "node:path";
@@ -676,6 +676,82 @@ test("suite run JSON captures failed case output", () => {
  }
});
test("suite run preserves classified env_issue automation results", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-suite-run-env-issue-"));
try {
const skillDir = join(tmp, "skills", "langbot-testing");
const casesDir = join(skillDir, "cases");
const suitesDir = join(skillDir, "suites");
const scriptsDir = join(tmp, "scripts");
mkdirSync(casesDir, { recursive: true });
mkdirSync(suitesDir, { recursive: true });
mkdirSync(scriptsDir, { recursive: true });
writeFileSync(join(skillDir, "SKILL.md"), "---\nname: langbot-testing\ndescription: Testing.\n---\n\n# Testing\n");
writeFileSync(join(tmp, "skills", ".env"), "");
writeFileSync(
join(casesDir, "env-case.yaml"),
[
"id: env-case",
"title: Env Case",
"mode: probe",
"area: qa",
"type: smoke",
"priority: p2",
"risk: low",
"ci_eligible: true",
"automation: scripts/env-issue.mjs",
"evidence_required:",
" - filesystem",
].join("\n"),
);
writeFileSync(
join(suitesDir, "mini.yaml"),
[
"id: mini",
"title: Mini",
"description: Mini suite.",
"type: smoke",
"priority: p2",
"tags:",
" - qa",
"cases:",
" - env-case",
].join("\n"),
);
writeFileSync(
join(scriptsDir, "env-issue.mjs"),
[
"import { mkdirSync, writeFileSync } from 'node:fs';",
"import { join } from 'node:path';",
"mkdirSync(process.env.LBS_EVIDENCE_DIR, { recursive: true });",
"const result = {",
" case_id: process.env.LBS_CASE_ID,",
" run_id: process.env.LBS_RUN_ID,",
" status: 'env_issue',",
" reason: 'backend not reachable',",
" evidence_collected: ['filesystem']",
"};",
"writeFileSync(join(process.env.LBS_EVIDENCE_DIR, 'result.json'), JSON.stringify(result));",
"writeFileSync(join(process.env.LBS_EVIDENCE_DIR, 'automation-result.json'), JSON.stringify({ ...result, source: 'automation' }));",
"process.exit(2);",
].join("\n"),
);
const result = capture(() => commandSuiteRun({
root: tmp,
args: ["suite", "run", "mini", "--run-id", "mini-run", "--evidence-dir", join(tmp, "evidence"), "--json"],
}));
assert.equal(result.code, 2);
const payload = JSON.parse(result.output);
assert.equal(payload.executions[0].status, "classified");
assert.equal(payload.report.status, "env_issue");
assert.equal(payload.report.execution_status, "ok");
} finally {
rmSync(tmp, { recursive: true, force: true });
}
});
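This test pins down one rule. Suppose an automation script exits non-zero but leaves a fresh `result.json` with a classified status such as `env_issue`. The suite then reports that status and marks the execution `classified` instead of failed, although the suite command still returns exit code 2 in the test. The sketch below restates the rule as a pure function. `classifyExecution`, its input shape, and the handling of the other branches are assumptions, loosely guided by the neighbouring stale-pass test.

```typescript
type ResultStatus = "pass" | "fail" | "blocked" | "env_issue" | "flaky";
type ExecutionStatus = "ok" | "classified" | "failed";

// Statuses a script may report to explain a non-zero exit without it counting as a product failure.
const CLASSIFIED = new Set<ResultStatus>(["blocked", "env_issue", "flaky"]);

// Hypothetical restatement of the behaviour the tests assert; not the actual lbs implementation.
function classifyExecution(
  exitCode: number,
  freshResult: ResultStatus | undefined,
): { execution: ExecutionStatus; report: ResultStatus } {
  if (exitCode === 0) {
    // Assumption: a zero exit with no result file is still treated as a failure.
    return { execution: "ok", report: freshResult ?? "fail" };
  }
  if (freshResult !== undefined && CLASSIFIED.has(freshResult)) {
    return { execution: "classified", report: freshResult };
  }
  // A non-zero exit with no result, a "fail", or a leftover "pass" is a failure.
  return { execution: "failed", report: "fail" };
}

console.log(classifyExecution(2, "env_issue")); // { execution: 'classified', report: 'env_issue' }
console.log(classifyExecution(1, "pass")); // { execution: 'failed', report: 'fail' }
```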
test("suite run failure cannot be masked by stale pass result", () => {
  const tmp = mkdtempSync(join(tmpdir(), "lbs-suite-run-stale-pass-"));
  try {
@@ -1369,6 +1445,56 @@ test("env doctor does not require proxy variables", async () => {
  }
});
test("env doctor reports missing socksio for active SOCKS proxy", async () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-env-doctor-socksio-"));
const originalAllProxy = process.env.ALL_PROXY;
const originalAllProxyLower = process.env.all_proxy;
try {
delete process.env.ALL_PROXY;
delete process.env.all_proxy;
const skillsDir = join(tmp, "skills");
const repoDir = join(tmp, "LangBot");
const webDir = join(repoDir, "web");
const venvBin = join(repoDir, ".venv", "bin");
const browserProfile = join(tmp, "browser-profile");
const chromium = join(tmp, "chromium");
mkdirSync(skillsDir, { recursive: true });
mkdirSync(webDir, { recursive: true });
mkdirSync(venvBin, { recursive: true });
mkdirSync(browserProfile, { recursive: true });
writeFileSync(chromium, "");
const python = join(venvBin, "python");
writeFileSync(python, "#!/bin/sh\nexit 1\n");
chmodSync(python, 0o755);
writeFileSync(
join(skillsDir, ".env"),
[
"LANGBOT_BACKEND_URL=http://127.0.0.1:59996",
"LANGBOT_FRONTEND_URL=http://127.0.0.1:59996",
"LANGBOT_DEV_FRONTEND_URL=http://127.0.0.1:59996",
`LANGBOT_REPO=${repoDir}`,
`LANGBOT_WEB_REPO=${webDir}`,
`LANGBOT_BROWSER_PROFILE=${browserProfile}`,
`LANGBOT_CHROMIUM_EXECUTABLE=${chromium}`,
"ALL_PROXY=socks5://127.0.0.1:7890",
].join("\n"),
);
const result = await captureAsync(() => commandEnvDoctor({ root: tmp, args: ["env", "doctor"] }));
assert.equal(result.code, 1);
assert.match(result.output, /FAIL: SOCKS proxy ALL_PROXY is configured/);
assert.match(result.output, /cannot import socksio/);
assert.match(result.output, /-m pip install socksio/);
} finally {
if (originalAllProxy === undefined) delete process.env.ALL_PROXY;
else process.env.ALL_PROXY = originalAllProxy;
if (originalAllProxyLower === undefined) delete process.env.all_proxy;
else process.env.all_proxy = originalAllProxyLower;
rmSync(tmp, { recursive: true, force: true });
}
});
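The new doctor check fails when a SOCKS proxy is configured but the repository's Python cannot import `socksio`. Python HTTP clients such as httpx need that package to connect through SOCKS proxies. The minimal sketch below assumes the proxy is detected by URL scheme and that the import is probed with `python -c`. `findSocksProxy` and `checkSocksio` are hypothetical names, and the message only reproduces the fragments the test matches.

```typescript
import { spawnSync } from "node:child_process";

type EnvSource = Record<string, string | undefined>;

// Hypothetical detection: the first proxy variable whose URL uses a socks scheme.
function findSocksProxy(env: EnvSource): string | undefined {
  for (const key of ["ALL_PROXY", "all_proxy", "HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"]) {
    const value = env[key];
    if (value && /^socks[45]?h?:\/\//i.test(value)) return key;
  }
  return undefined;
}

// Returns a failure line, or undefined when no SOCKS proxy is set or the import succeeds.
function checkSocksio(python: string, env: EnvSource): string | undefined {
  const key = findSocksProxy(env);
  if (!key) return undefined;
  // spawnSync does not throw for a missing binary; it sets `error` and leaves `status` null.
  const probe = spawnSync(python, ["-c", "import socksio"], { encoding: "utf8" });
  if (probe.status === 0) return undefined;
  return `FAIL: SOCKS proxy ${key} is configured but ${python} cannot import socksio; fix with: ${python} -m pip install socksio`;
}

console.log(checkSocksio("/nonexistent/python", { ALL_PROXY: "socks5://127.0.0.1:7890" }));
```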
test("env show redacts secret-like values by default", () => {
  const tmp = mkdtempSync(join(tmpdir(), "lbs-env-show-redact-"));
  try {
@@ -2521,6 +2647,38 @@ test("test report renders a reusable evidence template", () => {
  assert.match(result.output, /no log files provided/);
});
test("test report promotes loaded automation evidence into result section", () => {
const tmp = mkdtempSync(join(tmpdir(), "lbs-report-automation-"));
try {
writeFileSync(
join(tmp, "automation-result.json"),
JSON.stringify({
status: "pass",
reason: "latency thresholds passed",
url: "http://127.0.0.1:5300",
artifacts: { metrics_json: join(tmp, "metrics.json") },
}),
);
const result = capture(() => commandTestReport(ctx([
"test",
"report",
"langbot-live-backend-latency",
"--evidence-dir",
tmp,
"--no-auto-log",
])));
assert.equal(result.code, 0);
assert.match(result.output, /## Result\n- result: pass\n- reason: latency thresholds passed/);
assert.match(result.output, /- target_tested: http:\/\/127\.0\.0\.1:5300/);
assert.doesNotMatch(result.output, /target_tested: TODO/);
assert.match(result.output, /## Automation Result/);
} finally {
rmSync(tmp, { recursive: true, force: true });
}
});
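This test checks that a loaded `automation-result.json` fills the report's `## Result` section instead of leaving `TODO` placeholders. The hypothetical renderer below shows that fallback. Its field names follow `AutomationResultEvidence`, where `result` carries the file's `status` value. The section layout beyond the lines the test matches is assumed.

```typescript
// Subset of AutomationResultEvidence from the diff; `result` holds the file's `status` value.
type AutomationEvidence = { result?: string; reason?: string; url?: string };

// Hypothetical renderer: loaded evidence replaces each TODO placeholder it can fill.
function renderResultSection(evidence?: AutomationEvidence): string {
  return [
    "## Result",
    `- result: ${evidence?.result ?? "TODO"}`,
    `- reason: ${evidence?.reason ?? "TODO"}`,
    `- target_tested: ${evidence?.url ?? "TODO"}`,
  ].join("\n");
}

console.log(renderResultSection({ result: "pass", reason: "latency thresholds passed", url: "http://127.0.0.1:5300" }));
console.log(renderResultSection());
```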
test("validate rejects dangling case references and missing automation scripts", () => {
  const tmp = mkdtempSync(join(tmpdir(), "lbs-validate-strict-"));
  try {