fix(box): trust Box-reported skill paths when filesystem is not shared

In separated deployments (Docker Compose, k8s sidecar, --standalone-box, remote runtime.endpoint), the Box runtime owns its own filesystem, so the skill package_root it reports via list_skills cannot be resolved on the LangBot side. LangBot's reload_skills and build_skill_extra_mounts validated those paths with os.path.isdir() against LangBot's own filesystem. That silently dropped every skill in such deployments and broke the sandbox skill feature for the nsjail/SaaS backend.

Add BoxService.shares_filesystem_with_box, derived from the connector transport (stdio = shared, WebSocket = separated), with an explicit override seam for tests/embedders. Gate both isdir() guards on it: keep local validation in shared-filesystem (stdio) mode and trust Box-reported paths otherwise. This is safe because the Box runtime only reports skills found on its own filesystem, so those paths are valid there by construction.

Add topology-derivation tests (real connector, no mocks) and skill-retention tests for both shared and separated filesystems.
docs(docker): move k8s deployment docs to wiki, drop README_K8S.md
2026-06-08 06:46:02 +00:00 · 2026-06-07 12:46:52 -04:00 · 2026-06-07 11:36:39 -04:00 · 2026-06-07 11:18:27 -04:00 · 2026-06-07 08:57:43 -04:00 · 2026-06-07 08:43:30 -04:00
165 changed files with 9684 additions and 4963 deletions
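Both commit titles above follow the `<type>(<scope>): <subject>` convention described in the updated AGENTS.md. A minimal checker for that format could look like the following sketch; `check_commit_title` is a hypothetical helper, and requiring a scope and restricting `type` to the listed values are assumptions, since AGENTS.md ends its type list with "etc.".

```python
import re

# Types listed in AGENTS.md; the list there ends with "etc.", so this set is an assumption.
ALLOWED_TYPES = {"feat", "fix", "docs", "style", "refactor", "perf", "test", "chore"}

# <type>(<scope>): <subject>, with a non-empty scope and a non-blank subject.
TITLE_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<subject>\S.*)$")


def check_commit_title(title: str) -> tuple[bool, str]:
    """Return (ok, reason) for a single commit title line."""
    match = TITLE_RE.match(title)
    if match is None:
        return False, "expected '<type>(<scope>): <subject>'"
    if match.group("type") not in ALLOWED_TYPES:
        return False, f"unknown type {match.group('type')!r}"
    return True, "ok"
```

A stricter variant could also cap the subject length, but AGENTS.md only asks for a concise description, so no limit is enforced here.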
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,81 +1,134 @@
 # AGENTS.md

-This file is for guiding code agents (like Claude Code, GitHub Copilot, OpenAI Codex, etc.) to work in LangBot project.
+This file guides code agents (Claude Code, GitHub Copilot, OpenAI Codex, etc.) working in the LangBot project. `CLAUDE.md` is a symlink to this file.

 ## Project Overview

-LangBot is a open-source LLM native instant messaging bot development platform, aiming to provide an out-of-the-box IM robot development experience, with Agent, RAG, MCP and other LLM application functions, supporting global instant messaging platforms, and providing rich API interfaces, supporting custom development.
+LangBot is an open-source, LLM-native instant-messaging bot development platform. It aims to provide an out-of-the-box IM bot development experience with Agent, RAG, MCP and other LLM application capabilities, supporting mainstream global IM platforms and exposing rich APIs for custom development.

-LangBot has a comprehensive frontend, all operations can be performed through the frontend. The project splited into these major parts:
+LangBot has a comprehensive web frontend — almost every operation can be performed through it.

-- `./src/langbot`: The main python package of the project, below are the main modules in this package:
-    - `./pkg`: The core python package of the project backend.
-        - `./pkg/platform`: The platform module of the project, containing the logic of message platform adapters, bot managers, message session managers, etc.
-        - `./pkg/provider`: The provider module of the project, containing the logic of LLM providers, tool providers, etc.
-        - `./pkg/pipeline`: The pipeline module of the project, containing the logic of pipelines, stages, query pool, etc.
-        - `./pkg/api`: The api module of the project, containing the http api controllers and services.
-        - `./pkg/plugin`: LangBot bridge for connecting with plugin system.
-    - `./libs`: Some SDKs we previously developed for the project, such as `qq_official_api`, `wecom_api`, etc.
-    - `./templates`: Templates of config files, components, etc.
-    - `./web`: Frontend codebase, built with Next.js + **shadcn** + **Tailwind CSS**.
-    - `./docker`: docker-compose deployment files.
+- **Python**: `>=3.11,<4.0`, dependencies managed by `uv`. Package version is in `pyproject.toml`.
+- **Frontend**: `web/` is a **Vite + React Router 7 + shadcn/ui + Tailwind CSS** SPA, managed by `pnpm`. (Note: this is NOT Next.js — the `dev` script is `vite`.)
+- **Backend framework**: Quart (the async flavour of Flask). The HTTP API and the pre-built web UI are both served by the backend on `http://127.0.0.1:5300`.

-## Backend Development
+## Repository Layout

-We use `uv` to manage dependencies.
+```
+LangBot/
+├── main.py                     # Entrypoint shim -> langbot.__main__.main()
+├── pyproject.toml              # Python project + deps (uv), pins langbot-plugin==<x.y.z>
+├── src/langbot/
+│   ├── __main__.py             # Real entrypoint, CLI args (--standalone-runtime, --standalone-box, --debug)
+│   ├── pkg/                    # Core backend package
+│   │   ├── api/                # HTTP API controllers + services (Quart)
+│   │   ├── core/               # App bootstrap, stages, task manager
+│   │   ├── platform/           # IM platform adapters, bot managers, session managers
+│   │   ├── provider/           # LLM providers, requesters, tool providers
+│   │   ├── pipeline/           # Pipelines, stages, query pool
+│   │   ├── plugin/             # Bridge connecting LangBot to the plugin runtime (see below)
+│   │   ├── box/                # Code-sandbox subsystem (Docker / nsjail / E2B backends)
+│   │   ├── skill/              # Skill subsystem
+│   │   ├── rag/, vector/       # RAG + vector store
+│   │   ├── command/            # Built-in commands
+│   │   ├── persistence/        # ORM models + Alembic migrations (SQLite & PostgreSQL)
+│   │   ├── storage/            # Object/file storage abstractions
+│   │   └── config/, entity/, discover/, utils/, telemetry/, survey/
+│   ├── libs/                   # Vendored SDKs (qq_official_api, wecom_api, etc.)
+│   └── templates/              # Config/component templates (e.g. templates/config.yaml)
+├── web/                        # Frontend SPA (Vite + React Router 7 + shadcn + Tailwind)
+└── docker/                     # docker-compose deployment files
+```
+
+## Development Environment Setup
+
+Full guide lives in the wiki: **["开发配置" / Dev Config](https://docs.langbot.app/zh/develop/dev-config)**. Summary:
+
+### Backend

 ```bash
 pip install uv
-uv sync --dev
+uv sync --dev          # uv creates a .venv/ for you; point your editor's interpreter at it
+uv run main.py         # serves API + web UI on http://127.0.0.1:5300
 ```

-Start the backend and run the project in development mode.
+On first run the config file is generated at `data/config.yaml`. DB is SQLite by default (zero setup); PostgreSQL is supported. Migrations run automatically on startup.

-```bash
-uv run main.py
-```
+### Frontend

-Then you can access the project at `http://127.0.0.1:5300`.
-
-## Frontend Development
-
-We use `pnpm` to manage dependencies.
+Requires Node.js + [pnpm](https://pnpm.io/installation).

 ```bash
 cd web
-cp .env.example .env
+cp .env.example .env   # Windows: copy .env.example .env
 pnpm install
-pnpm dev
+pnpm dev               # http://127.0.0.1:3000  (npm install / npm run dev also work)
 ```

-Then you can access the project at `http://127.0.0.1:3000`.
+`pnpm dev` reads `VITE_API_BASE_URL` from `web/.env` so the dev frontend can reach the backend on port `5300`. In production the frontend is pre-built into static files served by the backend on the same origin.

-## Plugin System Architecture
+### Code formatting

-LangBot is composed of various internal components such as Large Language Model tools, commands, messaging platform adapters, LLM requesters, and more. To meet extensibility and flexibility requirements, we have implemented a production-grade plugin system.
+The repo runs lint + format checks in CI. Install the pre-commit hooks so the same checks run locally before each commit:

-Each plugin runs in an independent process, managed uniformly by the Plugin Runtime. It has two operating modes: `stdio` and `websocket`. When LangBot is started directly by users (not running in a container), it uses `stdio` mode, which is common for personal users or lightweight environments. When LangBot runs in a container, it uses `websocket` mode, designed specifically for production environments.
+```bash
+uv run pre-commit install
+```

-Plugin Runtime automatically starts each installed plugin and interacts through stdio. In plugin development scenarios, developers can use the lbp command-line tool to start plugins and connect to the running Runtime via WebSocket for debugging.
+## Plugin System

-> Plugin SDK, CLI, Runtime, and entities definitions shared between LangBot and plugins are contained in the [`langbot-plugin-sdk`](https://github.com/langbot-app/langbot-plugin-sdk) repository.
+LangBot's plugin system (Plugin SDK, CLI `lbp`, Plugin Runtime, and the shared entity/API definitions) lives in a **separate repository**: [`langbot-plugin-sdk`](https://github.com/langbot-app/langbot-plugin-sdk). LangBot depends on it via the pinned `langbot-plugin` package in `pyproject.toml`.

-## Some Development Tips and Standards
+### Architecture (what to know inside this repo)

-- LangBot is a global project, any comments in code should be in English, and user experience should be considered in all aspects.
-- Thus you should consider the i18n support in all aspects.
-- LangBot is widely adopted in both toC and toB scenarios, so you should consider the compatibility and security in all aspects.
-- If you were asked to make a commit, please follow the commit message format: 
-    - format: <type>(<scope>): <subject>
-    - type: must be a specific type, such as feat (new feature), fix (bug fix), docs (documentation), style (code style), refactor (refactoring), perf (performance optimization), etc.
-    - scope: the scope of the commit, such as the package name, the file name, the function name, the class name, the module name, etc.
-    - subject: the subject of the commit, such as the description of the commit, the reason for the commit, the impact of the commit, etc.
-- LangBot uses [Alembic](https://alembic.sqlalchemy.org/) to manage database migrations, supporting both SQLite and PostgreSQL. Migration files are located in `src/langbot/pkg/persistence/alembic/versions/`. If you changed the definition of database entities (ORM models), generate a new migration script by running `uv run python -m langbot.pkg.persistence.alembic_runner autogenerate "description of your change"` in the project root (requires `data/config.yaml` to exist). Review and edit the generated script before committing. Migrations are executed automatically on LangBot startup. For data migrations (e.g. modifying JSON field content), you need to manually add the migration code in the generated script.
+- Plugins run as independent processes managed by the **Plugin Runtime**. The Runtime supports two control transports: `stdio` and `websocket`.
+- When LangBot is started directly by a user (not in a container), it spawns and connects to the Runtime over **stdio** (lightweight/personal use).
+- When LangBot runs in a container, it connects to a standalone Runtime over **WebSocket** (production).
+- The bridge code lives in `src/langbot/pkg/plugin/` (`connector.py`, `handler.py`).
+- Relevant config (`data/config.yaml`): `plugin.runtime_ws_url` (e.g. `ws://langbot_plugin_runtime:5400/control/ws`). Start LangBot with `--standalone-runtime` to make it connect to an externally-launched Runtime over WebSocket instead of spawning one over stdio.
+
+### Debugging the Plugin Runtime / CLI / SDK
+
+This is documented in detail in the **SDK repo's `AGENTS.md`** and in the wiki page **["调试插件运行时、CLI、SDK" / Plugin Runtime](https://docs.langbot.app/zh/develop/plugin-runtime)**. The short version:
+
+- Clone `LangBot` and `langbot-plugin-sdk` as siblings under one parent dir so the editor resolves shared entities.
+- Start a standalone Runtime from the SDK repo: `uv run --no-sync lbp rt` (control port `5400`, debug port `5401`).
+- To make LangBot use a locally-modified SDK: from the SDK dir, with LangBot's `.venv` active, run `uv pip install .`, then launch LangBot with `uv run --no-sync main.py --standalone-runtime` (keep `--no-sync` so your local SDK isn't overwritten).
+
+### Debugging the Box (sandbox) runtime
+
+The Box subsystem (`src/langbot/pkg/box/`) is the code sandbox. It picks the first available backend among **Docker / nsjail / E2B**. The standalone Box runtime is launched via the SDK CLI: `lbp box`. Backend selection details, the `lbp box` flags, and the SDK-side architecture are documented in the SDK repo's `AGENTS.md`.
+
+Relevant config (`data/config.yaml`, `box:` section): `box.enabled` (master switch — disabling it also disables the native sandbox tools, skill add/edit, and stdio-mode MCP servers), `box.backend` (`'local'` = Docker/nsjail auto-pick, or `'docker'` / `'nsjail'` / `'e2b'`; also settable via `BOX__BACKEND`), and `box.runtime.endpoint` (external Box runtime base URL, e.g. `ws://127.0.0.1:5410`; empty = local auto-managed runtime). Like the plugin runtime, LangBot can connect to an externally-launched Box runtime by setting that endpoint and starting with `--standalone-box`.
+
+> A common cause of a misleading "No supported sandbox backend (Docker / nsjail / E2B) is available" error: Docker is installed and running, but the current user is not in the `docker` group, so `docker info` fails with `permission denied` on the Docker socket. Fix: run `sudo usermod -aG docker <user>`, then restart the backend from a shell that has picked up the new group (e.g. after logging out and back in, or after `newgrp docker`).
+
+## Development Standards
+
+- LangBot is a global project: **all code comments and docstrings must be in English**, and every user-facing string must support **i18n** (`en_US` + `zh_Hans` at minimum, plus `ja_JP` where the repo already has it).
+- LangBot is adopted in both toC and toB scenarios — always consider compatibility and security.
+- **Commit message format**: `<type>(<scope>): <subject>`
+  - `type`: one of `feat`, `fix`, `docs`, `style`, `refactor`, `perf`, `test`, `chore`, etc.
+  - `scope`: the affected package/module/file/class.
+  - `subject`: concise description of the change.
+
+### Database migrations (Alembic)
+
+LangBot uses [Alembic](https://alembic.sqlalchemy.org/) for migrations, supporting both SQLite and PostgreSQL from a single set of scripts. Migration files live in `src/langbot/pkg/persistence/alembic/versions/`.
+
+If you change ORM model definitions, generate a migration:
+
+```bash
+# Run from the project root (requires data/config.yaml to exist)
+uv run python -m langbot.pkg.persistence.alembic_runner autogenerate "description of your change"
+```
+
+Review and edit the generated script before committing. Migrations execute automatically on startup. `autogenerate` detects schema changes (add/drop columns, tables, type changes) but **data migrations** (e.g. mutating JSON field contents) must be hand-written into the generated script. `env.py` sets `render_as_batch=True`, so SQLite's ALTER TABLE limits are handled automatically — no need to branch per database. More in the wiki ["开发配置"](https://docs.langbot.app/zh/develop/dev-config#数据库迁移).

 ## Some Principles

 - Keep it simple, stupid.
-- Entities should not be multiplied unnecessarily
+- Entities should not be multiplied unnecessarily.
 - 八荣八耻

    以瞎猜接口为耻，以认真查询为荣。
@@ -85,4 +138,4 @@ Plugin Runtime automatically starts each installed plugin and interacts through
    以跳过验证为耻，以主动测试为荣。
    以破坏架构为耻，以遵循规范为荣。
    以假装理解为耻，以诚实无知为荣。
-    以盲目修改为耻，以谨慎重构为荣。
+    以盲目修改为耻，以谨慎重构为荣。
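The new AGENTS.md text says `box.backend` can also be set via `BOX__BACKEND`. This diff does not show how LangBot parses such variables; assuming the common convention where a double underscore separates nesting levels, an override of the values loaded from `data/config.yaml` could be sketched as follows. `apply_env_overrides` is a hypothetical helper, not a LangBot API.

```python
import copy
from typing import Any, Mapping


def apply_env_overrides(
    config: dict[str, Any], env: Mapping[str, str], sep: str = "__"
) -> dict[str, Any]:
    """Return a copy of `config` with variables such as BOX__BACKEND applied.

    Assumptions for illustration: `sep` separates nesting levels, names are
    matched after lowercasing, only existing scalar keys are overridden, and
    values are kept as strings (no type coercion).
    """
    merged = copy.deepcopy(config)
    for var, value in env.items():
        if sep not in var:
            continue  # e.g. PATH, HOME: not a nested config key
        *parents, leaf = [part.lower() for part in var.split(sep)]
        node: Any = merged
        for key in parents:
            # Walk down the nested dicts; give up (None) as soon as a level is missing.
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict) and leaf in node and not isinstance(node[leaf], dict):
            node[leaf] = value
    return merged
```

With `{"box": {"backend": "local"}}` and `BOX__BACKEND=nsjail`, the returned copy has `box.backend == "nsjail"` while the input dict is left untouched. A real loader would also need to coerce strings such as `"false"` for a boolean key like `box.enabled`.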
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ LangBot is an **open-source, production-grade platform** for building AI-powered

 ### Key Capabilities

-- **AI Conversations & Agents** — Multi-turn dialogues, tool calling, multi-modal support, streaming output. Built-in RAG (knowledge base) with deep integration to [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org).
+- **AI Conversations & Agents** — Multi-turn dialogues, tool calling, multi-modal support, streaming output. Built-in RAG (knowledge base) with deep integration to [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org), [Deerflow](https://deerflow.tech), [Weknora](https://weknora.weixin.qq.com).
 - **Universal IM Platform Support** — One codebase for Discord, Telegram, Slack, LINE, QQ, WeChat, WeCom, Lark, DingTalk, KOOK.
 - **Production-Ready** — Access control, rate limiting, sensitive word filtering, comprehensive monitoring, and exception handling. Trusted by enterprises.
 - **Plugin Ecosystem** — Hundreds of plugins, event-driven architecture, component extensions, and [MCP protocol](https://modelcontextprotocol.io/) support.
@@ -78,7 +78,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**More options:** [Docker](https://link.langbot.app/en/docs/docker) · [Manual](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**More options:** [Docker](https://link.langbot.app/en/docs/docker) · [Manual](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/README_CN.md
+++ b/README_CN.md
@@ -13,7 +13,7 @@
 [English](README.md) / 简体中文 / [繁體中文](README_TW.md) / [日本語](README_JP.md) / [Español](README_ES.md) / [Français](README_FR.md) / [한국어](README_KO.md) / [Русский](README_RU.md) / [Tiếng Việt](README_VI.md)

 [![Discord](https://img.shields.io/discord/1335141740050649118?logo=discord&labelColor=%20%235462eb&logoColor=%20%23f5f5f5&color=%20%235462eb)](https://discord.gg/wdNEHETs87)
-[![QQ Group](https://img.shields.io/badge/%E7%A4%BE%E5%8C%BAQQ%E7%BE%A4-1030838208-blue)](https://qm.qq.com/q/DxZZcNxM1W)
+[![QQ Group](https://img.shields.io/badge/%E7%A4%BE%E5%8C%BAQQ%E7%BE%A4-1030838208-blue)](https://qm.qq.com/q/IrlV8QFacU)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/langbot-app/LangBot)
 [![GitHub release (latest by date)](https://img.shields.io/github/v/release/langbot-app/LangBot)](https://github.com/langbot-app/LangBot/releases/latest)
 <img src="https://img.shields.io/badge/python-3.10 ~ 3.13 -blue.svg" alt="python">
@@ -38,7 +38,7 @@ LangBot 是一个**开源的生产级平台**，用于构建 AI 驱动的即时

 ### 核心能力

-- **AI 对话与 Agent** — 多轮对话、工具调用、多模态、流式输出。自带 RAG（知识库），深度集成 [Dify](https://dify.ai)、[Coze](https://coze.com)、[n8n](https://n8n.io)、[Langflow](https://langflow.org) 等 LLMOps 平台。
+- **AI 对话与 Agent** — 多轮对话、工具调用、多模态、流式输出。自带 RAG（知识库），深度集成 [Dify](https://dify.ai)、[Coze](https://coze.com)、[n8n](https://n8n.io)、[Langflow](https://langflow.org)、[Deerflow](https://deerflow.tech)、[Weknora](https://weknora.weixin.qq.com) 等 LLMOps 平台。
 - **全平台支持** — 一套代码，覆盖 QQ、微信、企业微信、飞书、钉钉、Discord、Telegram、Slack、LINE、KOOK 等平台。
 - **生产就绪** — 访问控制、限速、敏感词过滤、全面监控与异常处理，已被多家企业采用。
 - **插件生态** — 数百个插件，跨进程的事件驱动架构，组件扩展，适配 [MCP 协议](https://modelcontextprotocol.io/)。
@@ -78,7 +78,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/zh-CN/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**更多方式：** [Docker](https://link.langbot.app/zh/docs/docker) · [手动部署](https://link.langbot.app/zh/docs/manual-deploy) · [宝塔面板](https://link.langbot.app/zh/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**更多方式：** [Docker](https://link.langbot.app/zh/docs/docker) · [手动部署](https://link.langbot.app/zh/docs/manual-deploy) · [宝塔面板](https://link.langbot.app/zh/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/zh/deploy/langbot/kubernetes)

 ---

--- a/README_ES.md
+++ b/README_ES.md
@@ -37,7 +37,7 @@ LangBot es una **plataforma de código abierto y grado de producción** para con

 ### Capacidades Clave

-- **Conversaciones e Agentes IA** — Diálogos de múltiples turnos, llamadas a herramientas, soporte multimodal, salida en streaming. RAG (base de conocimientos) incorporado con integración profunda con [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org).
+- **Conversaciones y Agentes IA** — Diálogos de múltiples turnos, llamadas a herramientas, soporte multimodal, salida en streaming. RAG (base de conocimientos) incorporado con integración profunda con [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org), [Deerflow](https://deerflow.tech), [Weknora](https://weknora.weixin.qq.com).
 - **Soporte Universal de Plataformas de MI** — Un solo código base para Discord, Telegram, Slack, LINE, QQ, WeChat, WeCom, Lark, DingTalk, KOOK.
 - **Listo para Producción** — Control de acceso, limitación de velocidad, filtrado de palabras sensibles, monitoreo completo y manejo de excepciones. De confianza para empresas.
 - **Ecosistema de Plugins** — Cientos de plugins, arquitectura basada en eventos, extensiones de componentes y soporte del [protocolo MCP](https://modelcontextprotocol.io/).
@@ -77,7 +77,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**Más opciones:** [Docker](https://link.langbot.app/en/docs/docker) · [Manual](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**Más opciones:** [Docker](https://link.langbot.app/en/docs/docker) · [Manual](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/README_FR.md
+++ b/README_FR.md
@@ -37,7 +37,7 @@ LangBot est une **plateforme open-source de niveau production** pour créer des

 ### Capacités Clés

-- **Conversations IA & Agents** — Dialogues multi-tours, appels d'outils, support multimodal, sortie en streaming. RAG (base de connaissances) intégré avec intégration profonde de [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org).
+- **Conversations IA & Agents** — Dialogues multi-tours, appels d'outils, support multimodal, sortie en streaming. RAG (base de connaissances) intégré avec intégration profonde de [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org), [Deerflow](https://deerflow.tech), [Weknora](https://weknora.weixin.qq.com).
 - **Support Universel des Plateformes de MI** — Un seul code pour Discord, Telegram, Slack, LINE, QQ, WeChat, WeCom, Lark, DingTalk, KOOK.
 - **Prêt pour la Production** — Contrôle d'accès, limitation de débit, filtrage de mots sensibles, surveillance complète et gestion des exceptions. Approuvé par les entreprises.
 - **Écosystème de Plugins** — Des centaines de plugins, architecture événementielle, extensions de composants, et support du [protocole MCP](https://modelcontextprotocol.io/).
@@ -77,7 +77,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**Plus d'options :** [Docker](https://link.langbot.app/en/docs/docker) · [Manuel](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**Plus d'options :** [Docker](https://link.langbot.app/en/docs/docker) · [Manuel](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/README_JP.md
+++ b/README_JP.md
@@ -37,7 +37,7 @@ LangBot は、AI搭載のインスタントメッセージングボットを構

 ### 主な機能

-- **AI対話とエージェント** — マルチターン対話、ツール呼び出し、マルチモーダル対応、ストリーミング出力。RAG（ナレッジベース）を内蔵し、[Dify](https://dify.ai)、[Coze](https://coze.com)、[n8n](https://n8n.io)、[Langflow](https://langflow.org) と深く統合。
+- **AI対話とエージェント** — マルチターン対話、ツール呼び出し、マルチモーダル対応、ストリーミング出力。RAG（ナレッジベース）を内蔵し、[Dify](https://dify.ai)、[Coze](https://coze.com)、[n8n](https://n8n.io)、[Langflow](https://langflow.org)、[Deerflow](https://deerflow.tech)、[Weknora](https://weknora.weixin.qq.com) と深く統合。
 - **ユニバーサルIMプラットフォーム対応** — 単一のコードベースで Discord、Telegram、Slack、LINE、QQ、WeChat、WeCom、Lark、DingTalk、KOOK に対応。
 - **本番環境対応** — アクセス制御、レート制限、センシティブワードフィルタリング、包括的な監視、例外処理を搭載。エンタープライズの信頼に応える品質。
 - **プラグインエコシステム** — 数百のプラグイン、イベント駆動アーキテクチャ、コンポーネント拡張、[MCPプロトコル](https://modelcontextprotocol.io/)対応。
@@ -77,7 +77,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**その他:** [Docker](https://link.langbot.app/en/docs/docker) · [手動デプロイ](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**その他:** [Docker](https://link.langbot.app/en/docs/docker) · [手動デプロイ](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/README_KO.md
+++ b/README_KO.md
@@ -37,7 +37,7 @@ LangBot은 AI 기반 인스턴트 메시징 봇을 구축하기 위한 **오픈

 ### 핵심 기능

-- **AI 대화 및 에이전트** — 멀티턴 대화, 도구 호출, 멀티모달 지원, 스트리밍 출력. 내장 RAG(지식 베이스)와 [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org) 심층 통합.
+- **AI 대화 및 에이전트** — 멀티턴 대화, 도구 호출, 멀티모달 지원, 스트리밍 출력. 내장 RAG(지식 베이스)와 [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org), [Deerflow](https://deerflow.tech), [Weknora](https://weknora.weixin.qq.com) 심층 통합.
 - **유니버설 IM 플랫폼 지원** — 단일 코드베이스로 Discord, Telegram, Slack, LINE, QQ, WeChat, WeCom, Lark, DingTalk, KOOK 지원.
 - **프로덕션 레디** — 접근 제어, 속도 제한, 민감어 필터링, 종합 모니터링 및 예외 처리. 기업 환경에서 검증됨.
 - **플러그인 생태계** — 수백 개의 플러그인, 이벤트 기반 아키텍처, 컴포넌트 확장, [MCP 프로토콜](https://modelcontextprotocol.io/) 지원.
@@ -77,7 +77,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**더 많은 옵션:** [Docker](https://link.langbot.app/en/docs/docker) · [수동 배포](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**더 많은 옵션:** [Docker](https://link.langbot.app/en/docs/docker) · [수동 배포](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/README_RU.md
+++ b/README_RU.md
@@ -37,7 +37,7 @@ LangBot — это **платформа с открытым исходным к

 ### Ключевые возможности

-- **ИИ-диалоги и агенты** — Многораундовые диалоги, вызов инструментов, мультимодальная поддержка, потоковый вывод. Встроенная реализация RAG (база знаний) с глубокой интеграцией в [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org).
+- **ИИ-диалоги и агенты** — Многораундовые диалоги, вызов инструментов, мультимодальная поддержка, потоковый вывод. Встроенная реализация RAG (база знаний) с глубокой интеграцией в [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org), [Deerflow](https://deerflow.tech), [Weknora](https://weknora.weixin.qq.com).
 - **Универсальная поддержка IM-платформ** — Единая кодовая база для Discord, Telegram, Slack, LINE, QQ, WeChat, WeCom, Lark, DingTalk, KOOK.
 - **Готовность к продакшену** — Контроль доступа, ограничение скорости, фильтрация чувствительных слов, комплексный мониторинг и обработка исключений. Проверено в корпоративной среде.
 - **Экосистема плагинов** — Сотни плагинов, событийно-ориентированная архитектура, расширения компонентов и поддержка [протокола MCP](https://modelcontextprotocol.io/).
@@ -77,7 +77,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**Другие варианты:** [Docker](https://link.langbot.app/en/docs/docker) · [Ручная установка](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**Другие варианты:** [Docker](https://link.langbot.app/en/docs/docker) · [Ручная установка](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/README_TW.md
+++ b/README_TW.md
@@ -39,7 +39,7 @@ LangBot 是一個**開源的生產級平台**，用於建構 AI 驅動的即時

 ### 核心能力

-- **AI 對話與 Agent** — 多輪對話、工具調用、多模態、流式輸出。自帶 RAG（知識庫），深度整合 [Dify](https://dify.ai)、[Coze](https://coze.com)、[n8n](https://n8n.io)、[Langflow](https://langflow.org) 等 LLMOps 平台。
+- **AI 對話與 Agent** — 多輪對話、工具調用、多模態、流式輸出。自帶 RAG（知識庫），深度整合 [Dify](https://dify.ai)、[Coze](https://coze.com)、[n8n](https://n8n.io)、[Langflow](https://langflow.org)、[Deerflow](https://deerflow.tech)、[Weknora](https://weknora.weixin.qq.com) 等 LLMOps 平台。
 - **全平台支援** — 一套程式碼，覆蓋 QQ、微信、企業微信、飛書、釘釘、Discord、Telegram、Slack、LINE、KOOK 等平台。
 - **生產就緒** — 存取控制、限速、敏感詞過濾、全面監控與異常處理，已被多家企業採用。
 - **外掛生態** — 數百個外掛，事件驅動架構，組件擴展，適配 [MCP 協議](https://modelcontextprotocol.io/)。
@@ -79,7 +79,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/zh-CN/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**更多方式：** [Docker](https://link.langbot.app/zh/docs/docker) · [手動部署](https://link.langbot.app/zh/docs/manual-deploy) · [寶塔面板](https://link.langbot.app/zh/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**更多方式：** [Docker](https://link.langbot.app/zh/docs/docker) · [手動部署](https://link.langbot.app/zh/docs/manual-deploy) · [寶塔面板](https://link.langbot.app/zh/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/zh/deploy/langbot/kubernetes)

 ---

--- a/README_VI.md
+++ b/README_VI.md
@@ -37,7 +37,7 @@ LangBot là một **nền tảng mã nguồn mở, cấp sản xuất** để x

 ### Khả năng chính

-- **Hội thoại AI & Agent** — Đối thoại nhiều lượt, gọi công cụ, hỗ trợ đa phương thức, đầu ra streaming. RAG (cơ sở kiến thức) tích hợp sẵn với tích hợp sâu vào [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org).
+- **Hội thoại AI & Agent** — Đối thoại nhiều lượt, gọi công cụ, hỗ trợ đa phương thức, đầu ra streaming. RAG (cơ sở kiến thức) tích hợp sẵn với tích hợp sâu vào [Dify](https://dify.ai), [Coze](https://coze.com), [n8n](https://n8n.io), [Langflow](https://langflow.org), [Deerflow](https://deerflow.tech), [Weknora](https://weknora.weixin.qq.com).
 - **Hỗ trợ đa nền tảng IM** — Một mã nguồn cho Discord, Telegram, Slack, LINE, QQ, WeChat, WeCom, Lark, DingTalk, KOOK.
 - **Sẵn sàng cho sản xuất** — Kiểm soát truy cập, giới hạn tốc độ, lọc từ nhạy cảm, giám sát toàn diện và xử lý ngoại lệ. Được doanh nghiệp tin dùng.
 - **Hệ sinh thái Plugin** — Hàng trăm plugin, kiến trúc hướng sự kiện, mở rộng thành phần, và hỗ trợ [giao thức MCP](https://modelcontextprotocol.io/).
@@ -77,7 +77,7 @@ docker compose up -d
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/en-US/templates/ZKTBDH)
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.app/template/yRrAyL?referralCode=vogKPF)

-**Thêm tùy chọn:** [Docker](https://link.langbot.app/en/docs/docker) · [Thủ công](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](./docker/README_K8S.md)
+**Thêm tùy chọn:** [Docker](https://link.langbot.app/en/docs/docker) · [Thủ công](https://link.langbot.app/en/docs/manual-deploy) · [BTPanel](https://link.langbot.app/en/docs/bt-panel) · [Kubernetes](https://docs.langbot.app/en/deploy/langbot/kubernetes)

 ---

--- a/docker/README_K8S.md
+++ b/docker/README_K8S.md
@@ -1,629 +0,0 @@
-# LangBot Kubernetes 部署指南 / Kubernetes Deployment Guide
-
-[简体中文](#简体中文) | [English](#english)
-
---
-
-## 简体中文
-
-### 概述
-
-本指南提供了在 Kubernetes 集群中部署 LangBot 的完整步骤。Kubernetes 部署配置基于 `docker-compose.yaml`，适用于生产环境的容器化部署。
-
-### 前置要求
-
-- Kubernetes 集群（版本 1.19+）
-- `kubectl` 命令行工具已配置并可访问集群
-- 集群中有可用的存储类（StorageClass）用于持久化存储（可选但推荐）
-- 至少 2 vCPU 和 4GB RAM 的可用资源
-
-### 架构说明
-
-Kubernetes 部署包含以下组件：
-
-1. **langbot**: 主应用服务
-   - 提供 Web UI（端口 5300）
-   - 处理平台 webhook（端口 2280-2290）
-   - 数据持久化卷
-   
-2. **langbot-plugin-runtime**: 插件运行时服务
-   - WebSocket 通信（端口 5400）
-   - 插件数据持久化卷
-
-3. **持久化存储**:
-   - `langbot-data`: LangBot 主数据
-   - `langbot-plugins`: 插件文件
-   - `langbot-plugin-runtime-data`: 插件运行时数据
-
-### 快速开始
-
-#### 1. 下载部署文件
-
-```bash
-# 克隆仓库
-git clone https://github.com/langbot-app/LangBot
-cd LangBot/docker
-
-# 或直接下载 kubernetes.yaml
-wget https://raw.githubusercontent.com/langbot-app/LangBot/main/docker/kubernetes.yaml
-```
-
-#### 2. 部署到 Kubernetes
-
-```bash
-# 应用所有配置
-kubectl apply -f kubernetes.yaml
-
-# 检查部署状态
-kubectl get all -n langbot
-
-# 查看 Pod 日志
-kubectl logs -n langbot -l app=langbot -f
-```
-
-#### 3. 访问 LangBot
-
-默认情况下，LangBot 服务使用 ClusterIP 类型，只能在集群内部访问。您可以选择以下方式之一来访问：
-
-**选项 A: 端口转发（推荐用于测试）**
-
-```bash
-kubectl port-forward -n langbot svc/langbot 5300:5300
-```
-
-然后访问 http://localhost:5300
-
-**选项 B: NodePort（适用于开发环境）**
-
-编辑 `kubernetes.yaml`，取消注释 NodePort Service 部分，然后：
-
-```bash
-kubectl apply -f kubernetes.yaml
-# 获取节点 IP
-kubectl get nodes -o wide
-# 访问 http://<NODE_IP>:30300
-```
-
-**选项 C: LoadBalancer（适用于云环境）**
-
-编辑 `kubernetes.yaml`，取消注释 LoadBalancer Service 部分，然后：
-
-```bash
-kubectl apply -f kubernetes.yaml
-# 获取外部 IP
-kubectl get svc -n langbot langbot-loadbalancer
-# 访问 http://<EXTERNAL_IP>
-```
-
-**选项 D: Ingress（推荐用于生产环境）**
-
-确保集群中已安装 Ingress Controller（如 nginx-ingress），然后：
-
-1. 编辑 `kubernetes.yaml` 中的 Ingress 配置
-2. 修改域名为您的实际域名
-3. 应用配置：
-
-```bash
-kubectl apply -f kubernetes.yaml
-# 访问 http://langbot.yourdomain.com
-```
-
-### 配置说明
-
-#### 环境变量
-
-在 `ConfigMap` 中配置环境变量：
-
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: langbot-config
-  namespace: langbot
-data:
-  TZ: "Asia/Shanghai"  # 修改为您的时区
-```
-
-#### 存储配置
-
-默认使用动态存储分配。如果您有特定的 StorageClass，请在 PVC 中指定：
-
-```yaml
-spec:
-  storageClassName: your-storage-class-name
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 10Gi
-```
-
-#### 资源限制
-
-根据您的需求调整资源限制：
-
-```yaml
-resources:
-  requests:
-    memory: "1Gi"
-    cpu: "500m"
-  limits:
-    memory: "4Gi"
-    cpu: "2000m"
-```
-
-### 常用操作
-
-#### 查看日志
-
-```bash
-# 查看 LangBot 主服务日志
-kubectl logs -n langbot -l app=langbot -f
-
-# 查看插件运行时日志
-kubectl logs -n langbot -l app=langbot-plugin-runtime -f
-```
-
-#### 重启服务
-
-```bash
-# 重启 LangBot
-kubectl rollout restart deployment/langbot -n langbot
-
-# 重启插件运行时
-kubectl rollout restart deployment/langbot-plugin-runtime -n langbot
-```
-
-#### 更新镜像
-
-```bash
-# 更新到最新版本
-kubectl set image deployment/langbot -n langbot langbot=rockchin/langbot:latest
-kubectl set image deployment/langbot-plugin-runtime -n langbot langbot-plugin-runtime=rockchin/langbot:latest
-
-# 检查更新状态
-kubectl rollout status deployment/langbot -n langbot
-```
-
-#### 扩容（不推荐）
-
-注意：由于 LangBot 使用 ReadWriteOnce 的持久化存储，不支持多副本扩容。如需高可用，请考虑使用 ReadWriteMany 存储或其他架构方案。
-
-#### 备份数据
-
-```bash
-# 备份 PVC 数据
-kubectl exec -n langbot -it <langbot-pod-name> -- tar czf /tmp/backup.tar.gz /app/data
-kubectl cp langbot/<langbot-pod-name>:/tmp/backup.tar.gz ./backup.tar.gz
-```
-
-### 卸载
-
-```bash
-# 删除所有资源（保留 PVC）
-kubectl delete deployment,service,configmap -n langbot --all
-
-# 删除 PVC（会删除数据）
-kubectl delete pvc -n langbot --all
-
-# 删除命名空间
-kubectl delete namespace langbot
-```
-
-### 故障排查
-
-#### Pod 无法启动
-
-```bash
-# 查看 Pod 状态
-kubectl get pods -n langbot
-
-# 查看详细信息
-kubectl describe pod -n langbot <pod-name>
-
-# 查看事件
-kubectl get events -n langbot --sort-by='.lastTimestamp'
-```
-
-#### 存储问题
-
-```bash
-# 检查 PVC 状态
-kubectl get pvc -n langbot
-
-# 检查 PV
-kubectl get pv
-```
-
-#### 网络访问问题
-
-```bash
-# 检查 Service
-kubectl get svc -n langbot
-
-# 检查端口转发
-kubectl port-forward -n langbot svc/langbot 5300:5300
-```
-
-### 生产环境建议
-
-1. **使用特定版本标签**：避免使用 `latest` 标签，使用具体版本号如 `rockchin/langbot:v1.0.0`
-2. **配置资源限制**：根据实际负载调整 CPU 和内存限制
-3. **使用 Ingress + TLS**：配置 HTTPS 访问和证书管理
-4. **配置监控和告警**：集成 Prometheus、Grafana 等监控工具
-5. **定期备份**：配置自动备份策略保护数据
-6. **使用专用 StorageClass**：为生产环境配置高性能存储
-7. **配置亲和性规则**：确保 Pod 调度到合适的节点
-
-### 高级配置
-
-#### 使用 Secrets 管理敏感信息
-
-如果需要配置 API 密钥等敏感信息：
-
-```yaml
-apiVersion: v1
-kind: Secret
-metadata:
-  name: langbot-secrets
-  namespace: langbot
-type: Opaque
-data:
-  api_key: <base64-encoded-value>
-```
-
-然后在 Deployment 中引用：
-
-```yaml
-env:
-- name: API_KEY
-  valueFrom:
-    secretKeyRef:
-      name: langbot-secrets
-      key: api_key
-```
-
-#### 配置水平自动扩缩容（HPA）
-
-注意：需要确保使用 ReadWriteMany 存储类型
-
-```yaml
-apiVersion: autoscaling/v2
-kind: HorizontalPodAutoscaler
-metadata:
-  name: langbot-hpa
-  namespace: langbot
-spec:
-  scaleTargetRef:
-    apiVersion: apps/v1
-    kind: Deployment
-    name: langbot
-  minReplicas: 1
-  maxReplicas: 3
-  metrics:
-  - type: Resource
-    resource:
-      name: cpu
-      target:
-        type: Utilization
-        averageUtilization: 70
-```
-
-### 参考资源
-
-- [LangBot 官方文档](https://docs.langbot.app)
-- [Docker 部署文档](https://link.langbot.app/zh/docs/docker)
-- [Kubernetes 官方文档](https://kubernetes.io/docs/)
-
---
-
-## English
-
-### Overview
-
-This guide provides complete steps for deploying LangBot in a Kubernetes cluster. The Kubernetes deployment configuration is based on `docker-compose.yaml` and is suitable for production containerized deployments.
-
-### Prerequisites
-
-- Kubernetes cluster (version 1.19+)
-- `kubectl` command-line tool configured with cluster access
-- Available StorageClass in the cluster for persistent storage (optional but recommended)
-- At least 2 vCPU and 4GB RAM of available resources
-
-### Architecture
-
-The Kubernetes deployment includes the following components:
-
-1. **langbot**: Main application service
-   - Provides Web UI (port 5300)
-   - Handles platform webhooks (ports 2280-2290)
-   - Data persistence volume
-   
-2. **langbot-plugin-runtime**: Plugin runtime service
-   - WebSocket communication (port 5400)
-   - Plugin data persistence volume
-
-3. **Persistent Storage**:
-   - `langbot-data`: LangBot main data
-   - `langbot-plugins`: Plugin files
-   - `langbot-plugin-runtime-data`: Plugin runtime data
-
-### Quick Start
-
-#### 1. Download Deployment Files
-
-```bash
-# Clone repository
-git clone https://github.com/langbot-app/LangBot
-cd LangBot/docker
-
-# Or download kubernetes.yaml directly
-wget https://raw.githubusercontent.com/langbot-app/LangBot/main/docker/kubernetes.yaml
-```
-
-#### 2. Deploy to Kubernetes
-
-```bash
-# Apply all configurations
-kubectl apply -f kubernetes.yaml
-
-# Check deployment status
-kubectl get all -n langbot
-
-# View Pod logs
-kubectl logs -n langbot -l app=langbot -f
-```
-
-#### 3. Access LangBot
-
-By default, LangBot service uses ClusterIP type, accessible only within the cluster. Choose one of the following methods to access:
-
-**Option A: Port Forwarding (Recommended for testing)**
-
-```bash
-kubectl port-forward -n langbot svc/langbot 5300:5300
-```
-
-Then visit http://localhost:5300
-
-**Option B: NodePort (Suitable for development)**
-
-Edit `kubernetes.yaml`, uncomment the NodePort Service section, then:
-
-```bash
-kubectl apply -f kubernetes.yaml
-# Get node IP
-kubectl get nodes -o wide
-# Visit http://<NODE_IP>:30300
-```
-
-**Option C: LoadBalancer (Suitable for cloud environments)**
-
-Edit `kubernetes.yaml`, uncomment the LoadBalancer Service section, then:
-
-```bash
-kubectl apply -f kubernetes.yaml
-# Get external IP
-kubectl get svc -n langbot langbot-loadbalancer
-# Visit http://<EXTERNAL_IP>
-```
-
-**Option D: Ingress (Recommended for production)**
-
-Ensure an Ingress Controller (e.g., nginx-ingress) is installed in the cluster, then:
-
-1. Edit the Ingress configuration in `kubernetes.yaml`
-2. Change the domain to your actual domain
-3. Apply configuration:
-
-```bash
-kubectl apply -f kubernetes.yaml
-# Visit http://langbot.yourdomain.com
-```
-
-### Configuration
-
-#### Environment Variables
-
-Configure environment variables in ConfigMap:
-
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: langbot-config
-  namespace: langbot
-data:
-  TZ: "Asia/Shanghai"  # Change to your timezone
-```
-
-#### Storage Configuration
-
-Uses dynamic storage provisioning by default. If you have a specific StorageClass, specify it in PVC:
-
-```yaml
-spec:
-  storageClassName: your-storage-class-name
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 10Gi
-```
-
-#### Resource Limits
-
-Adjust resource limits based on your needs:
-
-```yaml
-resources:
-  requests:
-    memory: "1Gi"
-    cpu: "500m"
-  limits:
-    memory: "4Gi"
-    cpu: "2000m"
-```
-
-### Common Operations
-
-#### View Logs
-
-```bash
-# View LangBot main service logs
-kubectl logs -n langbot -l app=langbot -f
-
-# View plugin runtime logs
-kubectl logs -n langbot -l app=langbot-plugin-runtime -f
-```
-
-#### Restart Services
-
-```bash
-# Restart LangBot
-kubectl rollout restart deployment/langbot -n langbot
-
-# Restart plugin runtime
-kubectl rollout restart deployment/langbot-plugin-runtime -n langbot
-```
-
-#### Update Images
-
-```bash
-# Update to latest version
-kubectl set image deployment/langbot -n langbot langbot=rockchin/langbot:latest
-kubectl set image deployment/langbot-plugin-runtime -n langbot langbot-plugin-runtime=rockchin/langbot:latest
-
-# Check update status
-kubectl rollout status deployment/langbot -n langbot
-```
-
-#### Scaling (Not Recommended)
-
-Note: Due to LangBot using ReadWriteOnce persistent storage, multi-replica scaling is not supported. For high availability, consider using ReadWriteMany storage or alternative architectures.
-
-#### Backup Data
-
-```bash
-# Backup PVC data
-kubectl exec -n langbot -it <langbot-pod-name> -- tar czf /tmp/backup.tar.gz /app/data
-kubectl cp langbot/<langbot-pod-name>:/tmp/backup.tar.gz ./backup.tar.gz
-```
-
-### Uninstall
-
-```bash
-# Delete all resources (keep PVCs)
-kubectl delete deployment,service,configmap -n langbot --all
-
-# Delete PVCs (will delete data)
-kubectl delete pvc -n langbot --all
-
-# Delete namespace
-kubectl delete namespace langbot
-```
-
-### Troubleshooting
-
-#### Pods Not Starting
-
-```bash
-# Check Pod status
-kubectl get pods -n langbot
-
-# View detailed information
-kubectl describe pod -n langbot <pod-name>
-
-# View events
-kubectl get events -n langbot --sort-by='.lastTimestamp'
-```
-
-#### Storage Issues
-
-```bash
-# Check PVC status
-kubectl get pvc -n langbot
-
-# Check PV
-kubectl get pv
-```
-
-#### Network Access Issues
-
-```bash
-# Check Service
-kubectl get svc -n langbot
-
-# Test port forwarding
-kubectl port-forward -n langbot svc/langbot 5300:5300
-```
-
-### Production Recommendations
-
-1. **Use specific version tags**: Avoid using `latest` tag, use specific version like `rockchin/langbot:v1.0.0`
-2. **Configure resource limits**: Adjust CPU and memory limits based on actual load
-3. **Use Ingress + TLS**: Configure HTTPS access and certificate management
-4. **Configure monitoring and alerts**: Integrate monitoring tools like Prometheus, Grafana
-5. **Regular backups**: Configure automated backup strategy to protect data
-6. **Use dedicated StorageClass**: Configure high-performance storage for production
-7. **Configure affinity rules**: Ensure Pods are scheduled to appropriate nodes
-
-### Advanced Configuration
-
-#### Using Secrets for Sensitive Information
-
-If you need to configure sensitive information like API keys:
-
-```yaml
-apiVersion: v1
-kind: Secret
-metadata:
-  name: langbot-secrets
-  namespace: langbot
-type: Opaque
-data:
-  api_key: <base64-encoded-value>
-```
-
-Then reference in Deployment:
-
-```yaml
-env:
-- name: API_KEY
-  valueFrom:
-    secretKeyRef:
-      name: langbot-secrets
-      key: api_key
-```
-
-#### Configure Horizontal Pod Autoscaling (HPA)
-
-Note: Requires ReadWriteMany storage type
-
-```yaml
-apiVersion: autoscaling/v2
-kind: HorizontalPodAutoscaler
-metadata:
-  name: langbot-hpa
-  namespace: langbot
-spec:
-  scaleTargetRef:
-    apiVersion: apps/v1
-    kind: Deployment
-    name: langbot
-  minReplicas: 1
-  maxReplicas: 3
-  metrics:
-  - type: Resource
-    resource:
-      name: cpu
-      target:
-        type: Utilization
-        averageUtilization: 70
-```
-
-### References
-
-- [LangBot Official Documentation](https://docs.langbot.app)
-- [Docker Deployment Guide](https://link.langbot.app/zh/docs/docker)
-- [Kubernetes Official Documentation](https://kubernetes.io/docs/)
--- a/docker/docker-compose.yaml
+++ b/docker/docker-compose.yaml
@@ -1,5 +1,5 @@
 # Docker Compose configuration for LangBot
-# For Kubernetes deployment, see kubernetes.yaml and README_K8S.md
+# For Kubernetes deployment, see kubernetes.yaml and the deployment guide at https://docs.langbot.app
 version: "3"

 services:
--- a/docker/kubernetes.yaml
+++ b/docker/kubernetes.yaml
@@ -1,6 +1,8 @@
 # Kubernetes Deployment for LangBot
 # This file provides Kubernetes deployment manifests for LangBot based on docker-compose.yaml
-# 
+#
+# Full deployment guide (zh/en/ja): https://docs.langbot.app -> Installation -> Kubernetes
+#
 # Usage:
 #   kubectl apply -f kubernetes.yaml
 #
@@ -8,13 +10,15 @@
 #   - A Kubernetes cluster (1.19+)
 #   - kubectl configured to communicate with your cluster
 #   - (Optional) A StorageClass for dynamic volume provisioning
+#   - For the Box sandbox runtime: a node with a reachable Docker daemon
+#     (the box mounts the node's /var/run/docker.sock). See the deployment guide.
 #
 # Components:
 #   - Namespace: langbot
 #   - PersistentVolumeClaims for data persistence
-#   - Deployments for langbot and langbot_plugin_runtime
+#   - Deployments for langbot, langbot-plugin-runtime, and langbot-box (sandbox)
 #   - Services for network access
-#   - ConfigMap for timezone configuration
+#   - ConfigMap for timezone + runtime endpoints

 ---
 # Namespace
@@ -83,6 +87,11 @@ metadata:
 data:
  TZ: "Asia/Shanghai"
  PLUGIN__RUNTIME_WS_URL: "ws://langbot-plugin-runtime:5400/control/ws"
+  # Box sandbox runtime endpoint. LangBot connects to the Box runtime over
+  # WebSocket. The hostname MUST match the langbot-box Service name. Note the
+  # in-container default ("langbot_box") uses an underscore, which is an
+  # invalid Kubernetes DNS name — so the endpoint is always set explicitly here.
+  BOX__RUNTIME__ENDPOINT: "ws://langbot-box:5410"

 ---
 # Deployment for LangBot Plugin Runtime
@@ -169,6 +178,136 @@ spec:
    protocol: TCP
    name: runtime

+---
+# Deployment for LangBot Box (sandbox) runtime
+#
+# The Box runtime backs LangBot's sandbox tools (exec / read / write / edit /
+# glob / grep), the `activate` skill tool, skill add/edit, and stdio-mode MCP
+# servers. It is OPTIONAL: if you do not deploy it, set `BOX__ENABLED=false` on
+# the langbot Deployment (or `box.enabled: false` in config.yaml) so the
+# dashboard renders cleanly with sandbox features disabled.
+#
+# IMPORTANT — how the sandbox actually runs:
+#   The bundled image ships only the Docker CLI (no dockerd, no nsjail). The Box
+#   runtime therefore creates sandbox containers by talking to a Docker daemon
+#   over the mounted socket (`/var/run/docker.sock`). Because that daemon
+#   resolves bind-mount paths on the NODE filesystem, the Box workspace root
+#   must be the SAME absolute path inside the box container, inside every
+#   sandbox container it spawns, AND on the node. That is why this manifest uses
+#   a hostPath at a fixed absolute path (/app/data/box) and pins langbot + box
+#   to the same node via podAffinity. A normal PVC will NOT work for the box
+#   workspace, because the node's dockerd cannot see paths that exist only
+#   inside the pod's mount namespace.
+#
+# Security note: mounting the host Docker socket grants the Box runtime (and any
+# code executed in the sandbox) effective root on the node. Only deploy Box on
+# nodes you trust for this workload, ideally a dedicated node pool. For a
+# stronger isolation boundary, switch box.backend to 'e2b' (set E2B_API_KEY) and
+# drop the docker.sock mount + hostPath entirely.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: langbot-box
+  namespace: langbot
+  labels:
+    app: langbot-box
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: langbot-box
+  template:
+    metadata:
+      labels:
+        app: langbot-box
+    spec:
+      # Pin to the same node as langbot so they share the hostPath box root.
+      affinity:
+        podAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+          - labelSelector:
+              matchLabels:
+                app: langbot
+            topologyKey: kubernetes.io/hostname
+      containers:
+      - name: langbot-box
+        image: rockchin/langbot:latest
+        imagePullPolicy: Always
+        # Launched through the same CLI entry point as the plugin runtime.
+        # No flag => WebSocket control transport (default), listening on 5410.
+        command: ["uv", "run", "--no-sync", "-m", "langbot_plugin.cli.__init__", "box"]
+        ports:
+        - containerPort: 5410
+          name: box-rpc
+          protocol: TCP
+        env:
+        - name: TZ
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: TZ
+        # The Box runtime does NOT read box.local.* / BOX__* from its own env;
+        # it receives its configuration from LangBot via the INIT RPC action.
+        # Do not add BOX__* here — they would be silently ignored.
+        volumeMounts:
+        # Box workspace root — identical path on node, box, and sandbox
+        # containers (see the IMPORTANT note above).
+        - name: box-root
+          mountPath: /app/data/box
+        # Host Docker socket — the sandbox backend uses it to create containers.
+        - name: docker-sock
+          mountPath: /var/run/docker.sock
+        resources:
+          requests:
+            memory: "256Mi"
+            cpu: "100m"
+          limits:
+            memory: "1Gi"
+            cpu: "1000m"
+        livenessProbe:
+          tcpSocket:
+            port: 5410
+          initialDelaySeconds: 20
+          periodSeconds: 10
+          timeoutSeconds: 5
+          failureThreshold: 3
+        readinessProbe:
+          tcpSocket:
+            port: 5410
+          initialDelaySeconds: 10
+          periodSeconds: 5
+          timeoutSeconds: 3
+          failureThreshold: 3
+      volumes:
+      - name: box-root
+        hostPath:
+          path: /app/data/box
+          type: DirectoryOrCreate
+      - name: docker-sock
+        hostPath:
+          path: /var/run/docker.sock
+          type: Socket
+      restartPolicy: Always
+
+---
+# Service for LangBot Box runtime
+apiVersion: v1
+kind: Service
+metadata:
+  name: langbot-box
+  namespace: langbot
+  labels:
+    app: langbot-box
+spec:
+  type: ClusterIP
+  selector:
+    app: langbot-box
+  ports:
+  - port: 5410
+    targetPort: 5410
+    protocol: TCP
+    name: box-rpc
+
 ---
 # Deployment for LangBot
 apiVersion: apps/v1
@@ -213,11 +352,36 @@ spec:
            configMapKeyRef:
              name: langbot-config
              key: PLUGIN__RUNTIME_WS_URL
+        # Box (sandbox) runtime endpoint. Connects LangBot to the langbot-box
+        # Service over WebSocket. Remove this (and the langbot-box Deployment)
+        # and set BOX__ENABLED=false if you do not want the sandbox.
+        - name: BOX__RUNTIME__ENDPOINT
+          valueFrom:
+            configMapKeyRef:
+              name: langbot-config
+              key: BOX__RUNTIME__ENDPOINT
+        # box.local.* config, forwarded to the Box runtime via the INIT RPC. The
+        # host_root MUST equal the box-root mountPath below AND the langbot-box
+        # Deployment's box-root mountPath, so that skill package paths resolve
+        # identically on both sides and on the node's Docker daemon.
+        - name: BOX__LOCAL__HOST_ROOT
+          value: "/app/data/box"
+        - name: BOX__LOCAL__DEFAULT_WORKSPACE
+          value: "default"
+        - name: BOX__LOCAL__SKILLS_ROOT
+          value: "skills"
+        - name: BOX__LOCAL__ALLOWED_MOUNT_ROOTS
+          value: "/app/data/box"
        volumeMounts:
        - name: data
          mountPath: /app/data
        - name: plugins
          mountPath: /app/plugins
+        # Same node-level box root as the langbot-box Deployment. Mounted on top
+        # of /app/data/box inside the data PVC (shadowing that directory) so that
+        # LangBot, the Box runtime, and the node's dockerd agree on one absolute path.
+        - name: box-root
+          mountPath: /app/data/box
        resources:
          requests:
            memory: "1Gi"
@@ -250,6 +414,13 @@ spec:
      - name: plugins
        persistentVolumeClaim:
          claimName: langbot-plugins
+      # Node-level box workspace root, shared with the langbot-box Deployment.
+      # hostPath (not PVC) because the node's Docker daemon must see the same
+      # absolute path when bind-mounting workspaces into sandbox containers.
+      - name: box-root
+        hostPath:
+          path: /app/data/box
+          type: DirectoryOrCreate
      restartPolicy: Always

 ---
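The comments in this manifest require a single absolute box root shared by the `box-root` hostPath volumes, both `box-root` mountPaths, and `BOX__LOCAL__HOST_ROOT`. A small lint for that invariant is sketched below. It assumes the manifest has already been parsed into dicts (for example with PyYAML's `yaml.safe_load_all`); `box_root_paths` and `check_box_root` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def box_root_paths(docs: Iterable[dict[str, Any] | None]) -> dict[str, str]:
    """Collect every place a Deployment declares the box root path."""
    found: dict[str, str] = {}
    for doc in docs:
        if not doc or doc.get('kind') != 'Deployment':
            continue  # skip empty documents, Services, ConfigMaps, ...
        name = doc['metadata']['name']
        pod = doc['spec']['template']['spec']
        for volume in pod.get('volumes', []):
            if volume['name'] == 'box-root':
                found[f'{name} hostPath'] = volume['hostPath']['path']
        for container in pod.get('containers', []):
            where = f'{name}/{container["name"]}'
            for mount in container.get('volumeMounts', []):
                if mount['name'] == 'box-root':
                    found[f'{where} mountPath'] = mount['mountPath']
            for env in container.get('env', []):
                if env['name'] == 'BOX__LOCAL__HOST_ROOT':
                    found[f'{where} BOX__LOCAL__HOST_ROOT'] = env.get('value', '')
    return found


def check_box_root(docs: Iterable[dict[str, Any] | None]) -> str:
    """Return the shared box root, or raise ValueError if the paths disagree."""
    found = box_root_paths(docs)
    distinct = set(found.values())
    if len(distinct) != 1:
        raise ValueError(f'box root paths disagree or are missing: {found}')
    return distinct.pop()
```

Reporting every location by name, rather than only failing, makes it clear which of the five declarations drifted when someone edits one Deployment but not the other.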
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -8,7 +8,7 @@ requires-python = ">=3.11,<4.0"
 dependencies = [
    "aiocqhttp>=1.4.4",
    "aiofiles>=24.1.0",
-    "aiohttp>=3.13.4",
+    "aiohttp>=3.14.0",
    "aioshutil>=1.5",
    "aiosqlite>=0.21.0",
    "anthropic>=0.51.0",
@@ -31,27 +31,27 @@ dependencies = [
    "psutil>=7.0.0",
    "pycryptodome>=3.22.0",
    "pydantic>2.0",
-    "pyjwt>=2.10.1",
+    "pyjwt>=2.12.0",
    "python-telegram-bot>=22.0",
    "pyyaml>=6.0.2",
    "qq-botpy-rc>=1.2.1.6",
    "qrcode>=7.4",
    "quart>=0.20.0",
    "quart-cors>=0.8.0",
-    "requests>=2.32.3",
+    "requests>=2.33.0",
    "slack-sdk>=3.35.0",
    "alembic>=1.15.0",
    "sqlalchemy[asyncio]>=2.0.40",
    "sqlmodel>=0.0.24",
    "telegramify-markdown>=0.5.1",
    "tiktoken>=0.9.0",
-    "urllib3>=2.4.0",
+    "urllib3>=2.7.0",
    "websockets>=15.0.1",
    "python-socks>=2.7.1", # dingtalk missing dependency
-    "pip>=25.1.1",
+    "pip>=26.1",
    "ruff>=0.11.9",
    "pre-commit>=4.2.0",
-    "uv>=0.11.6",
+    "uv>=0.11.15",
    "mypy>=1.16.0",
    "PyPDF2>=3.0.1",
    "python-docx>=1.1.0",
@@ -62,10 +62,10 @@ dependencies = [
    "ebooklib>=0.18",
    "html2text>=2024.2.26",
    "langchain>=0.2.0",
-    "langchain-core>=1.2.28",
-    "langsmith>=0.7.31",
-    "python-multipart>=0.0.26",
-    "Mako>=1.3.11",
+    "langchain-core>=1.3.3",
+    "langsmith>=0.8.0",
+    "python-multipart>=0.0.27",
+    "Mako>=1.3.12",
    "langchain-text-splitters>=1.1.2",
    "chromadb>=1.0.0,<2.0.0",
    "qdrant-client (>=1.15.1,<2.0.0)",
@@ -79,7 +79,6 @@ dependencies = [
    "pymilvus>=2.6.4",
    "pgvector>=0.4.1",
    "botocore>=1.42.39",
-    "litellm>=1.0.0",
 ]
 keywords = [
    "bot",
--- a/src/langbot/libs/deerflow_api/__init__.py
+++ b/src/langbot/libs/deerflow_api/__init__.py
@@ -0,0 +1,5 @@
+from .client import AsyncDeerFlowClient
+from .errors import DeerFlowAPIError
+from . import stream_utils
+
+__all__ = ['AsyncDeerFlowClient', 'DeerFlowAPIError', 'stream_utils']
--- a/src/langbot/libs/deerflow_api/client.py
+++ b/src/langbot/libs/deerflow_api/client.py
@@ -0,0 +1,204 @@
+"""DeerFlow LangGraph HTTP API client.
+
+Based on AstrBot's deerflow_api_client implementation, adapted to LangBot's style using httpx.
+"""
+
+from __future__ import annotations
+
+import codecs
+import json
+import typing
+from collections.abc import AsyncGenerator
+
+import httpx
+
+from .errors import DeerFlowAPIError
+
+
+SSE_MAX_BUFFER_CHARS = 1_048_576
+
+
+def _normalize_sse_newlines(text: str) -> str:
+    """Normalize CRLF/CR line endings to LF so SSE blocks split reliably."""
+    return text.replace('\r\n', '\n').replace('\r', '\n')
+
+
+def _parse_sse_data_lines(data_lines: list[str]) -> typing.Any:
+    raw_data = '\n'.join(data_lines)
+    try:
+        return json.loads(raw_data)
+    except json.JSONDecodeError:
+        # Some LangGraph-compatible servers send several JSON fragments as
+        # multiple data lines within a single SSE event (e.g. a tuple payload).
+        parsed_lines: list[typing.Any] = []
+        can_parse_all = True
+        for line in data_lines:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                parsed_lines.append(json.loads(line))
+            except json.JSONDecodeError:
+                can_parse_all = False
+                break
+        if can_parse_all and parsed_lines:
+            return parsed_lines[0] if len(parsed_lines) == 1 else parsed_lines
+        return raw_data
+
+
+def _parse_sse_block(block: str) -> dict[str, typing.Any] | None:
+    if not block.strip():
+        return None
+
+    event_name = 'message'
+    data_lines: list[str] = []
+    for line in block.splitlines():
+        if line.startswith('event:'):
+            event_name = line[6:].strip()
+        elif line.startswith('data:'):
+            data_lines.append(line[5:].lstrip())
+
+    if not data_lines:
+        return None
+    return {'event': event_name, 'data': _parse_sse_data_lines(data_lines)}
+
+
+class AsyncDeerFlowClient:
+    """DeerFlow LangGraph HTTP API client."""
+
+    api_base: str
+    headers: dict[str, str]
+
+    def __init__(
+        self,
+        api_base: str = 'http://127.0.0.1:2026',
+        api_key: str = '',
+        auth_header: str = '',
+    ) -> None:
+        self.api_base = api_base.rstrip('/')
+        self.headers: dict[str, str] = {}
+        if auth_header:
+            self.headers['Authorization'] = auth_header
+        elif api_key:
+            self.headers['Authorization'] = f'Bearer {api_key}'
+
+    async def create_thread(self, timeout: float = 20) -> dict[str, typing.Any]:
+        """Create a new LangGraph thread.
+
+        Returns:
+            A dict containing the thread_id and related metadata.
+        """
+        url = f'{self.api_base}/api/langgraph/threads'
+        payload = {'metadata': {}}
+
+        async with httpx.AsyncClient(
+            trust_env=True,
+            timeout=timeout,
+        ) as client:
+            response = await client.post(
+                url,
+                headers=self.headers,
+                json=payload,
+            )
+            if response.status_code not in (200, 201):
+                raise DeerFlowAPIError(
+                    operation='create thread',
+                    status=response.status_code,
+                    body=response.text,
+                    url=url,
+                )
+            return response.json()
+
+    async def delete_thread(self, thread_id: str, timeout: float = 20) -> None:
+        """Delete the given thread; a 404 is treated as already deleted."""
+        url = f'{self.api_base}/api/threads/{thread_id}'
+
+        async with httpx.AsyncClient(
+            trust_env=True,
+            timeout=timeout,
+        ) as client:
+            response = await client.delete(url, headers=self.headers)
+            if response.status_code not in (200, 202, 204, 404):
+                raise DeerFlowAPIError(
+                    operation='delete thread',
+                    status=response.status_code,
+                    body=response.text,
+                    url=url,
+                    thread_id=thread_id,
+                )
+
+    async def stream_run(
+        self,
+        thread_id: str,
+        payload: dict[str, typing.Any],
+        timeout: float = 120,
+    ) -> AsyncGenerator[dict[str, typing.Any], None]:
+        """Run one LangGraph streaming request and yield each event.
+
+        Yields:
+            Event dicts of the form {'event': event_name, 'data': parsed_data}.
+        """
+        url = f'{self.api_base}/api/langgraph/threads/{thread_id}/runs/stream'
+
+        # For streaming, cap the connect timeout at 30 s and apply `timeout` to reads
+        stream_timeout = httpx.Timeout(
+            connect=min(timeout, 30),
+            read=timeout,
+            write=timeout,
+            pool=timeout,
+        )
+
+        async with httpx.AsyncClient(
+            trust_env=True,
+            timeout=stream_timeout,
+        ) as client:
+            async with client.stream(
+                'POST',
+                url,
+                headers={
+                    **self.headers,
+                    'Accept': 'text/event-stream',
+                    'Content-Type': 'application/json',
+                },
+                json=payload,
+            ) as resp:
+                if resp.status_code != 200:
+                    body = await resp.aread()
+                    raise DeerFlowAPIError(
+                        operation='runs/stream request',
+                        status=resp.status_code,
+                        body=body.decode('utf-8', errors='replace'),
+                        url=url,
+                        thread_id=thread_id,
+                    )
+
+                decoder = codecs.getincrementaldecoder('utf-8')('replace')
+                buffer = ''
+
+                async for chunk in resp.aiter_bytes(8192):
+                    buffer += _normalize_sse_newlines(decoder.decode(chunk))
+
+                    while '\n\n' in buffer:
+                        block, buffer = buffer.split('\n\n', 1)
+                        parsed = _parse_sse_block(block)
+                        if parsed is not None:
+                            yield parsed
+
+                    if len(buffer) > SSE_MAX_BUFFER_CHARS:
+                        # Buffer exceeded the limit: force-flush it as a single block
+                        parsed = _parse_sse_block(buffer)
+                        if parsed is not None:
+                            yield parsed
+                        buffer = ''
+
+                # Flush any remaining content
+                buffer += _normalize_sse_newlines(decoder.decode(b'', final=True))
+                while '\n\n' in buffer:
+                    block, buffer = buffer.split('\n\n', 1)
+                    parsed = _parse_sse_block(block)
+                    if parsed is not None:
+                        yield parsed
+                if buffer.strip():
+                    parsed = _parse_sse_block(buffer)
+                    if parsed is not None:
+                        yield parsed
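`stream_run` frames the response by decoding bytes incrementally, normalizing line endings, and cutting the buffer at each blank line. The standalone sketch below reproduces that framing over an in-memory list of chunks so it can be exercised without a DeerFlow server. `iter_sse_events` is a hypothetical name, and for brevity it drops the client's per-line JSON fallback and buffer cap. One deliberate difference: it holds back a trailing `\r` so that a CRLF pair split across two chunks is not read as two line breaks.

```python
from __future__ import annotations

import codecs
import json
from collections.abc import Iterable, Iterator
from typing import Any


def _parse_block(block: str) -> dict[str, Any] | None:
    """Turn one SSE block into {'event': ..., 'data': ...}, or None if it has no data."""
    event, data_lines = 'message', []
    for line in block.splitlines():
        if line.startswith('event:'):
            event = line[6:].strip()
        elif line.startswith('data:'):
            data_lines.append(line[5:].lstrip())
    if not data_lines:
        return None
    raw = '\n'.join(data_lines)
    try:
        return {'event': event, 'data': json.loads(raw)}
    except json.JSONDecodeError:
        return {'event': event, 'data': raw}  # non-JSON payloads pass through as text


def _lf(text: str) -> str:
    return text.replace('\r\n', '\n').replace('\r', '\n')


def iter_sse_events(chunks: Iterable[bytes]) -> Iterator[dict[str, Any]]:
    """Yield parsed SSE events from arbitrarily split UTF-8 byte chunks."""
    decoder = codecs.getincrementaldecoder('utf-8')('replace')
    buffer, pending_cr = '', ''
    for chunk in chunks:
        # The incremental decoder keeps an incomplete multi-byte sequence
        # until the next chunk instead of emitting U+FFFD.
        text = pending_cr + decoder.decode(chunk)
        # Hold back a trailing '\r' in case its '\n' starts the next chunk.
        pending_cr = '\r' if text.endswith('\r') else ''
        buffer += _lf(text[:-1] if pending_cr else text)
        while '\n\n' in buffer:
            block, buffer = buffer.split('\n\n', 1)
            if (event := _parse_block(block)) is not None:
                yield event
    buffer += _lf(pending_cr + decoder.decode(b'', final=True))
    for block in buffer.split('\n\n'):
        if (event := _parse_block(block)) is not None:
            yield event
```

Feeding the same payload one byte at a time yields the same events as feeding it whole, which is the property the incremental decoder and the held-back `\r` are there to guarantee.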
--- a/src/langbot/libs/deerflow_api/errors.py
+++ b/src/langbot/libs/deerflow_api/errors.py
@@ -0,0 +1,30 @@
+from __future__ import annotations
+
+
+class DeerFlowAPIError(Exception):
+    """Raised when a DeerFlow API request fails."""
+
+    def __init__(
+        self,
+        *,
+        operation: str = '',
+        status: int = 0,
+        body: str = '',
+        url: str = '',
+        thread_id: str | None = None,
+        message: str = '',
+    ) -> None:
+        self.operation = operation
+        self.status = status
+        self.body = body
+        self.url = url
+        self.thread_id = thread_id
+
+        if message:
+            super().__init__(message)
+            return
+
+        msg = f'DeerFlow {operation} failed: status={status}, url={url}, body={body}'
+        if thread_id is not None:
+            msg = f'DeerFlow {operation} failed: thread_id={thread_id}, status={status}, url={url}, body={body}'
+        super().__init__(msg)
--- a/src/langbot/libs/deerflow_api/stream_utils.py
+++ b/src/langbot/libs/deerflow_api/stream_utils.py
@@ -0,0 +1,212 @@
+"""DeerFlow LangGraph streaming-response parsing utilities.
+
+Based on AstrBot's deerflow_stream_utils implementation.
+"""
+
+from __future__ import annotations
+
+import typing
+from collections.abc import Iterable
+
+
+def extract_text(content: typing.Any) -> str:
+    """Extract plain text from a message's content."""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, dict):
+        if isinstance(content.get('text'), str):
+            return content['text']
+        if 'content' in content:
+            return extract_text(content.get('content'))
+        if 'kwargs' in content and isinstance(content['kwargs'], dict):
+            return extract_text(content['kwargs'].get('content'))
+    if isinstance(content, list):
+        parts: list[str] = []
+        for item in content:
+            if isinstance(item, str):
+                parts.append(item)
+            elif isinstance(item, dict):
+                item_type = item.get('type')
+                if item_type == 'text' and isinstance(item.get('text'), str):
+                    parts.append(item['text'])
+                elif 'content' in item:
+                    parts.append(extract_text(item['content']))
+        return '\n'.join([p for p in parts if p]).strip()
+    return str(content) if content is not None else ''
+
+
+def extract_messages_from_values_data(data: typing.Any) -> list[typing.Any]:
+    """Extract the ``messages`` list from a ``values`` event."""
+    candidates: list[typing.Any] = []
+    if isinstance(data, dict):
+        candidates.append(data)
+        if isinstance(data.get('values'), dict):
+            candidates.append(data['values'])
+    elif isinstance(data, list):
+        candidates.extend([x for x in data if isinstance(x, dict)])
+
+    for item in candidates:
+        messages = item.get('messages')
+        if isinstance(messages, list):
+            return messages
+    return []
+
+
+def is_ai_message(message: dict[str, typing.Any]) -> bool:
+    """Return whether the message is an AI/assistant message."""
+    role = str(message.get('role', '')).lower()
+    if role in {'assistant', 'ai'}:
+        return True
+
+    msg_type = str(message.get('type', '')).lower()
+    if msg_type in {'ai', 'assistant', 'aimessage', 'aimessagechunk'}:
+        return True
+    if 'ai' in msg_type and all(token not in msg_type for token in ('human', 'tool', 'system')):
+        return True
+    return False
+
+
+def extract_latest_ai_text(messages: Iterable[typing.Any]) -> str:
+    """Return the text of the most recent AI message that has non-empty text."""
+    if isinstance(messages, (list, tuple)):
+        iterable = reversed(messages)
+    else:
+        iterable = reversed(list(messages))
+
+    for msg in iterable:
+        if not isinstance(msg, dict):
+            continue
+        if is_ai_message(msg):
+            text = extract_text(msg.get('content'))
+            if text:
+                return text
+    return ''
+
+
+def extract_latest_ai_message(messages: Iterable[typing.Any]) -> dict[str, typing.Any] | None:
+    """Return the most recent AI message object, or None."""
+    if isinstance(messages, (list, tuple)):
+        iterable = reversed(messages)
+    else:
+        iterable = reversed(list(messages))
+
+    for msg in iterable:
+        if not isinstance(msg, dict):
+            continue
+        if is_ai_message(msg):
+            return msg
+    return None
+
+
+def is_clarification_tool_message(message: dict[str, typing.Any]) -> bool:
+    """Return whether the message is an ``ask_clarification`` tool message."""
+    msg_type = str(message.get('type', '')).lower()
+    tool_name = str(message.get('name', '')).lower()
+    return msg_type == 'tool' and tool_name == 'ask_clarification'
+
+
+def extract_latest_clarification_text(messages: Iterable[typing.Any]) -> str:
+    """Return the text of the most recent clarification question."""
+    if isinstance(messages, (list, tuple)):
+        iterable = reversed(messages)
+    else:
+        iterable = reversed(list(messages))
+
+    for msg in iterable:
+        if not isinstance(msg, dict):
+            continue
+        if is_clarification_tool_message(msg):
+            text = extract_text(msg.get('content'))
+            if text:
+                return text
+    return ''
+
+
+def get_message_id(message: typing.Any) -> str:
+    """Return the message ID, or an empty string if it is missing."""
+    if not isinstance(message, dict):
+        return ''
+    msg_id = message.get('id')
+    return msg_id if isinstance(msg_id, str) else ''
+
+
+def extract_event_message_obj(data: typing.Any) -> dict[str, typing.Any] | None:
+    """Extract the message object from an event's data."""
+    msg_obj = data
+    if isinstance(data, (list, tuple)) and data:
+        msg_obj = data[0]
+    if isinstance(msg_obj, dict) and isinstance(msg_obj.get('data'), dict):
+        msg_obj = msg_obj['data']
+    return msg_obj if isinstance(msg_obj, dict) else None
+
+
+def extract_ai_delta_from_event_data(data: typing.Any) -> str:
+    """Extract the AI delta text from a ``messages-tuple`` event."""
+    msg_obj = extract_event_message_obj(data)
+    if not msg_obj:
+        return ''
+    if is_ai_message(msg_obj):
+        return extract_text(msg_obj.get('content'))
+    return ''
+
+
+def extract_clarification_from_event_data(data: typing.Any) -> str:
+    """Extract the clarification question from an event."""
+    msg_obj = extract_event_message_obj(data)
+    if not msg_obj:
+        return ''
+    if is_clarification_tool_message(msg_obj):
+        return extract_text(msg_obj.get('content'))
+    return ''
+
+
+def _iter_custom_event_items(data: typing.Any) -> list[dict[str, typing.Any]]:
+    items: list[dict[str, typing.Any]] = []
+    if isinstance(data, dict):
+        return [data]
+    if isinstance(data, list):
+        for item in data:
+            if isinstance(item, dict):
+                items.append(item)
+            elif isinstance(item, (list, tuple)):
+                for nested in item:
+                    if isinstance(nested, dict):
+                        items.append(nested)
+    return items
+
+
+def extract_task_failures_from_custom_event(data: typing.Any) -> list[str]:
+    """Extract subtask failure messages from a ``custom`` event."""
+    failures: list[str] = []
+    for item in _iter_custom_event_items(data):
+        event_type = str(item.get('type', '')).lower()
+        if event_type not in {'task_failed', 'task_timed_out'}:
+            continue
+
+        task_id = str(item.get('task_id', '')).strip()
+        error_text = extract_text(item.get('error')).strip()
+        if task_id and error_text:
+            failures.append(f'{task_id}: {error_text}')
+        elif error_text:
+            failures.append(error_text)
+        elif task_id:
+            failures.append(f'{task_id}: unknown error')
+        else:
+            failures.append('unknown task failure')
+    return failures
+
+
+def build_task_failure_summary(failures: list[str]) -> str:
+    """Build a deduplicated summary of subtask failures."""
+    if not failures:
+        return ''
+    deduped: list[str] = []
+    seen: set[str] = set()
+    for failure in failures:
+        if failure not in seen:
+            seen.add(failure)
+            deduped.append(failure)
+    if len(deduped) == 1:
+        return f'DeerFlow subtask failed: {deduped[0]}'
+    joined = '\n'.join([f'- {item}' for item in deduped[:5]])
+    return f'DeerFlow subtasks failed:\n{joined}'
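These helpers are meant to be combined in a loop over the event stream. The sketch below shows one plausible composition. It assumes events arrive as `(event_name, data)` pairs using the `messages-tuple`, `values`, and `custom` names from the docstrings above. `accumulate_reply`, `_text`, and `_is_ai` are hypothetical, simplified stand-ins, not part of the module, and the policy of preferring the `values` snapshot over streamed deltas is an illustrative choice.

```python
from __future__ import annotations

from typing import Any


def _text(content: Any) -> str:
    # Simplified stand-in for extract_text(): strings, {'text': ...} dicts, lists of parts.
    if isinstance(content, str):
        return content
    if isinstance(content, dict) and isinstance(content.get('text'), str):
        return content['text']
    if isinstance(content, list):
        return '\n'.join(p for p in (_text(x) for x in content) if p).strip()
    return ''


def _is_ai(msg: dict[str, Any]) -> bool:
    # Simplified stand-in for is_ai_message().
    kind = str(msg.get('type') or msg.get('role') or '').lower()
    return kind in {'ai', 'assistant', 'aimessage', 'aimessagechunk'}


def accumulate_reply(events: list[tuple[str, Any]]) -> str:
    """Hypothetical consumer: turn (event_name, data) pairs into one reply string."""
    deltas: list[str] = []
    final_text = ''
    failures: list[str] = []
    for name, data in events:
        if name == 'messages-tuple':
            # Assumed shape: [message_chunk, metadata]; the chunk comes first.
            msg = data[0] if isinstance(data, (list, tuple)) and data else data
            if isinstance(msg, dict) and _is_ai(msg):
                deltas.append(_text(msg.get('content')))
        elif name == 'values':
            # Full state snapshot: its latest AI message is preferred over the deltas.
            messages = data.get('messages', []) if isinstance(data, dict) else []
            for msg in reversed(messages):
                text = _text(msg.get('content')) if isinstance(msg, dict) and _is_ai(msg) else ''
                if text:
                    final_text = text
                    break
        elif name == 'custom':
            for item in data if isinstance(data, list) else [data]:
                if isinstance(item, dict) and item.get('type') in {'task_failed', 'task_timed_out'}:
                    error = _text(item.get('error')) or 'unknown error'
                    failures.append(f"{item.get('task_id', 'unknown')}: {error}")
    reply = final_text or ''.join(deltas)
    if failures:
        unique = list(dict.fromkeys(failures))  # de-duplicate, preserving order
        reply += '\n\nDeerFlow subtasks failed:\n' + '\n'.join(f'- {f}' for f in unique)
    return reply
```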
--- a/src/langbot/libs/weknora_api/__init__.py
+++ b/src/langbot/libs/weknora_api/__init__.py
@@ -0,0 +1,4 @@
+from .client import AsyncWeKnoraClient
+from .errors import WeKnoraAPIError
+
+__all__ = ['AsyncWeKnoraClient', 'WeKnoraAPIError']
--- a/src/langbot/libs/weknora_api/client.py
+++ b/src/langbot/libs/weknora_api/client.py
@@ -0,0 +1,180 @@
+from __future__ import annotations
+
+import httpx
+import typing
+import json
+
+from .errors import WeKnoraAPIError
+
+
+class AsyncWeKnoraClient:
+    """Async client for the WeKnora API."""
+
+    api_key: str
+    base_url: str
+
+    def __init__(
+        self,
+        api_key: str,
+        base_url: str = 'http://localhost:80/api/v1',
+    ) -> None:
+        self.api_key = api_key
+        self.base_url = base_url
+
+    async def create_session(
+        self,
+        title: str = '',
+        description: str = '',
+        timeout: float = 30.0,
+    ) -> str:
+        """Create a session and return its session_id."""
+        async with httpx.AsyncClient(
+            base_url=self.base_url,
+            trust_env=True,
+            timeout=timeout,
+        ) as client:
+            payload: dict[str, typing.Any] = {}
+            if title:
+                payload['title'] = title
+            if description:
+                payload['description'] = description
+
+            response = await client.post(
+                '/sessions',
+                headers={
+                    'X-API-Key': self.api_key,
+                    'Content-Type': 'application/json',
+                },
+                json=payload,
+            )
+
+            if response.status_code not in (200, 201):
+                raise WeKnoraAPIError(f'{response.status_code} {response.text}')
+
+            data = response.json()
+            return data['data']['id']
+
+    async def agent_chat(
+        self,
+        session_id: str,
+        query: str,
+        user: str,
+        agent_id: str = '',
+        knowledge_base_ids: list[str] | None = None,
+        web_search_enabled: bool = False,
+        timeout: float = 120.0,
+    ) -> typing.AsyncGenerator[dict[str, typing.Any], None]:
+        """
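+        Agent chat (SSE streaming).
+
+        Response event types:
+        - agent_query: the agent has started processing
+        - thinking: intermediate reasoning
+        - tool_call: a tool invocation
+        - tool_result: a tool result
+        - references: knowledge-base references
+        - answer: answer content
+        - reflection: reflection
+        - session_title: session title
+        - error: an error occurred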
+        """
+        if knowledge_base_ids is None:
+            knowledge_base_ids = []
+
+        async with httpx.AsyncClient(
+            base_url=self.base_url,
+            trust_env=True,
+            timeout=timeout,
+        ) as client:
+            payload: dict[str, typing.Any] = {
+                'query': query,
+                'agent_enabled': True,
+                'channel': 'im',
+            }
+            if agent_id:
+                payload['agent_id'] = agent_id
+            if knowledge_base_ids:
+                payload['knowledge_base_ids'] = knowledge_base_ids
+            if web_search_enabled:
+                payload['web_search_enabled'] = True
+
+            async with client.stream(
+                'POST',
+                f'/agent-chat/{session_id}',
+                headers={
+                    'X-API-Key': self.api_key,
+                    'Content-Type': 'application/json',
+                },
+                json=payload,
+            ) as r:
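+                # Fail fast on HTTP errors, including ones with an empty or multi-line body.
+                if r.status_code != 200:
+                    body = (await r.aread()).decode('utf-8', errors='replace')
+                    raise WeKnoraAPIError(f'{r.status_code} {body}')
+                async for chunk in r.aiter_lines():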
+                    if chunk.strip() == '':
+                        continue
+                    if chunk.startswith('data:'):
+                        try:
+                            data = json.loads(chunk[5:].strip())
+                        except json.JSONDecodeError:
+                            continue
+                        yield data
+                        # Stop after an error event so callers that don't raise aren't left waiting.
+                        if data.get('response_type') == 'error':
+                            return
+
+    async def knowledge_chat(
+        self,
+        session_id: str,
+        query: str,
+        user: str,
+        agent_id: str = 'builtin-quick-answer',
+        knowledge_base_ids: list[str] | None = None,
+        timeout: float = 120.0,
+    ) -> typing.AsyncGenerator[dict[str, typing.Any], None]:
+        """
+        Knowledge-base RAG Q&A (SSE streaming).
+
+        Response event types:
+        - references: knowledge-base references
+        - answer: answer content
+        """
+        if knowledge_base_ids is None:
+            knowledge_base_ids = []
+
+        async with httpx.AsyncClient(
+            base_url=self.base_url,
+            trust_env=True,
+            timeout=timeout,
+        ) as client:
+            payload: dict[str, typing.Any] = {
+                'query': query,
+                'channel': 'im',
+            }
+            if agent_id:
+                payload['agent_id'] = agent_id
+            if knowledge_base_ids:
+                payload['knowledge_base_ids'] = knowledge_base_ids
+
+            async with client.stream(
+                'POST',
+                f'/knowledge-chat/{session_id}',
+                headers={
+                    'X-API-Key': self.api_key,
+                    'Content-Type': 'application/json',
+                },
+                json=payload,
+            ) as r:
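+                # Fail fast on HTTP errors, including ones with an empty or multi-line body.
+                if r.status_code != 200:
+                    body = (await r.aread()).decode('utf-8', errors='replace')
+                    raise WeKnoraAPIError(f'{r.status_code} {body}')
+                async for chunk in r.aiter_lines():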
+                    if chunk.strip() == '':
+                        continue
+                    if chunk.startswith('data:'):
+                        try:
+                            data = json.loads(chunk[5:].strip())
+                        except json.JSONDecodeError:
+                            continue
+                        yield data
+                        # Stop after an error event so callers that don't raise aren't left waiting.
+                        if data.get('response_type') == 'error':
+                            return
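Both streaming methods share the same line-level handling. They skip blank lines, decode only `data:` lines as JSON, ignore malformed payloads, and stop after an event whose `response_type` is `error`. The sketch below isolates that handling as a pure function so it can be exercised without a server. `iter_weknora_events` and `collect_answer` are hypothetical names. The assumption that answer text lives in a `content` field is also hypothetical, since the client above reads only `response_type`.

```python
from __future__ import annotations

import json
from collections.abc import Iterable, Iterator
from typing import Any


def iter_weknora_events(lines: Iterable[str]) -> Iterator[Any]:
    """Yield JSON payloads from SSE 'data:' lines, stopping after an error event."""
    for line in lines:
        if not line.strip() or not line.startswith('data:'):
            continue  # skip keep-alive blanks and non-data fields such as 'event:'
        try:
            data = json.loads(line[len('data:'):].strip())
        except json.JSONDecodeError:
            continue  # tolerate malformed payloads instead of aborting the stream
        yield data
        if isinstance(data, dict) and data.get('response_type') == 'error':
            return  # mirror the client: end the stream once the server reports an error


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate 'answer' chunks (assumes, hypothetically, text in a 'content' field)."""
    parts: list[str] = []
    for event in iter_weknora_events(lines):
        if not isinstance(event, dict):
            continue
        if event.get('response_type') == 'error':
            raise RuntimeError(event.get('content') or 'WeKnora reported an error')
        if event.get('response_type') == 'answer':
            parts.append(str(event.get('content', '')))
    return ''.join(parts)
```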
--- a/src/langbot/libs/weknora_api/errors.py
+++ b/src/langbot/libs/weknora_api/errors.py
@@ -0,0 +1,6 @@
+class WeKnoraAPIError(Exception):
+    """Raised when a WeKnora API request fails."""
+
+    def __init__(self, message: str = ''):
+        self.message = message
+        super().__init__(self.message)
--- a/src/langbot/pkg/api/http/controller/groups/monitoring.py
+++ b/src/langbot/pkg/api/http/controller/groups/monitoring.py
@@ -46,30 +46,6 @@ class MonitoringRouterGroup(group.RouterGroup):

            return self.success(data=metrics)

-        @self.route('/token-statistics', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
-        async def get_token_statistics() -> str:
-            """Get detailed token usage statistics (summary, per-model, timeseries)."""
-            bot_ids = quart.request.args.getlist('botId')
-            pipeline_ids = quart.request.args.getlist('pipelineId')
-            start_time_str = quart.request.args.get('startTime')
-            end_time_str = quart.request.args.get('endTime')
-            bucket = quart.request.args.get('bucket', 'hour')
-            if bucket not in ('hour', 'day'):
-                bucket = 'hour'
-
-            start_time = parse_iso_datetime(start_time_str)
-            end_time = parse_iso_datetime(end_time_str)
-
-            stats = await self.ap.monitoring_service.get_token_statistics(
-                bot_ids=bot_ids if bot_ids else None,
-                pipeline_ids=pipeline_ids if pipeline_ids else None,
-                start_time=start_time,
-                end_time=end_time,
-                bucket=bucket,
-            )
-
-            return self.success(data=stats)
-
        @self.route('/messages', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
        async def get_messages() -> str:
            """Get message logs"""
--- a/src/langbot/pkg/api/http/service/monitoring.py
+++ b/src/langbot/pkg/api/http/service/monitoring.py
@@ -472,179 +472,6 @@ class MonitoringService:
            'active_sessions': active_sessions,
        }

-    async def get_token_statistics(
-        self,
-        bot_ids: list[str] | None = None,
-        pipeline_ids: list[str] | None = None,
-        start_time: datetime.datetime | None = None,
-        end_time: datetime.datetime | None = None,
-        bucket: str = 'hour',
-    ) -> dict:
-        """Get detailed token usage statistics for production observability.
-
-        Returns:
-        - summary: aggregate token counters and call/latency stats over the window
-        - by_model: per-model token + call breakdown (sorted by total tokens desc)
-        - timeseries: token usage bucketed by `bucket` ('hour' or 'day')
-
-        Only successful LLM calls are counted toward token totals; error calls are
-        reported separately so a spike in failures is visible without polluting
-        token accounting.
-        """
-        LLMCall = persistence_monitoring.MonitoringLLMCall
-
-        conditions = []
-        if bot_ids:
-            conditions.append(LLMCall.bot_id.in_(bot_ids))
-        if pipeline_ids:
-            conditions.append(LLMCall.pipeline_id.in_(pipeline_ids))
-        if start_time:
-            conditions.append(LLMCall.timestamp >= start_time)
-        if end_time:
-            conditions.append(LLMCall.timestamp <= end_time)
-
-        def _apply(query):
-            if conditions:
-                query = query.where(sqlalchemy.and_(*conditions))
-            return query
-
-        # ---- Summary aggregates ----
-        summary_query = _apply(
-            sqlalchemy.select(
-                sqlalchemy.func.count(LLMCall.id),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.input_tokens), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.output_tokens), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.total_tokens), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.duration), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.cost), 0.0),
-                sqlalchemy.func.sum(sqlalchemy.case((LLMCall.status == 'success', 1), else_=0)),
-                sqlalchemy.func.sum(sqlalchemy.case((LLMCall.status == 'error', 1), else_=0)),
-                # Count of successful calls that nonetheless recorded zero tokens —
-                # a data-quality signal that usage reporting may be broken upstream.
-                sqlalchemy.func.sum(
-                    sqlalchemy.case(
-                        (sqlalchemy.and_(LLMCall.status == 'success', LLMCall.total_tokens == 0), 1),
-                        else_=0,
-                    )
-                ),
-            )
-        )
-        summary_result = await self.ap.persistence_mgr.execute_async(summary_query)
-        row = summary_result.first()
-        (
-            total_calls,
-            total_input_tokens,
-            total_output_tokens,
-            total_tokens,
-            total_duration,
-            total_cost,
-            success_calls,
-            error_calls,
-            zero_token_success_calls,
-        ) = row if row else (0, 0, 0, 0, 0, 0.0, 0, 0, 0)
-
-        total_calls = total_calls or 0
-        success_calls = success_calls or 0
-        error_calls = error_calls or 0
-        zero_token_success_calls = zero_token_success_calls or 0
-
-        summary = {
-            'total_calls': total_calls,
-            'success_calls': success_calls,
-            'error_calls': error_calls,
-            'total_input_tokens': int(total_input_tokens or 0),
-            'total_output_tokens': int(total_output_tokens or 0),
-            'total_tokens': int(total_tokens or 0),
-            'total_cost': round(float(total_cost or 0.0), 6),
-            'avg_tokens_per_call': int((total_tokens or 0) / total_calls) if total_calls > 0 else 0,
-            'avg_duration_ms': int((total_duration or 0) / total_calls) if total_calls > 0 else 0,
-            'avg_tokens_per_second': round((total_output_tokens or 0) / (total_duration / 1000), 2)
-            if total_duration and total_duration > 0
-            else 0,
-            'zero_token_success_calls': zero_token_success_calls,
-        }
-
-        # ---- Per-model breakdown ----
-        by_model_query = _apply(
-            sqlalchemy.select(
-                LLMCall.model_name,
-                sqlalchemy.func.count(LLMCall.id),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.input_tokens), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.output_tokens), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.total_tokens), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.duration), 0),
-                sqlalchemy.func.coalesce(sqlalchemy.func.sum(LLMCall.cost), 0.0),
-                sqlalchemy.func.sum(sqlalchemy.case((LLMCall.status == 'error', 1), else_=0)),
-            ).group_by(LLMCall.model_name)
-        )
-        by_model_result = await self.ap.persistence_mgr.execute_async(by_model_query)
-        by_model = []
-        for mrow in by_model_result.all():
-            (
-                model_name,
-                m_calls,
-                m_in,
-                m_out,
-                m_total,
-                m_duration,
-                m_cost,
-                m_errors,
-            ) = mrow
-            m_calls = m_calls or 0
-            by_model.append(
-                {
-                    'model_name': model_name,
-                    'calls': m_calls,
-                    'error_calls': m_errors or 0,
-                    'input_tokens': int(m_in or 0),
-                    'output_tokens': int(m_out or 0),
-                    'total_tokens': int(m_total or 0),
-                    'cost': round(float(m_cost or 0.0), 6),
-                    'avg_tokens_per_call': int((m_total or 0) / m_calls) if m_calls > 0 else 0,
-                    'avg_duration_ms': int((m_duration or 0) / m_calls) if m_calls > 0 else 0,
-                }
-            )
-        by_model.sort(key=lambda x: x['total_tokens'], reverse=True)
-
-        # ---- Time-bucketed series ----
-        # Use a DB-agnostic bucketing approach: fetch (timestamp, tokens) rows and
-        # aggregate in Python. The window is bounded by the time filter, so this is
-        # cheap for typical dashboard ranges (hours/days).
-        series_query = _apply(
-            sqlalchemy.select(
-                LLMCall.timestamp,
-                LLMCall.input_tokens,
-                LLMCall.output_tokens,
-                LLMCall.total_tokens,
-            ).order_by(LLMCall.timestamp.asc())
-        )
-        series_result = await self.ap.persistence_mgr.execute_async(series_query)
-
-        bucket_fmt = '%Y-%m-%d %H:00' if bucket == 'hour' else '%Y-%m-%d'
-        buckets: dict[str, dict] = {}
-        for srow in series_result.all():
-            ts, s_in, s_out, s_total = srow
-            if ts is None:
-                continue
-            key = ts.strftime(bucket_fmt)
-            b = buckets.setdefault(
-                key,
-                {'bucket': key, 'input_tokens': 0, 'output_tokens': 0, 'total_tokens': 0, 'calls': 0},
-            )
-            b['input_tokens'] += int(s_in or 0)
-            b['output_tokens'] += int(s_out or 0)
-            b['total_tokens'] += int(s_total or 0)
-            b['calls'] += 1
-
-        timeseries = [buckets[k] for k in sorted(buckets.keys())]
-
-        return {
-            'summary': summary,
-            'by_model': by_model,
-            'timeseries': timeseries,
-            'bucket': bucket,
-        }
-
    async def get_messages(
        self,
        bot_ids: list[str] | None = None,
--- a/src/langbot/pkg/box/connector.py
+++ b/src/langbot/pkg/box/connector.py
@@ -120,13 +120,19 @@ class BoxRuntimeConnector(ManagedRuntimeConnector):
        self._relay_port = parsed.port or _DEFAULT_PORT
        self._filtered_box_config = _filter_config_for_runtime(_get_box_config(ap))

-    def _uses_websocket(self) -> bool:
+    def uses_websocket(self) -> bool:
        """Whether the connector should use WebSocket to reach the Box runtime.

        True when:
          - Running inside Docker (Box runtime is a separate container)
          - The ``--standalone-box`` CLI flag was passed
          - An explicit ``runtime.endpoint`` was configured
+
+        When this is True the Box runtime lives in a separate process with its
+        own filesystem view (container, pod sidecar, or remote host), so paths
+        it reports (e.g. skill ``package_root``) are NOT resolvable on the
+        LangBot side. When False, Box runs as a stdio child process that shares
+        LangBot's filesystem.
        """
        return bool(
            self.configured_runtime_endpoint
@@ -134,6 +140,10 @@ class BoxRuntimeConnector(ManagedRuntimeConnector):
            or platform.use_websocket_to_connect_box_runtime()
        )

+    # Backwards-compatible private alias.
+    def _uses_websocket(self) -> bool:
+        return self.uses_websocket()
+
    async def initialize(self) -> None:
        if self._uses_websocket():
            if platform.get_platform() == 'win32' and not self.configured_runtime_endpoint:
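The topology rule this change relies on reduces to a three-step precedence: an explicit override wins, a missing connector means "not shared", and otherwise only a stdio (non-WebSocket) connector shares the filesystem. The sketch below restates that precedence against a stub. `StubConnector` and its fields are hypothetical stand-ins for the conditions listed in the `uses_websocket()` docstring, not the real `BoxRuntimeConnector`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StubConnector:
    # Hypothetical stand-ins for the conditions listed in uses_websocket()'s docstring.
    configured_runtime_endpoint: str = ''
    standalone_box: bool = False
    in_docker: bool = False

    def uses_websocket(self) -> bool:
        return bool(self.configured_runtime_endpoint or self.standalone_box or self.in_docker)


def shares_filesystem_with_box(connector: StubConnector | None, override: bool | None = None) -> bool:
    if override is not None:
        return override  # explicit topology from a test or embedder wins
    if connector is None:
        return False  # injected client: nothing to inspect, so trust Box-reported paths
    return not connector.uses_websocket()  # only a stdio child process shares the filesystem
```

Defaulting to `False` when nothing can be inspected is the conservative direction. A wrong `False` only skips a local sanity check, whereas a wrong `True` would silently drop every Box-reported skill.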
--- a/src/langbot/pkg/box/service.py
+++ b/src/langbot/pkg/box/service.py
@@ -67,6 +67,10 @@ class BoxService:
        self._available = False
        self._connector_error: str = ''
        self._reconnecting = False
+        # Optional explicit override for shares_filesystem_with_box. None means
+        # "derive from the connector transport". Set by tests / embedders that
+        # know the real LangBot<->Box filesystem topology.
+        self._shares_filesystem_with_box_override: bool | None = None

    @property
    def enabled(self) -> bool:
@@ -148,6 +152,32 @@ class BoxService:
    def available(self) -> bool:
        return self._available

+    @property
+    def shares_filesystem_with_box(self) -> bool:
+        """Whether LangBot and the Box runtime share a filesystem view.
+
+        This is True only when Box runs as a local stdio child process of
+        LangBot (same container/host). In that case paths the Box runtime
+        reports — notably skill ``package_root`` — resolve identically on the
+        LangBot side, so LangBot may validate them against its own filesystem.
+
+        It is False for every separated deployment (Docker Compose, k8s
+        sidecar, ``--standalone-box``, or an explicit ``runtime.endpoint``),
+        where the Box runtime owns its own filesystem and LangBot must trust
+        the paths it reports rather than checking them locally.
+
+        When Box is wired up with an injected client (tests, custom embeds)
+        there is no connector to introspect; we conservatively report False so
+        LangBot never wrongly drops Box-reported skills. An explicit override
+        can be set via ``_shares_filesystem_with_box_override`` (used by tests
+        and any embedder that knows the real topology).
+        """
+        if self._shares_filesystem_with_box_override is not None:
+            return self._shares_filesystem_with_box_override
+        if self._runtime_connector is None:
+            return False
+        return not self._runtime_connector.uses_websocket()
+
    async def execute_spec_payload(
        self,
        spec_payload: dict,
@@ -220,14 +250,24 @@ class BoxService:
        all skill packages mounted, regardless of which skill is currently
        activated.

-        Skills whose ``package_root`` is missing or no longer a directory on
-        the LangBot-visible filesystem are skipped with a warning instead of
-        being passed through to the backend. Without this guard the three
-        backends behave inconsistently on a stale mount: nsjail refuses to
-        start the sandbox (failing every exec in the session), Docker
-        silently auto-creates a root-owned empty directory on the host, and
-        E2B silently skips the upload — none of which surfaces an
-        actionable error to the agent or operator.
+        Path validation is filesystem-topology dependent. When LangBot and the
+        Box runtime share a filesystem (local stdio mode), a skill whose
+        ``package_root`` is missing or no longer a directory is skipped with a
+        warning instead of being passed through to the backend. Without that
+        guard the three backends behave inconsistently on a stale mount: nsjail
+        refuses to start the sandbox (failing every exec in the session),
+        Docker silently auto-creates a root-owned empty directory on the host,
+        and E2B silently skips the upload — none of which surfaces an
+        actionable error.
+
+        When Box runs as a separate process (Docker Compose, k8s sidecar,
+        ``--standalone-box``, or a remote ``runtime.endpoint``), the
+        ``package_root`` reported by ``list_skills`` is the Box runtime's own
+        filesystem path and is NOT resolvable on the LangBot side. Validating
+        it locally would wrongly drop every skill, so LangBot trusts the path
+        and lets the Box runtime resolve it. The Box runtime only ever reports
+        skills it discovered on its own filesystem, so the path is valid there
+        by construction.
        """
        skill_mgr = getattr(self.ap, 'skill_mgr', None)
        if skill_mgr is None:
@@ -235,13 +275,15 @@ class BoxService:

        from ..provider.tools.loaders import skill as skill_loader

+        validate_locally = self.shares_filesystem_with_box
+
        visible_skills = skill_loader.get_visible_skills(self.ap, query)
        mounts: list[dict] = []
        for skill_name, skill_data in visible_skills.items():
            package_root = str(skill_data.get('package_root', '') or '').strip()
            if not package_root:
                continue
-            if not os.path.isdir(package_root):
+            if validate_locally and not os.path.isdir(package_root):
                self.ap.logger.warning(
                    f'Skill "{skill_name}" package_root missing on filesystem '
                    f'({package_root}); skipping mount to prevent sandbox failures. '
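The gating in `build_skill_extra_mounts` can be shown in isolation. When filesystems are shared, a missing directory is a genuinely stale mount and is skipped. When they are separated, the path belongs to the Box runtime and is passed through unchecked. In the sketch below, `build_skill_mounts` and the `{'source': ..., 'read_only': ...}` mount shape are hypothetical; the real method's mount dictionaries are not shown in this hunk.

```python
from __future__ import annotations

import os


def build_skill_mounts(
    skills: dict[str, dict], validate_locally: bool
) -> tuple[list[dict], list[str]]:
    """Return (mounts, skipped_skill_names) for the given visible skills."""
    mounts: list[dict] = []
    skipped: list[str] = []
    for name, data in skills.items():
        package_root = str(data.get('package_root', '') or '').strip()
        if not package_root:
            continue  # nothing to mount
        if validate_locally and not os.path.isdir(package_root):
            skipped.append(name)  # shared filesystem: a missing dir is a genuinely stale mount
            continue
        # Separated filesystem: the path belongs to the Box runtime, so pass it through unchecked.
        mounts.append({'source': package_root, 'read_only': True})  # hypothetical mount shape
    return mounts, skipped
```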
--- a/src/langbot/pkg/core/bootutils/deps.py
+++ b/src/langbot/pkg/core/bootutils/deps.py
@@ -42,7 +42,6 @@ required_deps = {
    'telegramify_markdown': 'telegramify-markdown',
    'slack_sdk': 'slack_sdk',
    'asyncpg': 'asyncpg',
-    'litellm': 'litellm',
 }


--- a/src/langbot/pkg/core/migrations/m042_weknora_api.py
+++ b/src/langbot/pkg/core/migrations/m042_weknora_api.py
@@ -0,0 +1,27 @@
+from __future__ import annotations
+
+from .. import migration
+
+
+@migration.migration_class('weknora-api-config', 42)
+class WeKnoraAPICfgMigration(migration.Migration):
+    """Migration that adds the default WeKnora API provider config."""
+
+    async def need_migrate(self) -> bool:
+        """Return whether this migration needs to run in the current environment."""
+        return 'weknora-api' not in self.ap.provider_cfg.data
+
+    async def run(self):
+        """Run the migration."""
+        self.ap.provider_cfg.data['weknora-api'] = {
+            'base-url': 'http://localhost:8080/api/v1',
+            'app-type': 'agent',
+            'api-key': '',
+            'agent-id': 'builtin-smart-reasoning',
+            'knowledge-base-ids': [],
+            'web-search-enabled': False,
+            'timeout': 120,
+            'base-prompt': '请回答用户的问题。',
+        }
+
+        await self.ap.provider_cfg.dump_config()
--- a/src/langbot/pkg/core/migrations/m043_deerflow_api.py
+++ b/src/langbot/pkg/core/migrations/m043_deerflow_api.py
@@ -0,0 +1,30 @@
+from __future__ import annotations
+
+from .. import migration
+
+
+@migration.migration_class('deerflow-api-config', 43)
+class DeerFlowAPICfgMigration(migration.Migration):
+    """Migration that adds the default DeerFlow API provider config."""
+
+    async def need_migrate(self) -> bool:
+        """Return whether this migration needs to run in the current environment."""
+        return 'deerflow-api' not in self.ap.provider_cfg.data
+
+    async def run(self):
+        """Run the migration."""
+        self.ap.provider_cfg.data['deerflow-api'] = {
+            'api-base': 'http://127.0.0.1:2026',
+            'api-key': '',
+            'auth-header': '',
+            'assistant-id': 'lead_agent',
+            'model-name': '',
+            'thinking-enabled': False,
+            'plan-mode': False,
+            'subagent-enabled': False,
+            'max-concurrent-subagents': 3,
+            'timeout': 300,
+            'recursion-limit': 1000,
+        }
+
+        await self.ap.provider_cfg.dump_config()
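Both migrations follow the same idempotent pattern. If the provider key already exists, `need_migrate` returns False and the user's settings are left alone. Otherwise the defaults are written once and the config is dumped. The sketch below condenses that pattern into one function. `FakeConfig` and `run_migration` are hypothetical stand-ins for `ap.provider_cfg` and the `need_migrate`/`run` pair, not LangBot's migration framework.

```python
from __future__ import annotations

import asyncio
from typing import Any


class FakeConfig:
    # Hypothetical stand-in for ap.provider_cfg: a dict plus an async dump_config().
    def __init__(self, data: dict[str, Any]) -> None:
        self.data = data
        self.dumps = 0

    async def dump_config(self) -> None:
        self.dumps += 1  # a real implementation would persist the config to disk


async def run_migration(cfg: FakeConfig, key: str, defaults: dict[str, Any]) -> bool:
    """Apply a default-config migration once; return True if it ran."""
    if key in cfg.data:
        return False  # equivalent to need_migrate() returning False
    cfg.data[key] = dict(defaults)  # copy so later edits don't mutate the defaults
    await cfg.dump_config()
    return True
```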
--- a/src/langbot/pkg/entity/persistence/mcp.py
+++ b/src/langbot/pkg/entity/persistence/mcp.py
@@ -11,6 +11,10 @@ class MCPServer(Base):
    enable = sqlalchemy.Column(sqlalchemy.Boolean, nullable=False, default=False)
    mode = sqlalchemy.Column(sqlalchemy.String(255), nullable=False)  # stdio, sse, http
    extra_args = sqlalchemy.Column(sqlalchemy.JSON, nullable=False, default={})
+    # Markdown documentation captured from LangBot Space at install time so the
+    # detail page can show docs even when the server is offline / has no tools.
+    # Empty string for manually-created servers that have no marketplace README.
+    readme = sqlalchemy.Column(sqlalchemy.Text, nullable=False, server_default='', default='')
    created_at = sqlalchemy.Column(sqlalchemy.DateTime, nullable=False, server_default=sqlalchemy.func.now())
    updated_at = sqlalchemy.Column(
        sqlalchemy.DateTime,
--- a/src/langbot/pkg/entity/persistence/model.py
+++ b/src/langbot/pkg/entity/persistence/model.py
@@ -31,7 +31,6 @@ class LLMModel(Base):
    name = sqlalchemy.Column(sqlalchemy.String(255), nullable=False)
    provider_uuid = sqlalchemy.Column(sqlalchemy.String(255), nullable=False)
    abilities = sqlalchemy.Column(sqlalchemy.JSON, nullable=False, default=[])
-    context_length = sqlalchemy.Column(sqlalchemy.Integer, nullable=True)
    extra_args = sqlalchemy.Column(sqlalchemy.JSON, nullable=False, default={})
    prefered_ranking = sqlalchemy.Column(sqlalchemy.Integer, nullable=False, default=0)
    created_at = sqlalchemy.Column(sqlalchemy.DateTime, nullable=False, server_default=sqlalchemy.func.now())
--- a/src/langbot/pkg/persistence/alembic/versions/0004_add_llm_model_context_length.py
+++ b/src/langbot/pkg/persistence/alembic/versions/0004_add_llm_model_context_length.py
@@ -1,30 +0,0 @@
-"""add llm model context length
-
-Revision ID: 0004_add_llm_model_context_length
-Revises: 0003_add_rerank_models
-Create Date: 2026-06-07
-"""
-
-import sqlalchemy as sa
-from alembic import op
-
-revision = '0004_add_llm_model_context_length'
-down_revision = '0003_add_rerank_models'
-branch_labels = None
-depends_on = None
-
-
-def upgrade() -> None:
-    conn = op.get_bind()
-    inspector = sa.inspect(conn)
-    columns = {column['name'] for column in inspector.get_columns('llm_models')}
-    if 'context_length' not in columns:
-        op.add_column('llm_models', sa.Column('context_length', sa.Integer(), nullable=True))
-
-
-def downgrade() -> None:
-    conn = op.get_bind()
-    inspector = sa.inspect(conn)
-    columns = {column['name'] for column in inspector.get_columns('llm_models')}
-    if 'context_length' in columns:
-        op.drop_column('llm_models', 'context_length')
--- a/src/langbot/pkg/persistence/alembic/versions/0004_add_mcp_readme.py
+++ b/src/langbot/pkg/persistence/alembic/versions/0004_add_mcp_readme.py
@@ -0,0 +1,34 @@
+"""add readme column to mcp_servers
+
+Revision ID: 0004_add_mcp_readme
+Revises: 0003_add_rerank_models
+Create Date: 2026-06-06
+"""
+
+import sqlalchemy as sa
+from alembic import op
+
+revision = '0004_add_mcp_readme'
+down_revision = '0003_add_rerank_models'
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    # Add ``readme`` to mcp_servers if the table exists and the column is missing
+    # (the table may have been created by create_all() with the column already
+    # present on fresh installs, so guard against duplicate-add).
+    conn = op.get_bind()
+    inspector = sa.inspect(conn)
+    if 'mcp_servers' not in inspector.get_table_names():
+        return
+    columns = {col['name'] for col in inspector.get_columns('mcp_servers')}
+    if 'readme' not in columns:
+        op.add_column(
+            'mcp_servers',
+            sa.Column('readme', sa.Text(), nullable=False, server_default=''),
+        )
+
+
+def downgrade() -> None:
+    op.drop_column('mcp_servers', 'readme')
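The `upgrade()` above inspects the live schema before calling `add_column`, because `create_all()` may already have created the column on fresh installs. The `downgrade()`, by contrast, drops the column unconditionally. The following is a minimal sketch of the same guard applied in both directions. For illustration it uses the standard-library `sqlite3` module rather than Alembic. The function names are hypothetical, and the table name is treated as a trusted constant.

```python
from __future__ import annotations

import sqlite3


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def column_names(conn: sqlite3.Connection, table: str) -> set[str]:
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # `table` is interpolated directly, so only pass trusted identifiers.
    return {row[1] for row in conn.execute(f'PRAGMA table_info({table})')}


def upgrade(conn: sqlite3.Connection) -> None:
    # No-op if the table is absent or the column already exists (idempotent)
    if not table_exists(conn, 'mcp_servers'):
        return
    if 'readme' not in column_names(conn, 'mcp_servers'):
        conn.execute("ALTER TABLE mcp_servers ADD COLUMN readme TEXT NOT NULL DEFAULT ''")


def downgrade(conn: sqlite3.Connection) -> None:
    # Mirror the upgrade guard so downgrade is also safe to run more than once
    if table_exists(conn, 'mcp_servers') and 'readme' in column_names(conn, 'mcp_servers'):
        # ALTER TABLE ... DROP COLUMN requires SQLite 3.35 or newer
        conn.execute('ALTER TABLE mcp_servers DROP COLUMN readme')
```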
--- a/src/langbot/pkg/persistence/migrations/dbm026_llm_model_context_length.py
+++ b/src/langbot/pkg/persistence/migrations/dbm026_llm_model_context_length.py
@@ -1,42 +0,0 @@
-import sqlalchemy
-from .. import migration
-
-
-@migration.migration_class(26)
-class DBMigrateLLMModelContextLength(migration.DBMigration):
-    """Add context_length column to LLM models"""
-
-    async def upgrade(self):
-        columns = await self._get_columns('llm_models')
-        if 'context_length' not in columns:
-            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text('ALTER TABLE llm_models ADD COLUMN context_length INTEGER')
-            )
-
-    async def downgrade(self):
-        columns = await self._get_columns('llm_models')
-        if 'context_length' not in columns:
-            return
-
-        if self.ap.persistence_mgr.db.name == 'postgresql':
-            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text('ALTER TABLE llm_models DROP COLUMN IF EXISTS context_length')
-            )
-        else:
-            await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text('ALTER TABLE llm_models DROP COLUMN context_length')
-            )
-
-    async def _get_columns(self, table_name: str) -> set[str]:
-        if self.ap.persistence_mgr.db.name == 'postgresql':
-            result = await self.ap.persistence_mgr.execute_async(
-                sqlalchemy.text("""
-                    SELECT column_name FROM information_schema.columns
-                    WHERE table_name = :table_name
-                """),
-                {'table_name': table_name},
-            )
-            return {row[0] for row in result.fetchall()}
-
-        result = await self.ap.persistence_mgr.execute_async(sqlalchemy.text(f'PRAGMA table_info({table_name})'))
-        return {row[1] for row in result.fetchall()}
--- a/src/langbot/pkg/pipeline/preproc/preproc.py
+++ b/src/langbot/pkg/pipeline/preproc/preproc.py
@@ -109,7 +109,7 @@ class PreProcessor(stage.PipelineStage):
            if llm_model:
                query.use_llm_model_uuid = llm_model.model_entity.uuid

-                if 'func_call' in (llm_model.model_entity.abilities or []):
+                if llm_model.model_entity.abilities.__contains__('func_call'):
                    # Get bound plugins and MCP servers for filtering tools
                    bound_plugins = query.variables.get('_pipeline_bound_plugins', None)
                    bound_mcp_servers = query.variables.get('_pipeline_bound_mcp_servers', None)
@@ -159,7 +159,11 @@ class PreProcessor(stage.PipelineStage):

        # Check if this model supports vision, if not, remove all images
        # TODO this checking should be performed in runner, and in this stage, the image should be reserved
-        if selected_runner == 'local-agent' and llm_model and 'vision' not in (llm_model.model_entity.abilities or []):
+        if (
+            selected_runner == 'local-agent'
+            and llm_model
+            and not llm_model.model_entity.abilities.__contains__('vision')
+        ):
            for msg in query.messages:
                if isinstance(msg.content, list):
                    for me in msg.content:
@@ -177,7 +181,7 @@ class PreProcessor(stage.PipelineStage):
                plain_text += me.text
            elif isinstance(me, platform_message.Image):
                if selected_runner != 'local-agent' or (
-                    llm_model and 'vision' in (llm_model.model_entity.abilities or [])
+                    llm_model and llm_model.model_entity.abilities.__contains__('vision')
                ):
                    if me.base64 is not None:
                        content_list.append(provider_message.ContentElement.from_image_base64(me.base64))
@@ -198,7 +202,7 @@ class PreProcessor(stage.PipelineStage):
                        content_list.append(provider_message.ContentElement.from_text(msg.text))
                    elif isinstance(msg, platform_message.Image):
                        if selected_runner != 'local-agent' or (
-                            llm_model and 'vision' in (llm_model.model_entity.abilities or [])
+                            llm_model and llm_model.model_entity.abilities.__contains__('vision')
                        ):
                            if msg.base64 is not None:
                                content_list.append(provider_message.ContentElement.from_image_base64(msg.base64))
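PreProcessor drops images when the selected runner is `local-agent` and the model lacks the `vision` ability. The `+` side of this hunk spells the check as `abilities.__contains__('vision')`. The `-` side uses `'vision' in (abilities or [])`, which is the idiomatic membership test and also tolerates a `None` abilities list. The following is a minimal sketch of that gating with simplified stand-in types. `Part` and `filter_parts` are hypothetical and are not LangBot's `ContentElement` API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Part:
    # Simplified stand-in for a message content element
    type: str  # 'text' or 'image'
    value: str


def model_has(abilities: list[str] | None, ability: str) -> bool:
    # `in` is the idiomatic membership test; `or []` guards against None
    return ability in (abilities or [])


def filter_parts(parts: list[Part], runner: str, abilities: list[str] | None) -> list[Part]:
    """Drop image parts when the local-agent runner's model cannot see images."""
    if runner == 'local-agent' and not model_has(abilities, 'vision'):
        return [p for p in parts if p.type != 'image']
    # Other runners keep images, matching the `selected_runner != 'local-agent'` branch
    return list(parts)
```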
--- a/src/langbot/pkg/plugin/connector.py
+++ b/src/langbot/pkg/plugin/connector.py
@@ -248,6 +248,9 @@ class PluginRuntimeConnector(ManagedRuntimeConnector):

        mode = mcp_data.get('mode') or 'stdio'
        extra_args = mcp_data.get('extra_args') or {}
+        # Marketplace records carry the rendered README markdown; persist it so
+        # the detail page Docs tab works offline and without a marketplace round-trip.
+        readme = mcp_data.get('readme') or ''
        # Use __ instead of / to avoid URL routing issues with slashes
        name = f'{mcp_data.get("author", "")}__{mcp_data.get("name", "")}'

@@ -267,6 +270,7 @@ class PluginRuntimeConnector(ManagedRuntimeConnector):
            'enable': True,
            'mode': mode,
            'extra_args': extra_args,
+            'readme': readme,
        }

        await self.ap.persistence_mgr.execute_async(sqlalchemy.insert(persistence_mcp.MCPServer).values(server_data))
--- a/src/langbot/pkg/provider/modelmgr/modelmgr.py
+++ b/src/langbot/pkg/provider/modelmgr/modelmgr.py
@@ -37,41 +37,11 @@ class ModelManager:
        self.requester_components = []
        self.requester_dict = {}

-    @staticmethod
-    def _get_litellm_provider_from_manifest(component: engine.Component | None) -> str | None:
-        if component is None:
-            return None
-
-        spec = getattr(component, 'spec', None) or {}
-        litellm_provider = None
-
-        if isinstance(spec, dict):
-            litellm_provider = spec.get('litellm_provider')
-        else:
-            getter = getattr(spec, 'get', None)
-            if callable(getter):
-                try:
-                    litellm_provider = getter('litellm_provider')
-                except Exception:
-                    litellm_provider = None
-
-        if isinstance(litellm_provider, str) and litellm_provider:
-            return litellm_provider
-        return None
-
    async def initialize(self):
        self.requester_components = self.ap.discover.get_components_by_kind('LLMAPIRequester')

        requester_dict: dict[str, type[requester.ProviderAPIRequester]] = {}
        for component in self.requester_components:
-            # Skip components that use litellm_provider (they will use litellmchat.py instead)
-            litellm_provider = self._get_litellm_provider_from_manifest(component)
-            if litellm_provider:
-                self.ap.logger.debug(
-                    f'Skipping Python class loading for {component.metadata.name} '
-                    f'(uses litellm_provider={litellm_provider})'
-                )
-                continue
            requester_dict[component.metadata.name] = component.get_python_component_class()

        self.requester_dict = requester_dict
@@ -266,7 +236,6 @@ class ModelManager:
                name=model_info.get('name', ''),
                provider_uuid='',
                abilities=model_info.get('abilities', []),
-                context_length=model_info.get('context_length'),
                extra_args=model_info.get('extra_args', {}),
            ),
            provider=runtime_provider,
@@ -325,37 +294,13 @@ class ModelManager:
        else:
            provider_entity = provider_info

-        # Get requester manifest to check for litellm_provider
-        requester_manifest = self.get_available_requester_manifest_by_name(provider_entity.requester)
-        litellm_provider = self._get_litellm_provider_from_manifest(requester_manifest)
-
-        # Build config from base_url
-        config = {'base_url': provider_entity.base_url}
-
-        # Check if requester manifest specifies litellm_provider
-        if litellm_provider:
-            from .requesters import litellmchat
-
-            # Use unified LiteLLMRequester with provider prefix
-            # Map litellm_provider (YAML spec) to custom_llm_provider (config)
-            config['custom_llm_provider'] = litellm_provider
-            requester_inst = litellmchat.LiteLLMRequester(
-                ap=self.ap,
-                config=config,
-            )
-            self.ap.logger.debug(
-                f'Using LiteLLMRequester for {provider_entity.requester} '
-                f'with custom_llm_provider={config["custom_llm_provider"]}'
-            )
-        else:
-            # Use original requester class (for backward compatibility)
-            if provider_entity.requester not in self.requester_dict:
-                raise provider_errors.RequesterNotFoundError(provider_entity.requester)
-            requester_inst = self.requester_dict[provider_entity.requester](
-                ap=self.ap,
-                config=config,
-            )
+        if provider_entity.requester not in self.requester_dict:
+            raise provider_errors.RequesterNotFoundError(provider_entity.requester)

+        requester_inst = self.requester_dict[provider_entity.requester](
+            ap=self.ap,
+            config={'base_url': provider_entity.base_url},
+        )
        await requester_inst.initialize()

        token_mgr = token.TokenManager(name=provider_entity.uuid, tokens=provider_entity.api_keys or [])
@@ -461,7 +406,6 @@ class ModelManager:
            name=model_info.get('name', ''),
            provider_uuid=model_info.get('provider_uuid', ''),
            abilities=model_info.get('abilities', []),
-            context_length=model_info.get('context_length'),
            extra_args=model_info.get('extra_args', {}),
        )

--- a/src/langbot/pkg/provider/modelmgr/requester.py
+++ b/src/langbot/pkg/provider/modelmgr/requester.py
@@ -67,8 +67,8 @@ class RuntimeProvider:
            if isinstance(result, tuple):
                msg, usage_info = result
                if usage_info:
-                    input_tokens = usage_info.get('prompt_tokens', 0)
-                    output_tokens = usage_info.get('completion_tokens', 0)
+                    input_tokens = usage_info.get('input_tokens', 0)
+                    output_tokens = usage_info.get('output_tokens', 0)
                return msg
            else:
                return result
@@ -128,6 +128,7 @@ class RuntimeProvider:
        start_time = time.time()
        status = 'success'
        error_message = None
+        # Note: streamed responses don't readily expose token counts, so both stay 0
        input_tokens = 0
        output_tokens = 0

@@ -142,15 +143,6 @@ class RuntimeProvider:
                remove_think=remove_think,
            ):
                yield chunk
-            # Extract usage from stream if available (stored by LiteLLM requester)
-            if query:
-                if query.variables is None:
-                    query.variables = {}
-                if '_stream_usage' in query.variables:
-                    usage_info = query.variables['_stream_usage']
-                    input_tokens = usage_info.get('prompt_tokens', 0)
-                    output_tokens = usage_info.get('completion_tokens', 0)
-                    del query.variables['_stream_usage']
        except Exception as e:
            status = 'error'
            error_message = str(e)
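`RuntimeProvider` accepts either a bare message or a `(message, usage_info)` tuple from a requester. After this change it reads `input_tokens`/`output_tokens` instead of the OpenAI-style `prompt_tokens`/`completion_tokens`, so a requester that still reports only the OpenAI names would be recorded as 0 tokens. The following is a minimal sketch of a normalizer that accepts both spellings. `unpack_result` is hypothetical, and accepting both key sets is an assumption rather than current LangBot behavior.

```python
from __future__ import annotations

from typing import Any


def unpack_result(result: Any) -> tuple[Any, int, int]:
    """Split a requester result into (message, input_tokens, output_tokens).

    Accepts a bare message or a (message, usage_info) tuple. usage_info may use
    input_tokens/output_tokens or prompt_tokens/completion_tokens.
    """
    if not isinstance(result, tuple):
        return result, 0, 0
    msg, usage = result
    usage = usage or {}
    # Prefer the new key names, fall back to the OpenAI-style ones
    input_tokens = usage.get('input_tokens', usage.get('prompt_tokens', 0))
    output_tokens = usage.get('output_tokens', usage.get('completion_tokens', 0))
    return msg, input_tokens, output_tokens
```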
--- a/src/langbot/pkg/provider/modelmgr/requesters/302aichatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/302aichatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import chatcmpl
+
+
+class AI302ChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """302.AI ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.302.ai/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/302aichatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/302aichatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 302.AI
  icon: 302ai.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/anthropicmsgs.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/anthropicmsgs.py
@@ -0,0 +1,370 @@
+from __future__ import annotations
+
+import typing
+import json
+import platform
+import socket
+import anthropic
+import httpx
+
+from .. import errors, requester
+
+from ....utils import image
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+
+class AnthropicMessages(requester.ProviderAPIRequester):
+    """Anthropic Messages API requester"""
+
+    client: anthropic.AsyncAnthropic
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.anthropic.com',
+        'timeout': 120,
+    }
+
+    async def initialize(self):
+        # Work around Windows, whose socket module may lack TCP_KEEPINTVL and TCP_KEEPCNT
+        if platform.system() == 'Windows':
+            if not hasattr(socket, 'TCP_KEEPINTVL'):
+                socket.TCP_KEEPINTVL = 0
+            if not hasattr(socket, 'TCP_KEEPCNT'):
+                socket.TCP_KEEPCNT = 0
+        httpx_client = anthropic._base_client.AsyncHttpxClientWrapper(
+            base_url=self.requester_cfg['base_url'],
+            # cast to a valid type because mypy doesn't understand our type narrowing
+            timeout=typing.cast(httpx.Timeout, self.requester_cfg['timeout']),
+            limits=anthropic._constants.DEFAULT_CONNECTION_LIMITS,
+            follow_redirects=True,
+            trust_env=True,
+        )
+
+        self.client = anthropic.AsyncAnthropic(
+            api_key='',
+            http_client=httpx_client,
+            base_url=self.requester_cfg['base_url'],
+        )
+
+    async def invoke_llm(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message:
+        self.client.api_key = model.provider.token_mgr.get_token()
+
+        args = extra_args.copy()
+        args['model'] = model.model_entity.name
+
+        # Process messages
+
+        # system
+        system_role_message = None
+
+        for i, m in enumerate(messages):
+            if m.role == 'system':
+                system_role_message = m
+
+                break
+
+        if system_role_message:
+            messages.pop(i)
+
+        if isinstance(system_role_message, provider_message.Message) and isinstance(system_role_message.content, str):
+            args['system'] = system_role_message.content
+
+        req_messages = []
+
+        for m in messages:
+            if m.role == 'tool':
+                tool_call_id = m.tool_call_id
+
+                req_messages.append(
+                    {
+                        'role': 'user',
+                        'content': [
+                            {
+                                'type': 'tool_result',
+                                'tool_use_id': tool_call_id,
+                                'is_error': False,
+                                'content': [{'type': 'text', 'text': m.content}],
+                            }
+                        ],
+                    }
+                )
+
+                continue
+
+            msg_dict = m.dict(exclude_none=True)
+
+            if isinstance(m.content, str) and m.content.strip() != '':
+                msg_dict['content'] = [{'type': 'text', 'text': m.content}]
+            elif isinstance(m.content, list):
+                for i, ce in enumerate(m.content):
+                    if ce.type == 'image_base64':
+                        image_b64, image_format = await image.extract_b64_and_format(ce.image_base64)
+
+                        alter_image_ele = {
+                            'type': 'image',
+                            'source': {
+                                'type': 'base64',
+                                'media_type': f'image/{image_format}',
+                                'data': image_b64,
+                            },
+                        }
+                        msg_dict['content'][i] = alter_image_ele
+
+            if m.tool_calls:
+                for tool_call in m.tool_calls:
+                    msg_dict['content'].append(
+                        {
+                            'type': 'tool_use',
+                            'id': tool_call.id,
+                            'name': tool_call.function.name,
+                            'input': json.loads(tool_call.function.arguments),
+                        }
+                    )
+
+                del msg_dict['tool_calls']
+
+            req_messages.append(msg_dict)
+
+        args['messages'] = req_messages
+
+        if 'thinking' in args:
+            args['thinking'] = {'type': 'enabled', 'budget_tokens': 10000}
+
+        if funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_anthropic(funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        try:
+            resp = await self.client.messages.create(**args)
+
+            args = {
+                'content': '',
+                'role': resp.role,
+            }
+            assert type(resp) is anthropic.types.message.Message
+
+            for block in resp.content:
+                if not remove_think and block.type == 'thinking':
+                    args['content'] = '<think>\n' + block.thinking + '\n</think>\n' + args['content']
+                elif block.type == 'text':
+                    args['content'] += block.text
+                elif block.type == 'tool_use':
+                    assert type(block) is anthropic.types.tool_use_block.ToolUseBlock
+                    tool_call = provider_message.ToolCall(
+                        id=block.id,
+                        type='function',
+                        function=provider_message.FunctionCall(name=block.name, arguments=json.dumps(block.input)),
+                    )
+                    if 'tool_calls' not in args:
+                        args['tool_calls'] = []
+                    args['tool_calls'].append(tool_call)
+
+            return provider_message.Message(**args)
+        except anthropic.AuthenticationError as e:
+            raise errors.RequesterError(f'Invalid api-key: {e.message}')
+        except anthropic.BadRequestError as e:
+            raise errors.RequesterError(str(e.message))
+        except anthropic.NotFoundError as e:
+            if 'model: ' in str(e):
+                raise errors.RequesterError(f'Invalid model: {e.message}')
+            else:
+                raise errors.RequesterError(f'Invalid request URL: {e.message}')
+
+    async def invoke_llm_stream(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = model.provider.token_mgr.get_token()
+
+        args = extra_args.copy()
+        args['model'] = model.model_entity.name
+        args['stream'] = True
+
+        # Process messages
+
+        # system
+        system_role_message = None
+
+        for i, m in enumerate(messages):
+            if m.role == 'system':
+                system_role_message = m
+
+                break
+
+        if system_role_message:
+            messages.pop(i)
+
+        if isinstance(system_role_message, provider_message.Message) and isinstance(system_role_message.content, str):
+            args['system'] = system_role_message.content
+
+        req_messages = []
+
+        for m in messages:
+            if m.role == 'tool':
+                tool_call_id = m.tool_call_id
+
+                req_messages.append(
+                    {
+                        'role': 'user',
+                        'content': [
+                            {
+                                'type': 'tool_result',
+                                'tool_use_id': tool_call_id,
+                                'is_error': False,  # always False for now
+                                'content': [
+                                    {'type': 'text', 'text': m.content}
+                                ],  # wrapped in a list, presumably for multiple results; other block types may be allowed, only text is sent for now
+                            }
+                        ],
+                    }
+                )
+
+                continue
+
+            msg_dict = m.dict(exclude_none=True)
+
+            if isinstance(m.content, str) and m.content.strip() != '':
+                msg_dict['content'] = [{'type': 'text', 'text': m.content}]
+            elif isinstance(m.content, list):
+                for i, ce in enumerate(m.content):
+                    if ce.type == 'image_base64':
+                        image_b64, image_format = await image.extract_b64_and_format(ce.image_base64)
+
+                        alter_image_ele = {
+                            'type': 'image',
+                            'source': {
+                                'type': 'base64',
+                                'media_type': f'image/{image_format}',
+                                'data': image_b64,
+                            },
+                        }
+                        msg_dict['content'][i] = alter_image_ele
+            if isinstance(msg_dict['content'], str) and msg_dict['content'] == '':
+                msg_dict['content'] = []  # content is sometimes an empty string here for unknown reasons; normalize to a list
+            if m.tool_calls:
+                for tool_call in m.tool_calls:
+                    msg_dict['content'].append(
+                        {
+                            'type': 'tool_use',
+                            'id': tool_call.id,
+                            'name': tool_call.function.name,
+                            'input': json.loads(tool_call.function.arguments),
+                        }
+                    )
+
+                del msg_dict['tool_calls']
+
+            req_messages.append(msg_dict)
+        if 'thinking' in args:
+            args['thinking'] = {'type': 'enabled', 'budget_tokens': 10000}
+
+        args['messages'] = req_messages
+
+        if funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_anthropic(funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        try:
+            role = 'assistant'  # default role
+            # chunk_idx = 0
+            think_started = False
+            think_ended = False
+            finish_reason = False
+            tool_name = ''
+            tool_id = ''
+            async for chunk in await self.client.messages.create(**args):
+                content = ''
+                tool_call = {'id': None, 'function': {'name': None, 'arguments': None}, 'type': 'function'}
+                if isinstance(
+                    chunk, anthropic.types.raw_content_block_start_event.RawContentBlockStartEvent
+                ):  # content block started
+                    if chunk.content_block.type == 'tool_use':
+                        if chunk.content_block.name is not None:
+                            tool_name = chunk.content_block.name
+                        if chunk.content_block.id is not None:
+                            tool_id = chunk.content_block.id
+
+                        tool_call['function']['name'] = tool_name
+                        tool_call['function']['arguments'] = ''
+                        tool_call['id'] = tool_id
+
+                    if not remove_think:
+                        if chunk.content_block.type == 'thinking':
+                            think_started = True
+                        elif chunk.content_block.type == 'text' and chunk.index != 0:
+                            think_ended = True
+                        continue
+                elif isinstance(chunk, anthropic.types.raw_content_block_delta_event.RawContentBlockDeltaEvent):
+                    if chunk.delta.type == 'thinking_delta':
+                        if think_started:
+                            think_started = False
+                            content = '<think>\n' + chunk.delta.thinking
+                        elif remove_think:
+                            continue
+                        else:
+                            content = chunk.delta.thinking
+                    elif chunk.delta.type == 'text_delta':
+                        if think_ended:
+                            think_ended = False
+                            content = '\n</think>\n' + chunk.delta.text
+                        else:
+                            content = chunk.delta.text
+                    elif chunk.delta.type == 'input_json_delta':
+                        tool_call['function']['arguments'] = chunk.delta.partial_json
+                        tool_call['function']['name'] = tool_name
+                        tool_call['id'] = tool_id
+                elif isinstance(chunk, anthropic.types.raw_content_block_stop_event.RawContentBlockStopEvent):
+                    continue  # end of a raw content block
+
+                elif isinstance(chunk, anthropic.types.raw_message_delta_event.RawMessageDeltaEvent):
+                    if chunk.delta.stop_reason == 'end_turn':
+                        finish_reason = True
+                elif isinstance(chunk, anthropic.types.raw_message_stop_event.RawMessageStopEvent):
+                    continue  # appears to mark the end of the whole message
+                else:
+                    # print(chunk)
+                    self.ap.logger.debug(f'anthropic chunk: {chunk}')
+                    continue
+
+                args = {
+                    'content': content,
+                    'role': role,
+                    'is_final': finish_reason,
+                    'tool_calls': None if tool_call['id'] is None else [tool_call],
+                }
+                # if chunk_idx == 0:
+                #     chunk_idx += 1
+                #     continue
+
+                # assert type(chunk) is anthropic.types.message.Chunk
+
+                yield provider_message.MessageChunk(**args)
+
+            # return llm_entities.Message(**args)
+        except anthropic.AuthenticationError as e:
+            raise errors.RequesterError(f'Invalid api-key: {e.message}')
+        except anthropic.BadRequestError as e:
+            raise errors.RequesterError(str(e.message))
+        except anthropic.NotFoundError as e:
+            if 'model: ' in str(e):
+                raise errors.RequesterError(f'Invalid model: {e.message}')
+            else:
+                raise errors.RequesterError(f'Invalid request URL: {e.message}')
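Both `invoke_llm` and `invoke_llm_stream` reshape OpenAI-style chat history into the Anthropic Messages format in three ways:

- The first `system` message moves to the top-level `system` argument.
- Each `tool` message becomes a `user` message carrying a `tool_result` block.
- Assistant `tool_calls` become `tool_use` blocks whose `input` is the parsed JSON arguments.

The following is a minimal, text-only sketch of that conversion over plain dicts, with image handling omitted. `to_anthropic` is a hypothetical name, and the input dict shape is an assumption modeled on the OpenAI chat format.

```python
from __future__ import annotations

import json
from typing import Any


def to_anthropic(messages: list[dict[str, Any]]) -> tuple[str | None, list[dict[str, Any]]]:
    """Return (system_prompt, anthropic_messages) for text-only OpenAI-style history."""
    system: str | None = None
    out: list[dict[str, Any]] = []
    for m in messages:
        role = m['role']
        if role == 'system' and system is None:
            # Anthropic takes the system prompt as a separate top-level argument
            system = m['content']
            continue
        if role == 'tool':
            # Tool output goes back as a user turn holding a tool_result block
            out.append({
                'role': 'user',
                'content': [{
                    'type': 'tool_result',
                    'tool_use_id': m['tool_call_id'],
                    'is_error': False,
                    'content': [{'type': 'text', 'text': m['content']}],
                }],
            })
            continue
        # Start from an empty list so tool_use blocks can always be appended
        content: list[dict[str, Any]] = []
        if isinstance(m.get('content'), str) and m['content'].strip():
            content.append({'type': 'text', 'text': m['content']})
        for call in m.get('tool_calls') or []:
            # OpenAI encodes arguments as a JSON string; Anthropic expects an object
            content.append({
                'type': 'tool_use',
                'id': call['id'],
                'name': call['function']['name'],
                'input': json.loads(call['function']['arguments']),
            })
        out.append({'role': role, 'content': content})
    return system, out
```

Starting each non-tool message from an empty block list sidesteps the empty-`content` case. `invoke_llm_stream` handles that case with an explicit check, but `invoke_llm` appears not to.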
--- a/src/langbot/pkg/provider/modelmgr/requesters/anthropicmsgs.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/anthropicmsgs.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Anthropic
  icon: anthropic.svg
 spec:
-  litellm_provider: anthropic
  config:
  - name: base_url
    label:
@@ -25,8 +24,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
-  - rerank
  provider_category: manufacturer
 execution:
  python:
--- a/src/langbot/pkg/provider/modelmgr/requesters/baidu.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/baidu.svg
@@ -1,5 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#2932E1"/>
-  <text x="30" y="28" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="white" text-anchor="middle">Baidu</text>
-  <text x="30" y="40" font-family="Arial, sans-serif" font-size="8" fill="white" text-anchor="middle">ERNIE</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/baiduchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/baiduchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: baidu-chat-completions
-  label:
-    en_US: Baidu ERNIE
-    zh_Hans: 百度文心一言
-  icon: baidu.svg
-spec:
-  litellm_provider: openai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/bailianchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/bailianchatcmpl.py
@@ -0,0 +1,242 @@
+from __future__ import annotations
+
+import typing
+import dashscope
+import openai
+
+from . import modelscopechatcmpl
+from .. import requester
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+
+class BailianChatCompletions(modelscopechatcmpl.ModelScopeChatCompletions):
+    """ChatCompletion API requester for the Alibaba Cloud Bailian (Model Studio) platform"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://dashscope.aliyuncs.com/compatible-mode/v1',
+        'timeout': 120,
+    }
+
+    async def _closure_stream(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message | typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Messages for this request
+        messages = req_messages.copy()
+
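+        is_use_dashscope_call = False  # whether to call through Alibaba's native dashscope SDK
+        is_enable_multi_model = True  # whether multi-turn conversation is supported
+        use_time_num = 0  # number of media files found; prevents repeated calls when several files are present
+        use_time_ids = []  # indices of the messages that carry media files
+        message_id = 0  # index of the current message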
+
+        for msg in messages:
+            # Normalize content parts into the formats Bailian accepts
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+                    elif me['type'] == 'file_url' and '.' in me.get('file_name', ''):
+                        # 1. Video file understanding
+                        # https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=2845871
+                        file_type = me.get('file_name').lower().split('.')[-1]
+                        if file_type in ['mp4', 'avi', 'mkv', 'mov', 'flv', 'wmv']:
+                            me['type'] = 'video_url'
+                            me['video_url'] = {'url': me['file_url']}
+                            del me['file_url']
+                            del me['file_name']
+                            use_time_num += 1
+                            use_time_ids.append(message_id)
+                            is_enable_multi_model = False
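+                        # 2. Audio file recognition: cannot be passed through the OpenAI-compatible audio field (not supported there for now), so the native dashscope SDK is used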
+                        # https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=2979031
+                        elif file_type in [
+                            'aac',
+                            'amr',
+                            'aiff',
+                            'flac',
+                            'm4a',
+                            'mp3',
+                            'mpeg',
+                            'ogg',
+                            'opus',
+                            'wav',
+                            'webm',
+                            'wma',
+                        ]:
+                            me['audio'] = me['file_url']
+                            # dashscope's native content part is {'audio': <url>}, with no 'type' key
+                            del me['file_url']
+                            del me['type']
+                            del me['file_name']
+                            is_use_dashscope_call = True
+                            use_time_num += 1
+                            use_time_ids.append(message_id)
+                            is_enable_multi_model = False
+            message_id += 1
+
+        # Keep only the last message carrying media by dropping the indices in use_time_ids[:-1]
+        if not is_enable_multi_model and use_time_num > 1:
+            messages = [msg for idx, msg in enumerate(messages) if idx not in use_time_ids[:-1]]
+
+        if not is_enable_multi_model:
+            messages = [msg for msg in messages if 'resp_message_id' not in msg]
+
+        args['messages'] = messages
+        args['stream'] = True
+
+        # Streaming state
+        tool_id, tool_name = '', ''  # last seen tool-call id/name, carried onto continuation deltas
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        role = 'assistant'  # default role
+
+        if is_use_dashscope_call:
+            response = dashscope.MultiModalConversation.call(
+                # Bailian API key taken from the provider's token manager
+                api_key=use_model.provider.token_mgr.get_token(),
+                model=use_model.model_entity.name,
+                messages=messages,
+                result_format='message',
+                asr_options={
+                    # "language": "zh",  # optional: specify the audio language when known to improve recognition accuracy
+                    'enable_lid': True,
+                    'enable_itn': False,
+                },
+                stream=True,
+            )
+            content_length_list = []
+            previous_length = 0  # length of the content already emitted
+            for res in response:
+                chunk = res['output']
+                # Parse the chunk
+                if hasattr(chunk, 'choices') and chunk.choices:
+                    choice = chunk.choices[0]
+                    delta_content = choice['message'].content[0]['text']
+                    finish_reason = choice['finish_reason']
+                    content_length_list.append(len(delta_content))
+                else:
+                    delta_content = ''
+                    finish_reason = None
+
+                # Skip an empty first chunk (role only, no content)
+                if chunk_idx == 0 and not delta_content:
+                    chunk_idx += 1
+                    continue
+
+                # Check whether content_length_list has enough data to slice off the already-emitted text
+                if len(content_length_list) >= 2:
+                    now_content = delta_content[previous_length : content_length_list[-1]]
+                    previous_length = content_length_list[-1]  # update the emitted length
+                else:
+                    now_content = delta_content  # first chunk: use delta_content as is
+                    previous_length = len(delta_content)  # update the emitted length
+
+                # Build the MessageChunk with incremental content only
+                chunk_data = {
+                    'role': role,
+                    'content': now_content if now_content else None,
+                    'is_final': bool(finish_reason) and finish_reason != 'null',
+                }
+
+                # Drop None values
+                chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+                yield provider_message.MessageChunk(**chunk_data)
+                chunk_idx += 1
+        else:
+            async for chunk in self._req_stream(args, extra_body=extra_args):
+                # Parse the chunk
+                if hasattr(chunk, 'choices') and chunk.choices:
+                    choice = chunk.choices[0]
+                    delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
+                    finish_reason = getattr(choice, 'finish_reason', None)
+                else:
+                    delta = {}
+                    finish_reason = None
+
+                # Take the role from the first chunk and reuse it for later chunks
+                if 'role' in delta and delta['role']:
+                    role = delta['role']
+
+                # Incremental content
+                delta_content = delta.get('content', '')
+                reasoning_content = delta.get('reasoning_content', '')
+
+                # Handle reasoning_content
+                if reasoning_content:
+                    # accumulated_reasoning += reasoning_content
+                    # Skip reasoning_content when remove_think is set
+                    if remove_think:
+                        chunk_idx += 1
+                        continue
+
+                    # First reasoning_content: prepend the opening <think> tag
+                    if not thinking_started:
+                        thinking_started = True
+                        delta_content = '<think>\n' + reasoning_content
+                    else:
+                        # Keep streaming reasoning_content
+                        delta_content = reasoning_content
+                elif thinking_started and not thinking_ended and delta_content:
+                    # reasoning_content has ended and normal content begins: prepend the closing </think> tag
+                    thinking_ended = True
+                    delta_content = '\n</think>\n' + delta_content
+
+                # Handle tool-call deltas
+                if delta.get('tool_calls'):
+                    for tool_call in delta['tool_calls']:
+                        if tool_call['id']:
+                            tool_id = tool_call['id']
+                        if tool_call['function']['name']:
+                            tool_name = tool_call['function']['name']
+
+                        if tool_call['type'] is None:
+                            tool_call['type'] = 'function'
+                        tool_call['id'] = tool_id
+                        tool_call['function']['name'] = tool_name
+                        tool_call['function']['arguments'] = (
+                            '' if tool_call['function']['arguments'] is None else tool_call['function']['arguments']
+                        )
+
+                # Skip an empty first chunk (role only, no content)
+                if chunk_idx == 0 and not delta_content and not reasoning_content and not delta.get('tool_calls'):
+                    chunk_idx += 1
+                    continue
+
+                # Build the MessageChunk with incremental content only
+                chunk_data = {
+                    'role': role,
+                    'content': delta_content if delta_content else None,
+                    'tool_calls': delta.get('tool_calls'),
+                    'is_final': bool(finish_reason),
+                }
+
+                # Drop None values
+                chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+                yield provider_message.MessageChunk(**chunk_data)
+                chunk_idx += 1
+                # return
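The DashScope branch of `BailianChatCompletions._closure_stream` above slices each response by the length already emitted, which implies every streamed response carries the full text generated so far. The OpenAI-compatible branch wraps `reasoning_content` in `<think>` tags and carries tool-call ids and names forward onto continuation deltas. Below is a minimal standalone sketch of those three steps as pure functions. It assumes chunks have already been reduced to plain strings and dicts (no dashscope or openai objects). The function names `cumulative_to_deltas`, `wrap_reasoning` and `fill_tool_call_ids` are hypothetical and not part of LangBot. Like the source, it assumes only one tool call streams at a time; OpenAI-style streams can interleave parallel calls, which are told apart by each delta's `index`.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical helpers sketching the streaming logic above; not part of LangBot.


def cumulative_to_deltas(snapshots: Iterable[str]) -> Iterator[str]:
    """Turn cumulative snapshots ('He', 'Hello', ...) into incremental deltas."""
    previous = ''
    for snapshot in snapshots:
        if snapshot.startswith(previous):
            # Normal case: the snapshot extends what was already emitted.
            delta = snapshot[len(previous):]
        else:
            # Defensive fallback: the text was rewritten, so emit it whole.
            delta = snapshot
        previous = snapshot
        if delta:  # skip empty chunks, e.g. a role-only first chunk
            yield delta


def wrap_reasoning(
    pairs: Iterable[tuple[str, str]], remove_think: bool = False
) -> Iterator[str]:
    """Merge (reasoning_delta, content_delta) pairs into one text stream,
    wrapping the reasoning in <think>...</think> as the requester does."""
    started = ended = False
    for reasoning, content in pairs:
        if reasoning:
            if remove_think:
                continue  # drop reasoning entirely
            if not started:
                started = True
                yield '<think>\n' + reasoning
            else:
                yield reasoning
        elif content:
            if started and not ended:
                # The first normal content after reasoning closes the block.
                ended = True
                yield '\n</think>\n' + content
            else:
                yield content


def fill_tool_call_ids(deltas: Iterable[dict]) -> list[dict]:
    """Copy the last seen tool-call id/name onto continuation deltas,
    which typically carry only an argument fragment."""
    tool_id, tool_name = '', ''
    filled = []
    for delta in deltas:
        # Copy the delta and its 'function' dict instead of mutating the input.
        call = {**delta, 'function': dict(delta.get('function') or {})}
        if call.get('id'):
            tool_id = call['id']
        if call['function'].get('name'):
            tool_name = call['function']['name']
        call['id'] = tool_id
        call['function']['name'] = tool_name
        call['type'] = call.get('type') or 'function'
        if call['function'].get('arguments') is None:
            call['function']['arguments'] = ''
        filled.append(call)
    return filled
```

For example, `''.join(wrap_reasoning([('plan', ''), ('', 'Answer')]))` produces `'<think>\nplan\n</think>\nAnswer'`. Keeping each step a pure function over plain data lets it be tested without a live DashScope or OpenAI stream.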
--- a/src/langbot/pkg/provider/modelmgr/requesters/bailianchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/bailianchatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 阿里云百炼
  icon: bailian.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
@@ -25,7 +24,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
  - rerank
  provider_category: maas
 execution:
--- a/src/langbot/pkg/provider/modelmgr/requesters/chatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/chatcmpl.py
@@ -0,0 +1,702 @@
+from __future__ import annotations
+
+import asyncio
+import typing
+
+import openai
+import openai.types.chat.chat_completion as chat_completion_module
+import httpx
+
+from .. import errors, requester
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+
+class OpenAIChatCompletions(requester.ProviderAPIRequester):
+    """OpenAI ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.openai.com/v1',
+        'timeout': 120,
+    }
+
+    async def initialize(self):
+        self.client = openai.AsyncClient(
+            api_key=self.init_api_key,
+            base_url=self.requester_cfg['base_url'].replace(' ', ''),
+            timeout=self.requester_cfg['timeout'],
+            http_client=httpx.AsyncClient(trust_env=True, timeout=self.requester_cfg['timeout']),
+        )
+
+    def _mask_api_key(self, api_key: str | None) -> str:
+        if not api_key:
+            return ''
+        if len(api_key) <= 8:
+            return '****'
+        return f'{api_key[:4]}...{api_key[-4:]}'
+
+    def _infer_model_type(self, model_id: str) -> str:
+        normalized_model_id = (model_id or '').lower()
+        embedding_keywords = (
+            'embedding',
+            'embed',
+            'bge-',
+            'e5-',
+            'm3e',
+            'gte-',
+            'multilingual-e5',
+            'text-embedding',
+        )
+        return 'embedding' if any(keyword in normalized_model_id for keyword in embedding_keywords) else 'llm'
+
+    def _infer_model_abilities(self, item: dict[str, typing.Any], model_id: str) -> list[str]:
+        normalized_model_id = (model_id or '').lower()
+        abilities: set[str] = set()
+
+        def _flatten(value: typing.Any) -> list[str]:
+            if value is None:
+                return []
+            if isinstance(value, str):
+                return [value.lower()]
+            if isinstance(value, dict):
+                flattened: list[str] = []
+                for nested_value in value.values():
+                    flattened.extend(_flatten(nested_value))
+                return flattened
+            if isinstance(value, (list, tuple, set)):
+                flattened: list[str] = []
+                for nested_value in value:
+                    flattened.extend(_flatten(nested_value))
+                return flattened
+            return [str(value).lower()]
+
+        capability_tokens = _flatten(item.get('capabilities'))
+        capability_tokens.extend(_flatten(item.get('modalities')))
+        capability_tokens.extend(_flatten(item.get('input_modalities')))
+        capability_tokens.extend(_flatten(item.get('output_modalities')))
+        capability_tokens.extend(_flatten(item.get('supported_generation_methods')))
+        capability_tokens.extend(_flatten(item.get('supported_parameters')))
+        capability_tokens.extend(_flatten(item.get('architecture')))
+
+        combined_tokens = capability_tokens + [normalized_model_id]
+
+        vision_keywords = (
+            'vision',
+            'image',
+            'file',
+            'video',
+            'multimodal',
+            'vl',
+            'ocr',
+            'omni',
+        )
+        function_call_keywords = (
+            'function',
+            'tool',
+            'tools',
+            'tool_choice',
+            'tool_call',
+            'tool-use',
+            'tool_use',
+        )
+
+        if any(any(keyword in token for keyword in vision_keywords) for token in combined_tokens):
+            abilities.add('vision')
+
+        if any(any(keyword in token for keyword in function_call_keywords) for token in combined_tokens):
+            abilities.add('func_call')
+
+        return sorted(abilities)
+
+    def _normalize_modalities(self, value: typing.Any) -> list[str]:
+        normalized: list[str] = []
+
+        def _collect(item: typing.Any):
+            if item is None:
+                return
+            if isinstance(item, str):
+                for part in item.replace('->', ',').replace('+', ',').split(','):
+                    token = part.strip().lower()
+                    if token and token not in normalized:
+                        normalized.append(token)
+                return
+            if isinstance(item, dict):
+                for nested in item.values():
+                    _collect(nested)
+                return
+            if isinstance(item, (list, tuple, set)):
+                for nested in item:
+                    _collect(nested)
+                return
+
+        _collect(value)
+        return normalized
+
+    def _extract_scan_metadata(self, item: dict[str, typing.Any], model_id: str) -> dict[str, typing.Any]:
+        display_name = item.get('name')
+        if not isinstance(display_name, str) or not display_name.strip() or display_name == model_id:
+            display_name = ''
+
+        description = item.get('description')
+        if not isinstance(description, str) or not description.strip():
+            description = ''
+
+        context_length = item.get('context_length')
+        if context_length is None and isinstance(item.get('top_provider'), dict):
+            context_length = item['top_provider'].get('context_length')
+
+        if not isinstance(context_length, int):
+            try:
+                context_length = int(context_length) if context_length is not None else None
+            except (TypeError, ValueError):
+                context_length = None
+
+        input_modalities = self._normalize_modalities(item.get('input_modalities'))
+        output_modalities = self._normalize_modalities(item.get('output_modalities'))
+
+        if isinstance(item.get('architecture'), dict):
+            if not input_modalities:
+                input_modalities = self._normalize_modalities(item['architecture'].get('input_modalities'))
+            if not output_modalities:
+                output_modalities = self._normalize_modalities(item['architecture'].get('output_modalities'))
+
+        owned_by = item.get('owned_by')
+        if not isinstance(owned_by, str) or not owned_by.strip():
+            owned_by = ''
+
+        return {
+            'display_name': display_name or None,
+            'description': description or None,
+            'context_length': context_length,
+            'owned_by': owned_by or None,
+            'input_modalities': input_modalities,
+            'output_modalities': output_modalities,
+        }
+
+    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
+        headers = {}
+        if api_key:
+            headers['Authorization'] = f'Bearer {api_key}'
+
+        models_url = f'{self.requester_cfg["base_url"].rstrip("/")}/models'
+        async with httpx.AsyncClient(trust_env=True, timeout=self.requester_cfg['timeout']) as client:
+            response = await client.get(models_url, headers=headers)
+            response.raise_for_status()
+            payload = response.json()
+
+        models = []
+        for item in payload.get('data', []):
+            model_id = item.get('id')
+            if not model_id:
+                continue
+            models.append(
+                {
+                    'id': model_id,
+                    'name': model_id,
+                    'type': self._infer_model_type(model_id),
+                    'abilities': self._infer_model_abilities(item, model_id),
+                    **self._extract_scan_metadata(item, model_id),
+                }
+            )
+
+        models.sort(key=lambda item: (item['type'] != 'llm', item['name'].lower()))
+        return {
+            'models': models,
+            'debug': {
+                'request': {
+                    'method': 'GET',
+                    'url': models_url,
+                    'headers': {
+                        'Authorization': f'Bearer {self._mask_api_key(api_key)}' if api_key else '',
+                    },
+                },
+                'response': payload,
+            },
+        }
+
+    async def _req(
+        self,
+        args: dict,
+        extra_body: dict = {},
+    ) -> chat_completion_module.ChatCompletion:
+        return await self.client.chat.completions.create(**args, extra_body=extra_body)
+
+    async def _req_stream(
+        self,
+        args: dict,
+        extra_body: dict = {},
+    ):
+        async for chunk in await self.client.chat.completions.create(**args, extra_body=extra_body):
+            yield chunk
+
+    async def _make_msg(
+        self,
+        chat_completion: chat_completion_module.ChatCompletion,
+        remove_think: bool = False,
+    ) -> provider_message.Message:
+        if not isinstance(chat_completion, chat_completion_module.ChatCompletion):
+            raise TypeError(f'Expected ChatCompletion, got {type(chat_completion).__name__}: {str(chat_completion)[:16]}')
+
+        chatcmpl_message = chat_completion.choices[0].message.model_dump()
+
+        # Ensure the role field exists and is not None
+        if 'role' not in chatcmpl_message or chatcmpl_message['role'] is None:
+            chatcmpl_message['role'] = 'assistant'
+
+        # Handle chain-of-thought (reasoning) content
+        content = chatcmpl_message.get('content', '')
+        reasoning_content = chatcmpl_message.get('reasoning_content', None)
+
+        processed_content, _ = await self._process_thinking_content(
+            content=content, reasoning_content=reasoning_content, remove_think=remove_think
+        )
+
+        chatcmpl_message['content'] = processed_content
+
+        # Drop the reasoning_content field so it is not passed to Message
+        if 'reasoning_content' in chatcmpl_message:
+            del chatcmpl_message['reasoning_content']
+
+        message = provider_message.Message(**chatcmpl_message)
+
+        return message
+
+    async def _process_thinking_content(
+        self,
+        content: str,
+        reasoning_content: str = None,
+        remove_think: bool = False,
+    ) -> tuple[str, str]:
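+        """Process chain-of-thought content.
+
+        Args:
+            content: the original message content
+            reasoning_content: value of the reasoning_content field
+            remove_think: whether to strip the chain of thought
+
+        Returns:
+            (processed content, extracted chain-of-thought text)
+        """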
+        thinking_content = ''
+
+        # 1. Take the chain of thought from reasoning_content
+        if reasoning_content:
+            thinking_content = reasoning_content
+
+        # 2. Extract <think>...</think> blocks from content
+        if content and '<think>' in content and '</think>' in content:
+            import re
+
+            think_pattern = r'<think>(.*?)</think>'
+            think_matches = re.findall(think_pattern, content, re.DOTALL)
+            if think_matches:
+                # Append to the reasoning_content text if there already is some
+                if thinking_content:
+                    thinking_content += '\n' + '\n'.join(think_matches)
+                else:
+                    thinking_content = '\n'.join(think_matches)
+                # Remove the <think> blocks from content
+                content = re.sub(think_pattern, '', content, flags=re.DOTALL).strip()
+
+        # 3. Keep or drop the chain of thought according to remove_think
+        if remove_think:
+            return content, ''
+        else:
+            # If there is chain-of-thought content, prepend it to content wrapped in <think> tags
+            if thinking_content:
+                content = f'<think>\n{thinking_content}\n</think>\n{content or ""}'.strip()
+            return content, thinking_content
+
+    async def _closure_stream(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+            if tools:
+                args['tools'] = tools
+
+        # Messages for this request
+        messages = req_messages.copy()
+
+        # Vision: convert image_base64 parts into OpenAI image_url parts
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+        args['stream'] = True
+
+        # Streaming state
+        # tool_calls_map: dict[str, provider_message.ToolCall] = {}
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        role = 'assistant'  # default role
+        tool_id = ''
+        tool_name = ''
+        # accumulated_reasoning = ''  # only used to decide when the reasoning block ends
+
+        async for chunk in self._req_stream(args, extra_body=extra_args):
+            # Parse the chunk
+
+            if hasattr(chunk, 'choices') and chunk.choices:
+                choice = chunk.choices[0]
+                delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
+
+                finish_reason = getattr(choice, 'finish_reason', None)
+            else:
+                delta = {}
+                finish_reason = None
+            # Take the role from the first chunk and reuse it for later chunks
+            if 'role' in delta and delta['role']:
+                role = delta['role']
+
+            # Incremental content
+            delta_content = delta.get('content', '')
+            reasoning_content = delta.get('reasoning_content', '')
+
+            # Handle reasoning_content
+            if reasoning_content:
+                # accumulated_reasoning += reasoning_content
+                # Skip reasoning_content when remove_think is set
+                if remove_think:
+                    chunk_idx += 1
+                    continue
+
+                # First reasoning_content: prepend the opening <think> tag
+                if not thinking_started:
+                    thinking_started = True
+                    delta_content = '<think>\n' + reasoning_content
+                else:
+                    # Keep streaming reasoning_content
+                    delta_content = reasoning_content
+            elif thinking_started and not thinking_ended and delta_content:
+                # reasoning_content has ended and normal content begins: prepend the closing </think> tag
+                thinking_ended = True
+                delta_content = '\n</think>\n' + delta_content
+
+            # Strip <think> tags already present in content (if removal is needed)
+            # if delta_content and remove_think and '<think>' in delta_content:
+            #     import re
+            #
+            #     # Remove the <think> tags and their content
+            #     delta_content = re.sub(r'<think>.*?</think>', '', delta_content, flags=re.DOTALL)
+
+            # Handle tool-call deltas
+            # delta_tool_calls = None
+            if delta.get('tool_calls'):
+                for tool_call in delta['tool_calls']:
+                    if tool_call['id'] and tool_call['function']['name']:
+                        tool_id = tool_call['id']
+                        tool_name = tool_call['function']['name']
+                    else:
+                        tool_call['id'] = tool_id
+                        tool_call['function']['name'] = tool_name
+                    if tool_call['type'] is None:
+                        tool_call['type'] = 'function'
+
+            # Skip an empty first chunk (role only, no content)
+            if chunk_idx == 0 and not delta_content and not reasoning_content and not delta.get('tool_calls'):
+                chunk_idx += 1
+                continue
+            # Build the MessageChunk with incremental content only
+            chunk_data = {
+                'role': role,
+                'content': delta_content if delta_content else None,
+                'tool_calls': delta.get('tool_calls'),
+                'is_final': bool(finish_reason),
+            }
+
+            # Drop None values
+            chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+            yield provider_message.MessageChunk(**chunk_data)
+            chunk_idx += 1
+
+    async def _closure(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> tuple[provider_message.Message, dict]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Messages for this request
+        messages = req_messages.copy()
+
+        # Vision: convert image_base64 parts into OpenAI image_url parts
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+
+        # Send the request
+
+        resp = await self._req(args, extra_body=extra_args)
+        # Convert the response into a Message
+        message = await self._make_msg(resp, remove_think)
+
+        # Extract token usage from response
+        usage_info = {}
+        if hasattr(resp, 'usage') and resp.usage:
+            usage_info['input_tokens'] = resp.usage.prompt_tokens or 0
+            usage_info['output_tokens'] = resp.usage.completion_tokens or 0
+            usage_info['total_tokens'] = resp.usage.total_tokens or 0
+
+        return message, usage_info
+
+    async def invoke_llm(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> tuple[provider_message.Message, dict]:
+        """Invoke LLM and return message with usage info"""
+        req_messages = []  # used only inside this class; external state is synced through query.messages
+        for m in messages:
+            msg_dict = m.dict(exclude_none=True)
+            content = msg_dict.get('content')
+            if isinstance(content, list):
+                # Check whether every part of the content list is text
+                if all(isinstance(part, dict) and part.get('type') == 'text' for part in content):
+                    # Merge all text parts into a single string
+                    msg_dict['content'] = '\n'.join(part['text'] for part in content)
+            req_messages.append(msg_dict)
+
+        try:
+            msg, usage_info = await self._closure(
+                query=query,
+                req_messages=req_messages,
+                use_model=model,
+                use_funcs=funcs,
+                extra_args=extra_args,
+                remove_think=remove_think,
+            )
+            return msg, usage_info
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('请求超时')
+        except openai.BadRequestError as e:
+            error_message = str(e.message) if hasattr(e, 'message') else str(e)
+            if 'context_length_exceeded' in str(e):
+                raise errors.RequesterError(f'上文过长，请重置会话: {error_message}')
+            else:
+                raise errors.RequesterError(f'请求参数错误: {error_message}')
+        except openai.AuthenticationError as e:
+            error_message = str(e.message) if hasattr(e, 'message') else str(e)
+            raise errors.RequesterError(f'无效的 api-key: {error_message}')
+        except openai.NotFoundError as e:
+            error_message = str(e.message) if hasattr(e, 'message') else str(e)
+            raise errors.RequesterError(f'请求路径错误: {error_message}')
+        except openai.RateLimitError as e:
+            error_message = str(e.message) if hasattr(e, 'message') else str(e)
+            raise errors.RequesterError(f'请求过于频繁或余额不足: {error_message}')
+        except openai.APIConnectionError as e:
+            error_message = f'连接错误: {str(e)}'
+            raise errors.RequesterError(error_message)
+        except openai.APIError as e:
+            error_message = str(e.message) if hasattr(e, 'message') else str(e)
+            raise errors.RequesterError(f'请求错误: {error_message}')
+
+    async def invoke_embedding(
+        self,
+        model: requester.RuntimeEmbeddingModel,
+        input_text: list[str],
+        extra_args: dict[str, typing.Any] = {},
+    ) -> tuple[list[list[float]], dict]:
+        """Call the Embedding API; returns (embeddings, usage_info)"""
+        self.client.api_key = model.provider.token_mgr.get_token()
+
+        args = {
+            'model': model.model_entity.name,
+            'input': input_text,
+        }
+
+        if model.model_entity.extra_args:
+            args.update(model.model_entity.extra_args)
+
+        args.update(extra_args)
+
+        try:
+            resp = await self.client.embeddings.create(**args)
+
+            # Extract usage info
+            usage_info = {}
+            if hasattr(resp, 'usage') and resp.usage:
+                usage_info['prompt_tokens'] = resp.usage.prompt_tokens or 0
+                usage_info['total_tokens'] = resp.usage.total_tokens or 0
+
+            return [d.embedding for d in resp.data], usage_info
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('请求超时')
+        except openai.BadRequestError as e:
+            raise errors.RequesterError(f'请求参数错误: {e.message}')
+
+    async def invoke_llm_stream(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        req_messages = []  # req_messages is used only inside this class; callers stay in sync via query.messages
+        for m in messages:
+            msg_dict = m.dict(exclude_none=True)
+            content = msg_dict.get('content')
+            if isinstance(content, list):
+                # Check whether every part of the content list is text
+                if all(isinstance(part, dict) and part.get('type') == 'text' for part in content):
+                    # Merge all text parts into a single string
+                    msg_dict['content'] = '\n'.join(part['text'] for part in content)
+            req_messages.append(msg_dict)
+
+        try:
+            async for item in self._closure_stream(
+                query=query,
+                req_messages=req_messages,
+                use_model=model,
+                use_funcs=funcs,
+                extra_args=extra_args,
+                remove_think=remove_think,
+            ):
+                yield item
+
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('请求超时')
+        except openai.BadRequestError as e:
+            if 'context_length_exceeded' in e.message:
+                raise errors.RequesterError(f'上文过长，请重置会话: {e.message}')
+            else:
+                raise errors.RequesterError(f'请求参数错误: {e.message}')
+        except openai.AuthenticationError as e:
+            raise errors.RequesterError(f'无效的 api-key: {e.message}')
+        except openai.NotFoundError as e:
+            raise errors.RequesterError(f'请求路径错误: {e.message}')
+        except openai.RateLimitError as e:
+            raise errors.RequesterError(f'请求过于频繁或余额不足: {e.message}')
+        except openai.APIError as e:
+            raise errors.RequesterError(f'请求错误: {e.message}')
+
+    async def invoke_rerank(
+        self,
+        model: requester.RuntimeRerankModel,
+        query: str,
+        documents: typing.List[str],
+        extra_args: dict[str, typing.Any] = {},
+    ) -> typing.List[dict]:
+        """Standard /rerank endpoint (Jina/Cohere/SiliconFlow/Voyage/DashScope compatible)
+
+        Supports extra_args from model.extra_args:
+        - rerank_url: full URL override (e.g. "https://dashscope.aliyuncs.com/compatible-api/v1/reranks")
+        - rerank_path: path override appended to base_url (e.g. "reranks" instead of default "rerank")
+        - Any other fields are merged into the request payload.
+        """
+        api_key = model.provider.token_mgr.get_token()
+        base_url = self.requester_cfg.get('base_url', '').rstrip('/')
+        timeout = self.requester_cfg.get('timeout', 120)
+
+        merged_args = {}
+        if model.model_entity.extra_args:
+            merged_args.update(model.model_entity.extra_args)
+        if extra_args:
+            merged_args.update(extra_args)
+
+        rerank_url = merged_args.pop('rerank_url', None)
+        rerank_path = merged_args.pop('rerank_path', 'rerank')
+        if not rerank_url:
+            rerank_url = f'{base_url}/{rerank_path}'
+
+        headers = {
+            'Content-Type': 'application/json',
+            'Authorization': f'Bearer {api_key}',
+        }
+
+        payload = {
+            'model': model.model_entity.name,
+            'query': query,
+            'documents': documents[:64],
+            'top_n': min(len(documents), 64),
+        }
+
+        if merged_args:
+            payload.update(merged_args)
+
+        try:
+            async with httpx.AsyncClient(trust_env=True, timeout=timeout) as client:
+                resp = await client.post(rerank_url, headers=headers, json=payload)
+                resp.raise_for_status()
+                data = resp.json()
+
+            results = self._parse_rerank_response(data)
+
+            if results:  # min-max normalize relevance scores into [0, 1]
+                scores = [r.get('relevance_score', 0.0) for r in results]
+                min_score = min(scores)
+                max_score = max(scores)
+                if max_score - min_score > 1e-6:
+                    for r in results:
+                        r['relevance_score'] = (r.get('relevance_score', 0.0) - min_score) / (max_score - min_score)
+
+            return results
+        except httpx.HTTPStatusError as e:
+            raise errors.RequesterError(f'Rerank request failed: {e.response.status_code} - {e.response.text}')
+        except httpx.TimeoutException:
+            raise errors.RequesterError('Rerank request timed out')
+        except Exception as e:
+            raise errors.RequesterError(f'Rerank request error: {str(e)}')
+
+    @staticmethod
+    def _parse_rerank_response(data: dict) -> typing.List[dict]:
+        """Parse rerank response from various providers.
+
+        Handles:
+        - Jina/Cohere/SiliconFlow: {"results": [{"index", "relevance_score"}]}
+        - Voyage AI: {"data": [{"index", "relevance_score"}]}
+        - DashScope: {"output": {"results": [{"index", "relevance_score"}]}}
+        """
+        if 'results' in data:
+            return data['results']
+        if 'data' in data:
+            return data['data']
+        if 'output' in data and isinstance(data['output'], dict):
+            return data['output'].get('results', [])
+        return []
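The rerank path in `invoke_rerank` accepts three response shapes and then min-max scales `relevance_score` into [0, 1], leaving the scores alone when they all lie within 1e-6 of each other. The standalone sketch below reproduces that logic on plain dicts so it can be checked without an HTTP client; `parse_rerank_response` and `min_max_normalize` are hypothetical helper names used for illustration, not LangBot APIs.

```python
from typing import Any


def parse_rerank_response(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Pull the result list out of the three response shapes handled above."""
    if 'results' in data:  # Jina / Cohere / SiliconFlow
        return data['results']
    if 'data' in data:  # Voyage AI
        return data['data']
    output = data.get('output')
    if isinstance(output, dict):  # DashScope
        return output.get('results', [])
    return []


def min_max_normalize(results: list[dict[str, Any]], eps: float = 1e-6) -> list[dict[str, Any]]:
    """Rescale relevance_score into [0, 1]; scores that are all (nearly) equal are left as-is."""
    if not results:
        return results
    scores = [r.get('relevance_score', 0.0) for r in results]
    lo, hi = min(scores), max(scores)
    if hi - lo > eps:
        for r, s in zip(results, scores):
            r['relevance_score'] = (s - lo) / (hi - lo)
    return results
```

One consequence of the scaling: whenever the scores differ, the top hit always becomes 1.0 and the bottom hit 0.0, so any absolute score threshold applied downstream compares relative rank within the batch rather than the provider's raw relevance.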
--- a/src/langbot/pkg/provider/modelmgr/requesters/chatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/chatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: OpenAI
  icon: openai.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/coherererank.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/coherererank.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Cohere
  icon: cohere.svg
 spec:
-  litellm_provider: cohere
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/compsharechatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/compsharechatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import chatcmpl
+
+
+class CompShareChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """CompShare ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.modelverse.cn/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/compsharechatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/compsharechatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 优云智算
  icon: compshare.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
@@ -25,8 +24,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
-  - rerank
  provider_category: maas
 execution:
  python:
--- a/src/langbot/pkg/provider/modelmgr/requesters/deepseekchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/deepseekchatcmpl.py
@@ -0,0 +1,67 @@
+from __future__ import annotations
+
+import typing
+
+from . import chatcmpl
+from .. import errors, requester
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+
+class DeepseekChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """DeepSeek ChatCompletion API requester"""
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.deepseek.com',
+        'timeout': 120,
+    }
+
+    async def _closure(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> tuple[provider_message.Message, dict]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Messages for this request
+        messages = req_messages
+
+        # DeepSeek does not support multimodal input, so flatten list content to plain text
+        for m in messages:
+            if 'content' in m and isinstance(m['content'], list):
+                m['content'] = ' '.join([c['text'] for c in m['content'] if 'text' in c])
+
+        args['messages'] = messages
+
+        # Send the request
+        resp = await self._req(args, extra_body=extra_args)
+
+        # print(resp)
+
+        if resp is None:
+            raise errors.RequesterError('接口返回为空，请确定模型提供商服务是否正常')
+        # Process the response
+        message = await self._make_msg(resp, remove_think)
+
+        # Extract token usage from response
+        usage_info = {}
+        if hasattr(resp, 'usage') and resp.usage:
+            usage_info['input_tokens'] = resp.usage.prompt_tokens or 0
+            usage_info['output_tokens'] = resp.usage.completion_tokens or 0
+            usage_info['total_tokens'] = resp.usage.total_tokens or 0
+
+        return message, usage_info
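`DeepseekChatCompletions._closure` keeps only the text parts of list-valued `content` and joins them with spaces, silently dropping images and other non-text parts. A minimal sketch of that flattening step, assuming messages are plain OpenAI-style dicts; `flatten_to_text` is a hypothetical name, and unlike the requester (which edits `req_messages` in place) it copies each message first.

```python
from typing import Any


def flatten_to_text(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Replace list-valued content with its text parts joined by spaces.

    Parts without a 'text' key (e.g. image_url parts) are dropped, mirroring
    the DeepSeek requester. New dicts are returned; the input is not mutated.
    """
    flattened = []
    for m in messages:
        m = dict(m)  # shallow copy so the caller's message is untouched
        if isinstance(m.get('content'), list):
            m['content'] = ' '.join(part['text'] for part in m['content'] if 'text' in part)
        flattened.append(m)
    return flattened
```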
--- a/src/langbot/pkg/provider/modelmgr/requesters/deepseekchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/deepseekchatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: DeepSeek
  icon: deepseek.svg
 spec:
-  litellm_provider: deepseek
  config:
  - name: base_url
    label:
@@ -25,8 +24,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
-  - rerank
  provider_category: manufacturer
 execution:
  python:
--- a/src/langbot/pkg/provider/modelmgr/requesters/doubao.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/doubao.svg
@@ -1,4 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#3B82F6"/>
-  <text x="30" y="32" font-family="Arial, sans-serif" font-size="12" font-weight="bold" fill="white" text-anchor="middle">豆包</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/doubaochatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/doubaochatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: doubao-chat-completions
-  label:
-    en_US: ByteDance Doubao
-    zh_Hans: 字节豆包
-  icon: doubao.svg
-spec:
-  litellm_provider: openai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://ark.cn-beijing.volces.com/api/v3
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/geminichatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/geminichatcmpl.py
@@ -0,0 +1,205 @@
+from __future__ import annotations
+
+import typing
+import httpx
+
+from . import chatcmpl
+
+import uuid
+
+from .. import requester
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+
+
+class GeminiChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """Google Gemini API requester"""
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://generativelanguage.googleapis.com/v1beta/openai',
+        'timeout': 120,
+    }
+
+    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
+        models_url = 'https://generativelanguage.googleapis.com/v1beta/models'
+        params = {'key': api_key} if api_key else {}
+
+        all_models: list[dict[str, typing.Any]] = []
+        next_page_token = ''
+        last_payload: dict[str, typing.Any] = {}
+
+        async with httpx.AsyncClient(trust_env=True, timeout=self.requester_cfg['timeout']) as client:
+            while True:
+                request_params = dict(params)
+                if next_page_token:
+                    request_params['pageToken'] = next_page_token
+
+                response = await client.get(models_url, params=request_params)
+                response.raise_for_status()
+                payload = response.json()
+                last_payload = payload
+
+                for item in payload.get('models', []):
+                    model_name = item.get('name', '')
+                    model_id = model_name.replace('models/', '', 1)
+                    if not model_id:
+                        continue
+
+                    supported_methods = item.get('supportedGenerationMethods', []) or []
+                    if 'embedContent' in supported_methods and 'generateContent' not in supported_methods:
+                        model_type = 'embedding'
+                    else:
+                        model_type = 'llm'
+
+                    all_models.append(
+                        {
+                            'id': model_id,
+                            'name': model_id,
+                            'type': model_type,
+                            'abilities': self._infer_model_abilities(item, model_id),
+                            'display_name': item.get('displayName') or None,
+                            'description': item.get('description') or None,
+                            'context_length': item.get('inputTokenLimit'),
+                            'input_modalities': self._normalize_modalities(item.get('inputModalities')),
+                            'output_modalities': self._normalize_modalities(item.get('outputModalities')),
+                        }
+                    )
+
+                next_page_token = payload.get('nextPageToken', '')
+                if not next_page_token:
+                    break
+
+        all_models.sort(key=lambda item: (item['type'] != 'llm', item['name'].lower()))
+        return {
+            'models': all_models,
+            'debug': {
+                'request': {
+                    'method': 'GET',
+                    'url': models_url,
+                    'query': {'key': self._mask_api_key(api_key)} if api_key else {},
+                },
+                'response': last_payload,
+            },
+        }
+
+    async def _closure_stream(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+            if tools:
+                args['tools'] = tools
+
+        # Messages for this request
+        messages = req_messages.copy()
+
+        # Vision: convert image_base64 parts into OpenAI-style image_url parts
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+        args['stream'] = True
+
+        # Streaming state
+        # tool_calls_map: dict[str, provider_message.ToolCall] = {}
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        role = 'assistant'  # default role
+        tool_id = ''
+        tool_name = ''
+        # accumulated_reasoning = ''  # only used to decide when the chain of thought ends
+
+        async for chunk in self._req_stream(args, extra_body=extra_args):
+            # Parse the chunk
+
+            if hasattr(chunk, 'choices') and chunk.choices:
+                choice = chunk.choices[0]
+                delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
+
+                finish_reason = getattr(choice, 'finish_reason', None)
+            else:
+                delta = {}
+                finish_reason = None
+            # Take the role from the first chunk that carries one and reuse it afterwards
+            if 'role' in delta and delta['role']:
+                role = delta['role']
+
+            # Incremental content for this chunk
+            delta_content = delta.get('content', '')
+            reasoning_content = delta.get('reasoning_content', '')
+
+            # Handle reasoning_content
+            if reasoning_content:
+                # accumulated_reasoning += reasoning_content
+                # Skip reasoning_content entirely when remove_think is set
+                if remove_think:
+                    chunk_idx += 1
+                    continue
+
+                # First reasoning_content: emit the opening <think> tag
+                if not thinking_started:
+                    thinking_started = True
+                    delta_content = '<think>\n' + reasoning_content
+                else:
+                    # Keep streaming reasoning_content
+                    delta_content = reasoning_content
+            elif thinking_started and not thinking_ended and delta_content:
+                # reasoning_content has ended and normal content begins: emit the closing </think> tag
+                thinking_ended = True
+                delta_content = '\n</think>\n' + delta_content
+
+            # Handle <think> tags already present in content (if they should be removed)
+            # if delta_content and remove_think and '<think>' in delta_content:
+            #     import re
+            #
+            #     # Remove <think> tags and their contents
+            #     delta_content = re.sub(r'<think>.*?</think>', '', delta_content, flags=re.DOTALL)
+
+            # Handle tool-call deltas
+            # delta_tool_calls = None
+            if delta.get('tool_calls'):
+                for tool_call in delta['tool_calls']:
+                    if tool_call['id'] == '' and tool_id == '':
+                        tool_id = str(uuid.uuid4())
+                    if tool_call['function']['name']:
+                        tool_name = tool_call['function']['name']
+                    tool_call['id'] = tool_id
+                    tool_call['function']['name'] = tool_name
+                    if tool_call['type'] is None:
+                        tool_call['type'] = 'function'
+
+            # Skip the empty first chunk (role only, no content)
+            if chunk_idx == 0 and not delta_content and not reasoning_content and not delta.get('tool_calls'):
+                chunk_idx += 1
+                continue
+            # Build the MessageChunk with incremental content only
+            chunk_data = {
+                'role': role,
+                'content': delta_content if delta_content else None,
+                'tool_calls': delta.get('tool_calls'),
+                'is_final': bool(finish_reason),
+            }
+
+            # Drop None values
+            chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+            yield provider_message.MessageChunk(**chunk_data)
+            chunk_idx += 1
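`GeminiChatCompletions._closure_stream` folds `reasoning_content` deltas into the normal content stream: it opens a `<think>` block on the first reasoning delta and closes it when regular content first appears. The sketch below isolates that state machine over `(reasoning_content, content)` pairs; the tuple input and the `wrap_reasoning` name are assumptions made for illustration, and tool-call handling is left out.

```python
from collections.abc import Iterable, Iterator


def wrap_reasoning(deltas: Iterable[tuple[str, str]], remove_think: bool = False) -> Iterator[str]:
    """Turn (reasoning_content, content) delta pairs into one text stream.

    The first reasoning delta opens a <think> block; the first normal content
    after it closes the block. With remove_think=True, reasoning deltas are skipped.
    """
    thinking_started = False
    thinking_ended = False
    for reasoning, content in deltas:
        if reasoning:
            if remove_think:
                continue  # drop the whole delta, as the requester does
            if not thinking_started:
                thinking_started = True
                yield '<think>\n' + reasoning
            else:
                yield reasoning
        elif content:
            if thinking_started and not thinking_ended:
                thinking_ended = True
                yield '\n</think>\n' + content
            else:
                yield content
```

As in the requester, a stream that ends while the model is still reasoning never emits `</think>`, so consumers should tolerate an unclosed block.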
--- a/src/langbot/pkg/provider/modelmgr/requesters/geminichatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/geminichatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Google Gemini
  icon: gemini.svg
 spec:
-  litellm_provider: gemini
  config:
  - name: base_url
    label:
@@ -25,8 +24,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
-  - rerank
  provider_category: manufacturer
 execution:
  python:
--- a/src/langbot/pkg/provider/modelmgr/requesters/giteeaichatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/giteeaichatcmpl.py
@@ -0,0 +1,15 @@
+from __future__ import annotations
+
+
+import typing
+
+from . import ppiochatcmpl
+
+
+class GiteeAIChatCompletions(ppiochatcmpl.PPIOChatCompletions):
+    """Gitee AI ChatCompletions API requester"""
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://ai.gitee.com/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/giteeaichatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/giteeaichatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Gitee AI
  icon: giteeai.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/groq.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/groq.svg
@@ -1,4 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#F97316"/>
-  <text x="30" y="32" font-family="Arial, sans-serif" font-size="14" font-weight="bold" fill="white" text-anchor="middle">Groq</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/groqchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/groqchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: groq-chat-completions
-  label:
-    en_US: Groq
-    zh_Hans: Groq
-  icon: groq.svg
-spec:
-  litellm_provider: groq
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://api.groq.com/openai/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/iflytek.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/iflytek.svg
@@ -1,5 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#0066FF"/>
-  <text x="30" y="28" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="white" text-anchor="middle">iFlytek</text>
-  <text x="30" y="40" font-family="Arial, sans-serif" font-size="8" fill="white" text-anchor="middle">Spark</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/iflytekchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/iflytekchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: iflytek-chat-completions
-  label:
-    en_US: iFlytek Spark
-    zh_Hans: 讯飞星火
-  icon: iflytek.svg
-spec:
-  litellm_provider: openai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://spark-api-open.xf-yun.com/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/jiekouaichatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/jiekouaichatcmpl.py
@@ -0,0 +1,208 @@
+from __future__ import annotations
+
+import openai
+import typing
+
+from . import chatcmpl
+from .. import requester
+import openai.types.chat.chat_completion as chat_completion
+import re
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+
+
+class JieKouAIChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """JieKou AI (接口 AI) ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.jiekou.ai/openai',
+        'timeout': 120,
+    }
+
+    is_think: bool = False
+
+    async def _make_msg(
+        self,
+        chat_completion: chat_completion.ChatCompletion,
+        remove_think: bool,
+    ) -> provider_message.Message:
+        chatcmpl_message = chat_completion.choices[0].message.model_dump()
+        # print(chatcmpl_message.keys(), chatcmpl_message.values())
+
+        # Ensure the role field exists and is not None
+        if 'role' not in chatcmpl_message or chatcmpl_message['role'] is None:
+            chatcmpl_message['role'] = 'assistant'
+
+        reasoning_content = chatcmpl_message['reasoning_content'] if 'reasoning_content' in chatcmpl_message else None
+
+        # DeepSeek reasoner-style models return the chain of thought in reasoning_content
+        chatcmpl_message['content'] = await self._process_thinking_content(
+            chatcmpl_message['content'], reasoning_content, remove_think
+        )
+
+        # Drop reasoning_content so it is not passed to Message
+        if 'reasoning_content' in chatcmpl_message:
+            del chatcmpl_message['reasoning_content']
+
+        message = provider_message.Message(**chatcmpl_message)
+
+        return message
+
+    async def _process_thinking_content(
+        self,
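+        content: str | None,
+        reasoning_content: str | None = None,
+        remove_think: bool = False,
+    ) -> str | None:
+        """Process chain-of-thought content.
+
+        Args:
+            content: original message content (may be None for tool-call-only replies)
+            reasoning_content: value of the reasoning_content field
+            remove_think: whether to strip the chain of thought
+
+        Returns:
+            The processed content
+        """
+        if remove_think:
+            content = re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL) if content else content
+        elif reasoning_content is not None:
+            # Prepend the reasoning wrapped in <think> tags; treat missing content as ''
+            content = '<think>\n' + reasoning_content + '\n</think>\n' + (content or '')
+        return content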
+
+    async def _make_msg_chunk(
+        self,
+        delta: dict[str, typing.Any],
+        idx: int,
+    ) -> provider_message.MessageChunk:
+        # Handle differences between streaming chunks and full responses
+        # print(chat_completion.choices[0])
+
+        # Ensure the role field exists and is not None
+        if 'role' not in delta or delta['role'] is None:
+            delta['role'] = 'assistant'
+
+        reasoning_content = delta['reasoning_content'] if 'reasoning_content' in delta else None
+
+        delta['content'] = delta.get('content') or ''
+        # print(reasoning_content)
+
+        # DeepSeek reasoner-style models: append reasoning_content to content
+
+        if reasoning_content is not None:
+            delta['content'] += reasoning_content
+
+        message = provider_message.MessageChunk(**delta)
+
+        return message
+
+    async def _closure_stream(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message | typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Messages for this request
+        messages = req_messages.copy()
+
+        # Vision: convert image_base64 parts into OpenAI-style image_url parts
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+        args['stream'] = True
+
+        tool_id = tool_name = ''  # last seen tool-call id/name, reused for continuation deltas
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        role = 'assistant'  # default role
+        async for chunk in self._req_stream(args, extra_body=extra_args):
+            # Parse the chunk
+            if hasattr(chunk, 'choices') and chunk.choices:
+                choice = chunk.choices[0]
+                delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
+                finish_reason = getattr(choice, 'finish_reason', None)
+            else:
+                delta = {}
+                finish_reason = None
+
+            # Take the role from the first chunk that carries one and reuse it afterwards
+            if 'role' in delta and delta['role']:
+                role = delta['role']
+
+            # Incremental content for this chunk
+            delta_content = delta.get('content', '')
+            # reasoning_content = delta.get('reasoning_content', '')
+
+            if remove_think:
+                if delta.get('content') is not None:
+                    if '<think>' in delta['content'] and not thinking_started and not thinking_ended:
+                        thinking_started = True
+                        continue
+                    elif delta['content'] == r'</think>' and not thinking_ended:
+                        thinking_ended = True
+                        continue
+                    elif thinking_ended and delta['content'] == '\n\n' and thinking_started:
+                        thinking_started = False
+                        continue
+                    elif thinking_started and not thinking_ended:
+                        continue
+
+            # delta_tool_calls = None
+            if delta.get('tool_calls'):
+                for tool_call in delta['tool_calls']:
+                    if tool_call['id'] and tool_call['function']['name']:
+                        tool_id = tool_call['id']
+                        tool_name = tool_call['function']['name']
+
+                    if tool_call['id'] is None:
+                        tool_call['id'] = tool_id
+                    if tool_call['function']['name'] is None:
+                        tool_call['function']['name'] = tool_name
+                    if tool_call['function']['arguments'] is None:
+                        tool_call['function']['arguments'] = ''
+                    if tool_call['type'] is None:
+                        tool_call['type'] = 'function'
+
+            # Skip the empty first chunk (role only, no content)
+            if chunk_idx == 0 and not delta_content and not delta.get('tool_calls'):
+                chunk_idx += 1
+                continue
+
+            # Build the MessageChunk with incremental content only
+            chunk_data = {
+                'role': role,
+                'content': delta_content if delta_content else None,
+                'tool_calls': delta.get('tool_calls'),
+                'is_final': bool(finish_reason),
+            }
+
+            # Drop None values
+            chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+            yield provider_message.MessageChunk(**chunk_data)
+            chunk_idx += 1
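The `remove_think` branch above is a small state machine over `thinking_started` and `thinking_ended`. The sketch below lifts the same flag logic into a standalone generator over plain string deltas so its behavior can be checked in isolation. `strip_think_stream` is a hypothetical helper, not part of LangBot, and unlike the hunk it simply skips `None` deltas instead of letting them continue on to tool-call handling.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def strip_think_stream(deltas: Iterable[str | None]) -> Iterator[str]:
    """Yield visible deltas, dropping a leading <think>...</think> block.

    Mirrors the flag logic in the hunk: the closing tag is only recognized
    when it arrives as its own chunk, exactly '</think>'.
    """
    thinking_started = False
    thinking_ended = False
    for content in deltas:
        if content is None:
            continue
        # Opening tag seen for the first time: enter the thinking block.
        if '<think>' in content and not thinking_started and not thinking_ended:
            thinking_started = True
            continue
        # Exact closing tag: leave the thinking block.
        if content == '</think>' and not thinking_ended:
            thinking_ended = True
            continue
        # Swallow the blank-line separator that usually follows </think>.
        if thinking_ended and content == '\n\n' and thinking_started:
            thinking_started = False
            continue
        # Still inside the thinking block: drop the delta.
        if thinking_started and not thinking_ended:
            continue
        yield content


print(list(strip_think_stream(['<think>', 'reasoning', '</think>', '\n\n', 'Hello', ' world'])))
```

The exact-match check has a consequence worth knowing: if a provider splits `</think>` across two chunks (for example `'</'` then `'think>'`), the closing tag is never recognized and every later delta is dropped.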
--- a/src/langbot/pkg/provider/modelmgr/requesters/jiekouaichatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/jiekouaichatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 接口 AI
  icon: jiekouai.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/jinarerank.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/jinarerank.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Jina
  icon: jina.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/litellmchat.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/litellmchat.py
@@ -1,644 +0,0 @@
-"""LiteLLM unified requester for chat, embedding, and rerank."""
-
-from __future__ import annotations
-
-import typing
-
-import litellm
-from litellm import acompletion, aembedding, arerank
-
-from .. import errors, requester
-import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
-import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
-import langbot_plugin.api.entities.builtin.provider.message as provider_message
-
-
-class LiteLLMRequester(requester.ProviderAPIRequester):
-    """LiteLLM unified API requester supporting chat, embedding, and rerank."""
-
-    _EMBEDDING_MODEL_HINTS = ('embedding', 'embed', 'bge-', 'e5-', 'm3e', 'gte-', 'text-embedding')
-    _RERANK_MODEL_HINTS = ('rerank', 're-rank', 're_rank')
-
-    default_config: dict[str, typing.Any] = {
-        'base_url': '',
-        'timeout': 120,
-        'custom_llm_provider': '',
-        'drop_params': False,
-        'num_retries': 0,
-        'api_version': '',
-    }
-
-    async def initialize(self):
-        """Initialize LiteLLM client settings."""
-        # LiteLLM doesn't require explicit client initialization
-        # Configuration is passed per-request via litellm params
-        pass
-
-    def _build_litellm_model_name(self, model_name: str, custom_llm_provider: str | None = None) -> str:
-        """Build LiteLLM model name with provider prefix if needed."""
-        provider = custom_llm_provider or self.requester_cfg.get('custom_llm_provider', '')
-        if provider:
-            # LiteLLM format: provider/model_name
-            if model_name.startswith(f'{provider}/'):
-                return model_name
-            return f'{provider}/{model_name}'
-        # If no custom provider, assume model_name already includes prefix or is OpenAI-compatible
-        return model_name
-
-    def _get_custom_llm_provider(self) -> str | None:
-        return self.requester_cfg.get('custom_llm_provider') or None
-
-    def _safe_litellm_bool_helper(self, helper_name: str, model_name: str) -> bool:
-        """Call a LiteLLM boolean capability helper without letting metadata gaps fail requests."""
-        helper = getattr(litellm, helper_name, None)
-        if not callable(helper):
-            return False
-
-        provider = self._get_custom_llm_provider()
-        candidates: list[tuple[str, str | None]] = [(model_name, provider)]
-        litellm_model_name = self._build_litellm_model_name(model_name)
-        if litellm_model_name != model_name:
-            candidates.append((litellm_model_name, None))
-        for metadata_provider in self._metadata_provider_candidates(model_name):
-            candidates.append((f'{metadata_provider}/{model_name}', None))
-
-        tried_candidates: set[tuple[str, str | None]] = set()
-        for candidate_model, candidate_provider in candidates:
-            candidate_key = (candidate_model, candidate_provider)
-            if candidate_key in tried_candidates:
-                continue
-            tried_candidates.add(candidate_key)
-            try:
-                if bool(helper(model=candidate_model, custom_llm_provider=candidate_provider)):
-                    return True
-            except Exception:
-                continue
-        return False
-
-    def _context_length_from_scan_payload(self, model_payload: dict[str, typing.Any] | None) -> int | None:
-        if not model_payload:
-            return None
-
-        for field_name in ('context_length', 'context_window', 'max_context_length'):
-            value = model_payload.get(field_name)
-            if isinstance(value, bool):
-                continue
-            if isinstance(value, int) and value > 0:
-                return value
-            if isinstance(value, str) and value.isdigit():
-                parsed_value = int(value)
-                if parsed_value > 0:
-                    return parsed_value
-        return None
-
-    def _metadata_provider_candidates(self, model_name: str) -> list[str]:
-        normalized_model_name = (model_name or '').lower()
-        candidates = []
-        if normalized_model_name.startswith(('moonshot-', 'kimi-')):
-            candidates.append('moonshot')
-        if normalized_model_name.startswith('deepseek-'):
-            candidates.append('deepseek')
-
-        base_url = self.requester_cfg.get('base_url', '').lower()
-        if 'moonshot' in base_url:
-            candidates.append('moonshot')
-        if 'deepseek' in base_url:
-            candidates.append('deepseek')
-
-        deduped_candidates = []
-        for candidate in candidates:
-            if candidate not in deduped_candidates:
-                deduped_candidates.append(candidate)
-        return deduped_candidates
-
-    def _known_context_length_fallback(self, model_name: str) -> int | None:
-        normalized_model_name = (model_name or '').lower()
-        if normalized_model_name.startswith('deepseek-v4-'):
-            return 1_000_000
-        if normalized_model_name.startswith(('kimi-k2.5', 'kimi-k2.6')):
-            return 256 * 1024
-        if normalized_model_name.startswith('moonshot-v1-8k'):
-            return 8 * 1024
-        if normalized_model_name.startswith('moonshot-v1-32k'):
-            return 32 * 1024
-        if normalized_model_name.startswith('moonshot-v1-128k') or normalized_model_name == 'moonshot-v1-auto':
-            return 128 * 1024
-        return None
-
-    def _safe_context_length(self, model_name: str) -> int | None:
-        helper = getattr(litellm, 'get_max_tokens', None)
-        if not callable(helper):
-            return self._known_context_length_fallback(model_name)
-
-        candidates = [model_name]
-        litellm_model_name = self._build_litellm_model_name(model_name)
-        if litellm_model_name != model_name:
-            candidates.append(litellm_model_name)
-        for provider in self._metadata_provider_candidates(model_name):
-            candidates.append(f'{provider}/{model_name}')
-
-        tried_candidates = []
-        for candidate in candidates:
-            if candidate in tried_candidates:
-                continue
-            tried_candidates.append(candidate)
-            try:
-                max_tokens = helper(candidate)
-            except Exception:
-                continue
-            if isinstance(max_tokens, int) and max_tokens > 0:
-                return max_tokens
-        return self._known_context_length_fallback(model_name)
-
-    def _supports_function_calling(self, model_name: str) -> bool:
-        return self._safe_litellm_bool_helper('supports_function_calling', model_name)
-
-    def _supports_vision(self, model_name: str) -> bool:
-        return self._safe_litellm_bool_helper('supports_vision', model_name)
-
-    def _infer_model_type(self, model_id: str) -> str:
-        normalized_id = (model_id or '').lower()
-        if any(kw in normalized_id for kw in self._RERANK_MODEL_HINTS):
-            return 'rerank'
-        if any(kw in normalized_id for kw in self._EMBEDDING_MODEL_HINTS):
-            return 'embedding'
-        return 'llm'
-
-    def _enrich_scanned_model(
-        self,
-        model_id: str,
-        model_payload: dict[str, typing.Any] | None = None,
-    ) -> dict[str, typing.Any]:
-        model_type = self._infer_model_type(model_id)
-        scanned_model: dict[str, typing.Any] = {
-            'id': model_id,
-            'name': model_id,
-            'type': model_type,
-        }
-
-        if model_type == 'llm':
-            abilities = []
-            if self._supports_function_calling(model_id):
-                abilities.append('func_call')
-            supports_provider_reported_vision = bool(
-                model_payload
-                and (model_payload.get('supports_image_in') is True or model_payload.get('supports_vision') is True)
-            )
-            if supports_provider_reported_vision or self._supports_vision(model_id):
-                abilities.append('vision')
-            scanned_model['abilities'] = abilities
-
-            context_length = self._context_length_from_scan_payload(model_payload)
-            if context_length is None:
-                context_length = self._safe_context_length(model_id)
-            if context_length is not None:
-                scanned_model['context_length'] = context_length
-
-        return scanned_model
-
-    def _convert_messages(self, messages: typing.List[provider_message.Message]) -> list[dict]:
-        """Convert LangBot messages to LiteLLM/OpenAI format."""
-        req_messages = []
-        for m in messages:
-            msg_dict = m.dict(exclude_none=True)
-            content = msg_dict.get('content')
-
-            if isinstance(content, list):
-                for part in content:
-                    if isinstance(part, dict) and part.get('type') == 'image_base64':
-                        part['image_url'] = {'url': part['image_base64']}
-                        part['type'] = 'image_url'
-                        del part['image_base64']
-
-            req_messages.append(msg_dict)
-
-        return req_messages
-
-    def _process_thinking_content(self, content: str, reasoning_content: str | None, remove_think: bool) -> str:
-        """Process thinking/reasoning content.
-
-        Args:
-            content: The main content from response
-            reasoning_content: Separate reasoning content from model
-            remove_think: If True, remove thinking markers; if False, preserve them
-
-        Returns:
-            Processed content string
-        """
-        # Extract and handle thinking tags
-        if content and '<think>' in content and '</think>' in content:
-            import re
-
-            think_pattern = r'<think>(.*?)</think>'
-
-            if remove_think:
-                # Remove thinking tags and their content from output
-                content = re.sub(think_pattern, '', content, flags=re.DOTALL).strip()
-            # else: preserve thinking content as-is
-
-        # Handle separate reasoning_content field
-        # Currently we don't include reasoning_content in user-facing output regardless of remove_think
-        # because it's typically internal model reasoning, not user-visible thinking
-        return content or ''
-
-    @staticmethod
-    def _normalize_usage(usage: typing.Any) -> dict:
-        """Normalize a LiteLLM/OpenAI usage object into a plain token dict.
-
-        Handles several real-world shapes returned by different upstreams:
-        - object with ``prompt_tokens`` / ``completion_tokens`` / ``total_tokens`` attrs
-        - dict with the same keys
-        - missing ``total_tokens`` (derived from prompt + completion)
-        - ``None`` / partially-populated usage (defaults to 0)
-        """
-        if usage is None:
-            return {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0}
-
-        def _get(key: str) -> typing.Any:
-            if isinstance(usage, dict):
-                return usage.get(key)
-            return getattr(usage, key, None)
-
-        prompt_tokens = _get('prompt_tokens') or 0
-        completion_tokens = _get('completion_tokens') or 0
-        total_tokens = _get('total_tokens') or 0
-
-        # Some providers omit total_tokens in streaming usage; derive it.
-        if not total_tokens:
-            total_tokens = prompt_tokens + completion_tokens
-
-        return {
-            'prompt_tokens': int(prompt_tokens),
-            'completion_tokens': int(completion_tokens),
-            'total_tokens': int(total_tokens),
-        }
-
-    def _extract_usage(self, response) -> dict:
-        """Extract usage info from a non-streaming LiteLLM response."""
-        return self._normalize_usage(getattr(response, 'usage', None))
-
-    @staticmethod
-    def _as_dict(value: typing.Any) -> dict:
-        if value is None:
-            return {}
-        if isinstance(value, dict):
-            return value
-        if hasattr(value, 'model_dump'):
-            return value.model_dump()
-        return {}
-
-    def _normalize_stream_tool_calls(
-        self,
-        raw_tool_calls: typing.Any,
-        tool_call_state: dict[int, dict[str, str]],
-    ) -> list[dict] | None:
-        """Fill OpenAI-style streaming tool-call deltas so MessageChunk can validate them."""
-        if not raw_tool_calls:
-            return None
-
-        normalized = []
-        for fallback_index, raw_tool_call in enumerate(raw_tool_calls):
-            tool_call = self._as_dict(raw_tool_call)
-            index = tool_call.get('index')
-            if not isinstance(index, int):
-                index = fallback_index
-
-            state = tool_call_state.setdefault(index, {'id': '', 'type': 'function', 'name': ''})
-            if tool_call.get('id'):
-                state['id'] = tool_call['id']
-            if tool_call.get('type'):
-                state['type'] = tool_call['type']
-
-            function = self._as_dict(tool_call.get('function'))
-            if function.get('name'):
-                state['name'] = function['name']
-
-            arguments = function.get('arguments')
-            if arguments is None:
-                arguments = ''
-            elif not isinstance(arguments, str):
-                arguments = str(arguments)
-
-            if not state['id'] or not state['name']:
-                continue
-
-            normalized.append(
-                {
-                    'id': state['id'],
-                    'type': state['type'] or 'function',
-                    'function': {
-                        'name': state['name'],
-                        'arguments': arguments,
-                    },
-                }
-            )
-
-        return normalized or None
-
-    def _build_common_args(self, args: dict, include_retry_params: bool = True) -> dict:
-        """Apply common requester config to args dict."""
-        if self.requester_cfg.get('base_url'):
-            args['api_base'] = self.requester_cfg['base_url']
-        if self.requester_cfg.get('timeout'):
-            args['timeout'] = self.requester_cfg['timeout']
-        if include_retry_params:
-            if self.requester_cfg.get('drop_params'):
-                args['drop_params'] = self.requester_cfg['drop_params']
-            if self.requester_cfg.get('num_retries'):
-                args['num_retries'] = self.requester_cfg['num_retries']
-            if self.requester_cfg.get('api_version'):
-                args['api_version'] = self.requester_cfg['api_version']
-        return args
-
-    def _handle_litellm_error(self, e: Exception) -> None:
-        """Convert LiteLLM exceptions to RequesterError. Never returns, always raises."""
-        # Check more specific exceptions first (they inherit from base exceptions)
-        if isinstance(e, litellm.ContextWindowExceededError):
-            raise errors.RequesterError(f'上下文长度超限: {str(e)}')
-        if isinstance(e, litellm.BadRequestError):
-            raise errors.RequesterError(f'请求参数错误: {str(e)}')
-        if isinstance(e, litellm.AuthenticationError):
-            raise errors.RequesterError(f'API key 无效: {str(e)}')
-        if isinstance(e, litellm.NotFoundError):
-            raise errors.RequesterError(f'模型或路径无效: {str(e)}')
-        if isinstance(e, litellm.RateLimitError):
-            raise errors.RequesterError(f'请求过于频繁或余额不足: {str(e)}')
-        if isinstance(e, litellm.Timeout):
-            raise errors.RequesterError(f'请求超时: {str(e)}')
-        if isinstance(e, litellm.APIConnectionError):
-            raise errors.RequesterError(f'连接错误: {str(e)}')
-        if isinstance(e, litellm.APIError):
-            raise errors.RequesterError(f'API 错误: {str(e)}')
-        raise errors.RequesterError(f'未知错误: {str(e)}')
-
-    async def _build_completion_args(
-        self,
-        model: requester.RuntimeLLMModel,
-        messages: typing.List[provider_message.Message],
-        funcs: typing.List[resource_tool.LLMTool] = None,
-        extra_args: dict[str, typing.Any] = {},
-        stream: bool = False,
-    ) -> dict:
-        """Build common completion arguments for invoke_llm and invoke_llm_stream."""
-        req_messages = self._convert_messages(messages)
-        model_name = self._build_litellm_model_name(model.model_entity.name)
-        api_key = model.provider.token_mgr.get_token()
-
-        args = {
-            'model': model_name,
-            'messages': req_messages,
-            'api_key': api_key,
-        }
-        if stream:
-            args['stream'] = True
-            args['stream_options'] = {'include_usage': True}
-        self._build_common_args(args)
-
-        # Apply model-level extra_args first, then call-level extra_args
-        if model.model_entity.extra_args:
-            args.update(model.model_entity.extra_args)
-        args.update(extra_args)
-
-        if funcs:
-            tools = await self.ap.tool_mgr.generate_tools_for_openai(funcs)
-            if tools:
-                args['tools'] = tools
-                args.setdefault('tool_choice', 'auto')
-
-        return args
-
-    async def invoke_llm(
-        self,
-        query: pipeline_query.Query,
-        model: requester.RuntimeLLMModel,
-        messages: typing.List[provider_message.Message],
-        funcs: typing.List[resource_tool.LLMTool] = None,
-        extra_args: dict[str, typing.Any] = {},
-        remove_think: bool = False,
-    ) -> tuple[provider_message.Message, dict]:
-        """Invoke LLM and return message with usage info."""
-        args = await self._build_completion_args(model, messages, funcs, extra_args, stream=False)
-
-        try:
-            response = await acompletion(**args)
-
-            message_data = response.choices[0].message.model_dump()
-            if 'role' not in message_data or message_data['role'] is None:
-                message_data['role'] = 'assistant'
-
-            content = message_data.get('content', '')
-            reasoning_content = message_data.get('reasoning_content', None)
-            message_data['content'] = self._process_thinking_content(content, reasoning_content, remove_think)
-
-            if 'reasoning_content' in message_data:
-                del message_data['reasoning_content']
-
-            message = provider_message.Message(**message_data)
-            usage_info = self._extract_usage(response)
-
-            return message, usage_info
-
-        except Exception as e:
-            self._handle_litellm_error(e)
-
-    async def invoke_llm_stream(
-        self,
-        query: pipeline_query.Query,
-        model: requester.RuntimeLLMModel,
-        messages: typing.List[provider_message.Message],
-        funcs: typing.List[resource_tool.LLMTool] = None,
-        extra_args: dict[str, typing.Any] = {},
-        remove_think: bool = False,
-    ) -> provider_message.MessageChunk:
-        """Invoke LLM streaming and yield chunks."""
-        args = await self._build_completion_args(model, messages, funcs, extra_args, stream=True)
-
-        chunk_idx = 0
-        role = 'assistant'
-        tool_call_state: dict[int, dict[str, str]] = {}
-
-        try:
-            response = await acompletion(**args)
-            async for chunk in response:
-                # Capture usage whenever a chunk carries it.
-                #
-                # Important: many OpenAI-compatible gateways (e.g. new-api) and
-                # providers send the final usage payload in a chunk that STILL
-                # contains a (empty-delta) choice, not an empty `choices` list.
-                # The previous implementation only captured usage when `choices`
-                # was empty, so streamed calls always recorded 0 tokens.
-                # We therefore capture usage independently of `choices`, and then
-                # fall through to also process any content this chunk may carry.
-                if getattr(chunk, 'usage', None):
-                    usage_info = self._normalize_usage(chunk.usage)
-                    if query is not None:
-                        if query.variables is None:
-                            query.variables = {}
-                        query.variables['_stream_usage'] = usage_info
-
-                if not hasattr(chunk, 'choices') or not chunk.choices:
-                    continue
-
-                choice = chunk.choices[0]
-                delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
-                finish_reason = getattr(choice, 'finish_reason', None)
-
-                if 'role' in delta and delta['role']:
-                    role = delta['role']
-
-                delta_content = delta.get('content', '')
-                reasoning_content = delta.get('reasoning_content', '')
-
-                # Handle reasoning_content based on remove_think flag
-                if reasoning_content:
-                    if remove_think:
-                        # Skip reasoning content when remove_think is True
-                        chunk_idx += 1
-                        continue
-                    else:
-                        # Use reasoning_content as the displayed content
-                        delta_content = reasoning_content
-
-                tool_calls = self._normalize_stream_tool_calls(delta.get('tool_calls'), tool_call_state)
-
-                if chunk_idx == 0 and not delta_content and not tool_calls:
-                    chunk_idx += 1
-                    continue
-
-                chunk_data = {
-                    'role': role,
-                    'content': delta_content if delta_content else None,
-                    'tool_calls': tool_calls,
-                    'is_final': bool(finish_reason),
-                }
-
-                chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
-                yield provider_message.MessageChunk(**chunk_data)
-                chunk_idx += 1
-
-        except Exception as e:
-            self._handle_litellm_error(e)
-
-    async def invoke_embedding(
-        self,
-        model: requester.RuntimeEmbeddingModel,
-        input_text: list[str],
-        extra_args: dict[str, typing.Any] = {},
-    ) -> tuple[list[list[float]], dict]:
-        """Invoke embedding and return vectors with usage info."""
-        model_name = self._build_litellm_model_name(model.model_entity.name)
-        api_key = model.provider.token_mgr.get_token()
-
-        args = {
-            'model': model_name,
-            'input': input_text,
-            'api_key': api_key,
-        }
-        self._build_common_args(args, include_retry_params=False)
-
-        if model.model_entity.extra_args:
-            args.update(model.model_entity.extra_args)
-
-        args.update(extra_args)
-
-        try:
-            response = await aembedding(**args)
-
-            embeddings = [d.embedding for d in response.data]
-            usage_info = self._extract_usage(response)
-
-            return embeddings, usage_info
-
-        except Exception as e:
-            self._handle_litellm_error(e)
-
-    async def invoke_rerank(
-        self,
-        model: requester.RuntimeRerankModel,
-        query: str,
-        documents: typing.List[str],
-        extra_args: dict[str, typing.Any] = {},
-    ) -> typing.List[dict]:
-        """Invoke rerank and return relevance scores."""
-        model_name = self._build_litellm_model_name(model.model_entity.name)
-        api_key = model.provider.token_mgr.get_token()
-
-        args = {
-            'model': model_name,
-            'query': query,
-            'documents': documents,
-            'api_key': api_key,
-            'top_n': min(len(documents), 64),
-        }
-        self._build_common_args(args, include_retry_params=False)
-
-        if model.model_entity.extra_args:
-            args.update(model.model_entity.extra_args)
-
-        args.update(extra_args)
-
-        try:
-            response = await arerank(**args)
-
-            results = []
-            for r in response.results:
-                results.append(
-                    {
-                        'index': r.get('index', 0),
-                        'relevance_score': r.get('relevance_score', 0.0),
-                    }
-                )
-
-            if results:
-                scores = [r['relevance_score'] for r in results]
-                min_score = min(scores)
-                max_score = max(scores)
-                if max_score - min_score > 1e-6:
-                    for r in results:
-                        r['relevance_score'] = (r['relevance_score'] - min_score) / (max_score - min_score)
-
-            return results
-
-        except Exception as e:
-            self._handle_litellm_error(e)
-
-    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
-        """Scan models supported by the provider."""
-        import httpx
-
-        base_url = self.requester_cfg.get('base_url', '').rstrip('/')
-        timeout = self.requester_cfg.get('timeout', 120)
-
-        if not base_url:
-            raise errors.RequesterError('Base URL required for model scanning')
-
-        headers = {}
-        if api_key:
-            headers['Authorization'] = f'Bearer {api_key}'
-
-        models_url = f'{base_url}/models'
-
-        try:
-            async with httpx.AsyncClient(trust_env=True, timeout=timeout) as client:
-                response = await client.get(models_url, headers=headers)
-                response.raise_for_status()
-                payload = response.json()
-
-            models = []
-            for item in payload.get('data', []):
-                model_id = item.get('id')
-                if not model_id:
-                    continue
-
-                models.append(self._enrich_scanned_model(model_id, item))
-
-            models.sort(key=lambda x: (x['type'] != 'llm', x['name'].lower()))
-
-            return {'models': models}
-
-        except httpx.HTTPStatusError as e:
-            raise errors.RequesterError(f'Model scan failed: {e.response.status_code}')
-        except httpx.TimeoutException:
-            raise errors.RequesterError('Model scan timeout')
-        except Exception as e:
-            raise errors.RequesterError(f'Model scan error: {str(e)}')
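The deleted `_normalize_stream_tool_calls` remembers each call's `id`, `type` and `name` per `index` in `tool_call_state`, but forwards only the current chunk's `arguments` fragment, so assembling a complete call is left to whoever consumes the chunks. Below is a minimal sketch of that consumer side, assuming OpenAI-style delta dicts; `accumulate_tool_calls` is a hypothetical helper, not code taken from LangBot.

```python
from __future__ import annotations

import json
from typing import Any


def accumulate_tool_calls(chunks: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge per-chunk tool-call deltas into complete calls, keyed by 'index'."""
    calls: dict[int, dict[str, Any]] = {}
    for chunk_tool_calls in chunks:
        for fallback_index, delta in enumerate(chunk_tool_calls):
            # Same index fallback as the deleted helper: position within the chunk.
            index = delta.get('index')
            if not isinstance(index, int):
                index = fallback_index
            call = calls.setdefault(index, {'id': '', 'type': 'function', 'name': '', 'arguments': ''})
            # id/type/name usually arrive once, in the first delta for a call.
            if delta.get('id'):
                call['id'] = delta['id']
            if delta.get('type'):
                call['type'] = delta['type']
            function = delta.get('function') or {}
            if function.get('name'):
                call['name'] = function['name']
            # Argument JSON arrives in fragments; concatenate them in arrival order.
            call['arguments'] += function.get('arguments') or ''
    return [calls[i] for i in sorted(calls)]


# Illustrative stream: one call whose JSON arguments are split over three chunks.
stream = [
    [{'index': 0, 'id': 'call_1', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': ''}}],
    [{'index': 0, 'function': {'arguments': '{"city": '}}],
    [{'index': 0, 'function': {'arguments': '"Paris"}'}}],
]
calls = accumulate_tool_calls(stream)
print(json.loads(calls[0]['arguments']))  # {'city': 'Paris'}
```

Unlike the deleted helper, which skips a delta until both `id` and `name` are known, this sketch keeps every argument fragment, so a fragment that arrives before the name is not lost.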
--- a/src/langbot/pkg/provider/modelmgr/requesters/litellmchat.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/litellmchat.yaml
@@ -1,64 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: litellm-chat
-  label:
-    en_US: LiteLLM (Unified)
-    zh_Hans: LiteLLM (统一请求器)
-  icon: litellm.svg
-spec:
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: false
-    default: ''
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  - name: custom_llm_provider
-    label:
-      en_US: Custom Provider
-      zh_Hans: 自定义 Provider
-    type: string
-    required: false
-    default: ''
-    description:
-      en_US: Force provider type (e.g., anthropic, openai, gemini)
-      zh_Hans: 强制指定 provider 类型（如 anthropic, openai, gemini）
-  - name: drop_params
-    label:
-      en_US: Drop Unsupported Params
-      zh_Hans: 丢弃不支持参数
-    type: boolean
-    required: false
-    default: false
-  - name: num_retries
-    label:
-      en_US: Number of Retries
-      zh_Hans: 重试次数
-    type: integer
-    required: false
-    default: 0
-  - name: api_version
-    label:
-      en_US: API Version
-      zh_Hans: API 版本
-    type: string
-    required: false
-    default: ''
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: unified
-execution:
-  python:
-    path: ./litellmchat.py
-    attr: LiteLLMRequester
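The `custom_llm_provider` field in this deleted manifest fed `_build_litellm_model_name` in the deleted `litellmchat.py`, which rewrites a model name into LiteLLM's `provider/model` form when a provider is forced. The function below is a hypothetical free-function restatement of that rule for reference; the original method also fell back to `requester_cfg['custom_llm_provider']`, and the model names used here are illustrative only.

```python
def build_litellm_model_name(model_name: str, custom_llm_provider: str = '') -> str:
    """Return 'provider/model' when a provider is forced, else the name unchanged."""
    if custom_llm_provider:
        # Avoid double-prefixing a name that already starts with 'provider/'.
        if model_name.startswith(f'{custom_llm_provider}/'):
            return model_name
        return f'{custom_llm_provider}/{model_name}'
    # No forced provider: assume the name is already prefixed or OpenAI-compatible.
    return model_name


print(build_litellm_model_name('claude-3-5-sonnet', 'anthropic'))  # anthropic/claude-3-5-sonnet
```

Only the forced provider's own prefix is detected, so a name already carrying a different prefix gets a second one (for example `'openai/gpt-4o'` with `'azure'` becomes `'azure/openai/gpt-4o'`).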
--- a/src/langbot/pkg/provider/modelmgr/requesters/lmstudiochatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/lmstudiochatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import chatcmpl
+
+
+class LmStudioChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """LMStudio ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'http://127.0.0.1:1234/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/lmstudiochatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/lmstudiochatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: LM Studio
  icon: lmstudio.webp
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/mimo.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/mimo.svg
@@ -1,4 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#FF6700"/>
-  <text x="30" y="32" font-family="Arial, sans-serif" font-size="18" font-weight="bold" fill="white" text-anchor="middle">MiMo</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/mimochatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/mimochatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: mimo-chat-completions
-  label:
-    en_US: Xiaomi MiMo
-    zh_Hans: 小米 MiMo
-  icon: mimo.svg
-spec:
-  litellm_provider: openai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://api.xiaomimimo.com/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/minimax.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/minimax.svg
@@ -1,4 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#4F46E5"/>
-  <text x="30" y="32" font-family="Arial, sans-serif" font-size="12" font-weight="bold" fill="white" text-anchor="middle">MiniMax</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/minimaxchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/minimaxchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: minimax-chat-completions
-  label:
-    en_US: MiniMax
-    zh_Hans: MiniMax
-  icon: minimax.svg
-spec:
-  litellm_provider: openai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://api.minimax.chat/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/mistral.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/mistral.svg
@@ -1,5 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#FF6B35"/>
-  <text x="30" y="28" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="white" text-anchor="middle">Mistral</text>
-  <text x="30" y="40" font-family="Arial, sans-serif" font-size="8" fill="white" text-anchor="middle">AI</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/mistralchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/mistralchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: mistral-chat-completions
-  label:
-    en_US: Mistral AI
-    zh_Hans: Mistral AI
-  icon: mistral.svg
-spec:
-  litellm_provider: mistral
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://api.mistral.ai/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/modelscopechatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/modelscopechatcmpl.py
@@ -0,0 +1,561 @@
+from __future__ import annotations
+
+import asyncio
+import typing
+
+import openai
+import openai.types.chat.chat_completion as chat_completion
+import httpx
+
+from .. import entities, errors, requester
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+
+class ModelScopeChatCompletions(requester.ProviderAPIRequester):
+    """ModelScope ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api-inference.modelscope.cn/v1',
+        'timeout': 120,
+    }
+
+    async def initialize(self):
+        self.client = openai.AsyncClient(
+            api_key=self.init_api_key,
+            base_url=self.requester_cfg['base_url'],
+            timeout=self.requester_cfg['timeout'],
+            http_client=httpx.AsyncClient(trust_env=True, timeout=self.requester_cfg['timeout']),
+        )
+
+    def _mask_api_key(self, api_key: str | None) -> str:
+        if not api_key:
+            return ''
+        if len(api_key) <= 8:
+            return '****'
+        return f'{api_key[:4]}...{api_key[-4:]}'
+
+    def _infer_model_type(self, model_id: str) -> str:
+        normalized_model_id = (model_id or '').lower()
+        embedding_keywords = (
+            'embedding',
+            'embed',
+            'bge-',
+            'e5-',
+            'm3e',
+            'gte-',
+            'multilingual-e5',
+            'text-embedding',
+        )
+        return 'embedding' if any(keyword in normalized_model_id for keyword in embedding_keywords) else 'llm'
+
+    def _infer_model_abilities(self, item: dict[str, typing.Any], model_id: str) -> list[str]:
+        normalized_model_id = (model_id or '').lower()
+        abilities: set[str] = set()
+
+        def _flatten(value: typing.Any) -> list[str]:
+            if value is None:
+                return []
+            if isinstance(value, str):
+                return [value.lower()]
+            if isinstance(value, dict):
+                flattened: list[str] = []
+                for nested_value in value.values():
+                    flattened.extend(_flatten(nested_value))
+                return flattened
+            if isinstance(value, (list, tuple, set)):
+                flattened: list[str] = []
+                for nested_value in value:
+                    flattened.extend(_flatten(nested_value))
+                return flattened
+            return [str(value).lower()]
+
+        capability_tokens = _flatten(item.get('capabilities'))
+        capability_tokens.extend(_flatten(item.get('modalities')))
+        capability_tokens.extend(_flatten(item.get('input_modalities')))
+        capability_tokens.extend(_flatten(item.get('output_modalities')))
+        capability_tokens.extend(_flatten(item.get('supported_generation_methods')))
+        capability_tokens.extend(_flatten(item.get('supported_parameters')))
+        capability_tokens.extend(_flatten(item.get('architecture')))
+
+        combined_tokens = capability_tokens + [normalized_model_id]
+
+        vision_keywords = ('vision', 'image', 'file', 'video', 'multimodal', 'vl', 'ocr', 'omni')
+        function_call_keywords = ('function', 'tool', 'tools', 'tool_choice', 'tool_call', 'tool-use', 'tool_use')
+
+        if any(any(keyword in token for keyword in vision_keywords) for token in combined_tokens):
+            abilities.add('vision')
+
+        if any(any(keyword in token for keyword in function_call_keywords) for token in combined_tokens):
+            abilities.add('func_call')
+
+        return sorted(abilities)
+
+    def _normalize_modalities(self, value: typing.Any) -> list[str]:
+        normalized: list[str] = []
+
+        def _collect(item: typing.Any):
+            if item is None:
+                return
+            if isinstance(item, str):
+                for part in item.replace('->', ',').replace('+', ',').split(','):
+                    token = part.strip().lower()
+                    if token and token not in normalized:
+                        normalized.append(token)
+                return
+            if isinstance(item, dict):
+                for nested in item.values():
+                    _collect(nested)
+                return
+            if isinstance(item, (list, tuple, set)):
+                for nested in item:
+                    _collect(nested)
+                return
+
+        _collect(value)
+        return normalized
+
+    def _extract_scan_metadata(self, item: dict[str, typing.Any], model_id: str) -> dict[str, typing.Any]:
+        display_name = item.get('name')
+        if not isinstance(display_name, str) or not display_name.strip() or display_name == model_id:
+            display_name = ''
+
+        description = item.get('description')
+        if not isinstance(description, str) or not description.strip():
+            description = ''
+
+        context_length = item.get('context_length')
+        if context_length is None and isinstance(item.get('top_provider'), dict):
+            context_length = item['top_provider'].get('context_length')
+
+        if not isinstance(context_length, int):
+            try:
+                context_length = int(context_length) if context_length is not None else None
+            except (TypeError, ValueError):
+                context_length = None
+
+        input_modalities = self._normalize_modalities(item.get('input_modalities'))
+        output_modalities = self._normalize_modalities(item.get('output_modalities'))
+
+        if isinstance(item.get('architecture'), dict):
+            if not input_modalities:
+                input_modalities = self._normalize_modalities(item['architecture'].get('input_modalities'))
+            if not output_modalities:
+                output_modalities = self._normalize_modalities(item['architecture'].get('output_modalities'))
+
+        owned_by = item.get('owned_by')
+        if not isinstance(owned_by, str) or not owned_by.strip():
+            owned_by = ''
+
+        return {
+            'display_name': display_name or None,
+            'description': description or None,
+            'context_length': context_length,
+            'owned_by': owned_by or None,
+            'input_modalities': input_modalities,
+            'output_modalities': output_modalities,
+        }
+
+    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
+        headers = {}
+        if api_key:
+            headers['Authorization'] = f'Bearer {api_key}'
+
+        models_url = f'{self.requester_cfg["base_url"].rstrip("/")}/models'
+        async with httpx.AsyncClient(trust_env=True, timeout=self.requester_cfg['timeout']) as client:
+            response = await client.get(models_url, headers=headers)
+            response.raise_for_status()
+            payload = response.json()
+
+        models = []
+        for item in payload.get('data', []):
+            model_id = item.get('id')
+            if not model_id:
+                continue
+            models.append(
+                {
+                    'id': model_id,
+                    'name': model_id,
+                    'type': self._infer_model_type(model_id),
+                    'abilities': self._infer_model_abilities(item, model_id),
+                    **self._extract_scan_metadata(item, model_id),
+                }
+            )
+
+        models.sort(key=lambda item: (item['type'] != 'llm', item['name'].lower()))
+        return {
+            'models': models,
+            'debug': {
+                'request': {
+                    'method': 'GET',
+                    'url': models_url,
+                    'headers': {
+                        'Authorization': f'Bearer {self._mask_api_key(api_key)}' if api_key else '',
+                    },
+                },
+                'response': payload,
+            },
+        }
+
+    async def _req(
+        self,
+        query: pipeline_query.Query,
+        args: dict,
+        extra_body: dict = {},
+        remove_think: bool = False,
+    ) -> list[dict[str, typing.Any]]:
+        args['stream'] = True
+
+        chunk = None
+
+        pending_content = ''
+
+        tool_calls = []
+
+        resp_gen: openai.AsyncStream = await self.client.chat.completions.create(**args, extra_body=extra_body)
+
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        tool_id = ''
+        tool_name = ''
+        message_delta = {}
+        async for chunk in resp_gen:
+            if not chunk or not chunk.id or not chunk.choices or not chunk.choices[0] or not chunk.choices[0].delta:
+                continue
+
+            delta = chunk.choices[0].delta.model_dump() if hasattr(chunk.choices[0], 'delta') else {}
+            reasoning_content = delta.get('reasoning_content')
+            # Handle reasoning_content
+            if reasoning_content:
+                # accumulated_reasoning += reasoning_content
+                # Skip reasoning_content when remove_think is set
+                if remove_think:
+                    chunk_idx += 1
+                    continue
+
+                # First reasoning_content chunk: open a <think> tag
+                if not thinking_started:
+                    thinking_started = True
+                    pending_content += '<think>\n' + reasoning_content
+                else:
+                    # Continue emitting reasoning_content
+                    pending_content += reasoning_content
+            elif thinking_started and not thinking_ended and delta.get('content'):
+                # Reasoning ended and normal content begins: close the </think> tag
+                thinking_ended = True
+                pending_content += '\n</think>\n'  # the content itself is appended once below
+
+            if delta.get('content') is not None:
+                pending_content += delta.get('content')
+
+            if delta.get('tool_calls') is not None:
+                for tool_call in delta.get('tool_calls'):
+                    if tool_call['id']:  # continuation fragments carry an empty/None id
+                        tool_id = tool_call['id']
+                    if tool_call['function']['name'] is not None:
+                        tool_name = tool_call['function']['name']
+                    if tool_call['function']['arguments'] is None:
+                        continue
+                    tool_call['id'] = tool_id
+                    tool_call['name'] = tool_name
+                    for tc in tool_calls:
+                        if tc['index'] == tool_call['index']:
+                            tc['function']['arguments'] += tool_call['function']['arguments']
+                            break
+                    else:
+                        tool_calls.append(tool_call)
+
+            if chunk.choices[0].finish_reason is not None:
+                break
+        message_delta['content'] = pending_content
+        message_delta['role'] = 'assistant'
+
+        message_delta['tool_calls'] = tool_calls if tool_calls else None
+        return [message_delta]
+
+    async def _make_msg(
+        self,
+        chat_completion: list[dict[str, typing.Any]],
+    ) -> provider_message.Message:
+        chatcmpl_message = chat_completion[0]
+
+        # Ensure the role field is present and not None
+        if 'role' not in chatcmpl_message or chatcmpl_message['role'] is None:
+            chatcmpl_message['role'] = 'assistant'
+
+        message = provider_message.Message(**chatcmpl_message)
+
+        return message
+
+    async def _closure(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> tuple[provider_message.Message, dict]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Set the messages for this request
+        messages = req_messages.copy()
+
+        # Convert image_base64 parts into OpenAI-style image_url parts (vision input)
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+
+        # Send the request
+        resp = await self._req(query, args, extra_body=extra_args, remove_think=remove_think)
+
+        # Build the response message
+        message = await self._make_msg(resp)
+
+        # Requests are streamed and usage is not read from the stream, so none is reported
+        usage_info = {}
+
+        return message, usage_info
+
+    async def _req_stream(
+        self,
+        args: dict,
+        extra_body: dict = {},
+    ) -> chat_completion.ChatCompletion:
+        async for chunk in await self.client.chat.completions.create(**args, extra_body=extra_body):
+            yield chunk
+
+    async def _closure_stream(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message | typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Set the messages for this request
+        messages = req_messages.copy()
+
+        # Convert image_base64 parts into OpenAI-style image_url parts (vision input)
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+        args['stream'] = True
+
+        # Streaming state
+        tool_id, tool_name = '', ''  # last seen tool-call id/name, carried across deltas
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        role = 'assistant'  # default role
+        # accumulated_reasoning = ''  # only used to detect when the reasoning chain ends
+
+        async for chunk in self._req_stream(args, extra_body=extra_args):
+            # Parse the chunk data
+            if hasattr(chunk, 'choices') and chunk.choices:
+                choice = chunk.choices[0]
+                delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
+                finish_reason = getattr(choice, 'finish_reason', None)
+            else:
+                delta = {}
+                finish_reason = None
+
+            # Take the role from the first chunk and reuse it for later chunks
+            if 'role' in delta and delta['role']:
+                role = delta['role']
+
+            # Get the incremental content
+            delta_content = delta.get('content', '')
+            reasoning_content = delta.get('reasoning_content', '')
+
+            # Handle reasoning_content
+            if reasoning_content:
+                # accumulated_reasoning += reasoning_content
+                # Skip reasoning_content when remove_think is set
+                if remove_think:
+                    chunk_idx += 1
+                    continue
+
+                # First reasoning_content chunk: open a <think> tag
+                if not thinking_started:
+                    thinking_started = True
+                    delta_content = '<think>\n' + reasoning_content
+                else:
+                    # Continue emitting reasoning_content
+                    delta_content = reasoning_content
+            elif thinking_started and not thinking_ended and delta_content:
+                # Reasoning ended and normal content begins: close with </think>
+                thinking_ended = True
+                delta_content = '\n</think>\n' + delta_content
+
+            # Handle <think> tags already present in content (if they should be removed)
+            # if delta_content and remove_think and '<think>' in delta_content:
+            #     import re
+            #
+            #     # Remove <think> tags together with their content
+            #     delta_content = re.sub(r'<think>.*?</think>', '', delta_content, flags=re.DOTALL)
+
+            # Process tool-call deltas
+            if delta.get('tool_calls'):
+                for tool_call in delta['tool_calls']:
+                    if tool_call['id'] != '':
+                        tool_id = tool_call['id']
+                    if tool_call['function']['name'] is not None:
+                        tool_name = tool_call['function']['name']
+
+                    if tool_call['type'] is None:
+                        tool_call['type'] = 'function'
+                    tool_call['id'] = tool_id
+                    tool_call['function']['name'] = tool_name
+                    tool_call['function']['arguments'] = (
+                        '' if tool_call['function']['arguments'] is None else tool_call['function']['arguments']
+                    )
+
+            # Skip an empty first chunk (role only, no content)
+            if chunk_idx == 0 and not delta_content and not reasoning_content and not delta.get('tool_calls'):
+                chunk_idx += 1
+                continue
+
+            # Build the MessageChunk with incremental content only
+            chunk_data = {
+                'role': role,
+                'content': delta_content if delta_content else None,
+                'tool_calls': delta.get('tool_calls'),
+                'is_final': bool(finish_reason),
+            }
+
+            # Drop None values
+            chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+            yield provider_message.MessageChunk(**chunk_data)
+            chunk_idx += 1
+            # return
+
+    async def invoke_llm(
+        self,
+        query: pipeline_query.Query,
+        model: entities.LLMModelInfo,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message:
+        req_messages = []  # used only within this class; external state is synced via query.messages
+        for m in messages:
+            msg_dict = m.dict(exclude_none=True)
+            content = msg_dict.get('content')
+            if isinstance(content, list):
+                # Check whether every part of the content list is text
+                if all(isinstance(part, dict) and part.get('type') == 'text' for part in content):
+                    # Merge all text parts into a single string
+                    msg_dict['content'] = '\n'.join(part['text'] for part in content)
+            req_messages.append(msg_dict)
+
+        try:
+            return await self._closure(
+                query=query,
+                req_messages=req_messages,
+                use_model=model,
+                use_funcs=funcs,
+                extra_args=extra_args,
+                remove_think=remove_think,
+            )
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('请求超时')
+        except openai.BadRequestError as e:
+            if 'context_length_exceeded' in e.message:
+                raise errors.RequesterError(f'上文过长，请重置会话: {e.message}')
+            else:
+                raise errors.RequesterError(f'请求参数错误: {e.message}')
+        except openai.AuthenticationError as e:
+            raise errors.RequesterError(f'无效的 api-key: {e.message}')
+        except openai.NotFoundError as e:
+            raise errors.RequesterError(f'请求路径错误: {e.message}')
+        except openai.RateLimitError as e:
+            raise errors.RequesterError(f'请求过于频繁或余额不足: {e.message}')
+        except openai.APIError as e:
+            raise errors.RequesterError(f'请求错误: {e.message}')
+
+    async def invoke_llm_stream(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.MessageChunk:
+        req_messages = []  # used only within this class; external state is synced via query.messages
+        for m in messages:
+            msg_dict = m.dict(exclude_none=True)
+            content = msg_dict.get('content')
+            if isinstance(content, list):
+                # Check whether every part of the content list is text
+                if all(isinstance(part, dict) and part.get('type') == 'text' for part in content):
+                    # Merge all text parts into a single string
+                    msg_dict['content'] = '\n'.join(part['text'] for part in content)
+            req_messages.append(msg_dict)
+
+        try:
+            async for item in self._closure_stream(
+                query=query,
+                req_messages=req_messages,
+                use_model=model,
+                use_funcs=funcs,
+                extra_args=extra_args,
+                remove_think=remove_think,
+            ):
+                yield item
+
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('请求超时')
+        except openai.BadRequestError as e:
+            if 'context_length_exceeded' in e.message:
+                raise errors.RequesterError(f'上文过长，请重置会话: {e.message}')
+            else:
+                raise errors.RequesterError(f'请求参数错误: {e.message}')
+        except openai.AuthenticationError as e:
+            raise errors.RequesterError(f'无效的 api-key: {e.message}')
+        except openai.NotFoundError as e:
+            raise errors.RequesterError(f'请求路径错误: {e.message}')
+        except openai.RateLimitError as e:
+            raise errors.RequesterError(f'请求过于频繁或余额不足: {e.message}')
+        except openai.APIError as e:
+            raise errors.RequesterError(f'请求错误: {e.message}')
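For reference, `_closure_stream` above turns ModelScope's `reasoning_content` deltas into an inline `<think>...</think>` block. A minimal sketch of that state machine over plain dicts follows, so the tagging can be checked without a network call. `wrap_reasoning` and the dict-shaped deltas are hypothetical, not part of LangBot's API. Like the original, it drops a delta's `content` whenever the same delta also carries `reasoning_content`.

```python
from collections.abc import Iterable, Iterator


def wrap_reasoning(deltas: Iterable[dict], remove_think: bool = False) -> Iterator[str]:
    """Yield text pieces, wrapping reasoning_content in <think>...</think>.

    Hypothetical helper: each delta is a dict with optional 'content'
    and 'reasoning_content' keys, mirroring the OpenAI-style stream.
    """
    thinking_started = False
    thinking_ended = False
    for delta in deltas:
        content = delta.get('content') or ''
        reasoning = delta.get('reasoning_content') or ''
        if reasoning:
            if remove_think:
                continue  # drop reasoning entirely
            if not thinking_started:
                thinking_started = True
                yield '<think>\n' + reasoning  # open the block once
            else:
                yield reasoning
        elif thinking_started and not thinking_ended and content:
            # First normal content after reasoning: close the block exactly once.
            thinking_ended = True
            yield '\n</think>\n' + content
        elif content:
            yield content
```

Feeding `[{'reasoning_content': 'hmm'}, {'content': 'Hi'}]` yields `'<think>\nhmm'` and then `'\n</think>\nHi'`.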
--- a/src/langbot/pkg/provider/modelmgr/requesters/modelscopechatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/modelscopechatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 魔搭社区
  icon: modelscope.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
@@ -32,8 +31,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
-  - rerank
  provider_category: maas
 execution:
  python:
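`_req` in `modelscopechatcmpl.py` merges streamed tool-call fragments by their `index`: the first fragment carries the id and function name, and later ones only append to `function.arguments`. The sketch below restates that idea as a standalone function. `merge_tool_call_deltas` is hypothetical, and unlike `_req` it treats a missing `arguments` value as an empty string instead of skipping the fragment.

```python
import copy


def merge_tool_call_deltas(chunks: list[list[dict]]) -> list[dict]:
    """Accumulate OpenAI-style streamed tool-call fragments by 'index'.

    Hypothetical helper: `chunks` is the list of `delta.tool_calls` lists
    seen over the stream, in order.
    """
    merged: list[dict] = []
    for tool_calls in chunks:
        for fragment in tool_calls:
            fragment = copy.deepcopy(fragment)  # don't mutate the caller's data
            existing = next((tc for tc in merged if tc['index'] == fragment['index']), None)
            args = fragment['function'].get('arguments') or ''
            if existing is None:
                fragment['function']['arguments'] = args
                merged.append(fragment)
            else:
                existing['function']['arguments'] += args
                # Keep the first non-empty id/name seen for this index.
                existing['id'] = existing.get('id') or fragment.get('id')
                existing['function']['name'] = existing['function'].get('name') or fragment['function'].get('name')
    return merged
```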
--- a/src/langbot/pkg/provider/modelmgr/requesters/moonshotchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/moonshotchatcmpl.py
@@ -0,0 +1,67 @@
+from __future__ import annotations
+
+import typing
+
+
+from . import chatcmpl
+from .. import requester
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+
+class MoonshotChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """Moonshot ChatCompletion API requester"""
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.moonshot.cn/v1',
+        'timeout': 120,
+    }
+
+    async def _closure(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> tuple[provider_message.Message, dict]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Set the messages for this request
+        messages = req_messages
+
+        # Flatten list content to plain text; only text parts are kept, other parts are dropped
+        for m in messages:
+            if 'content' in m and isinstance(m['content'], list):
+                m['content'] = ' '.join(c['text'] for c in m['content'] if c.get('type') == 'text')
+
+        # Filtering out empty messages is disabled (its purpose was unclear):
+        # messages = [m for m in messages if m["content"].strip() != "" and ('tool_calls' not in m or not m['tool_calls'])]
+
+        args['messages'] = messages
+
+        # Send the request
+        resp = await self._req(args, extra_body=extra_args)
+
+        # Build the response message
+        message = await self._make_msg(resp, remove_think)
+
+        # Extract token usage from response
+        usage_info = {}
+        if hasattr(resp, 'usage') and resp.usage:
+            usage_info['input_tokens'] = resp.usage.prompt_tokens or 0
+            usage_info['output_tokens'] = resp.usage.completion_tokens or 0
+            usage_info['total_tokens'] = resp.usage.total_tokens or 0
+
+        return message, usage_info
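Two content-normalization strategies appear in this diff: ModelScope's `invoke_llm` joins a content list only when every part is text, while Moonshot's `_closure` always reduces it to text. A small sketch of both follows; the function names are hypothetical, and the `type == 'text'` filter in `flatten_to_text` is an assumption added here so that image parts are skipped rather than raising `KeyError`.

```python
def merge_text_parts(content):
    """ModelScope-style: collapse a content list only if every part is text."""
    if isinstance(content, list) and all(
        isinstance(part, dict) and part.get('type') == 'text' for part in content
    ):
        return '\n'.join(part['text'] for part in content)
    return content  # mixed (e.g. text + image) lists are left untouched


def flatten_to_text(content):
    """Moonshot-style: always reduce a content list to its text parts."""
    if isinstance(content, list):
        return ' '.join(part['text'] for part in content if part.get('type') == 'text')
    return content
```

The first keeps multimodal messages intact for providers that accept them; the second trades images away for compatibility with text-only endpoints.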
--- a/src/langbot/pkg/provider/modelmgr/requesters/moonshotchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/moonshotchatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 月之暗面
  icon: moonshot.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
@@ -25,8 +24,6 @@ spec:
    default: 120
  support_type:
  - llm
-  - text-embedding
-  - rerank
  provider_category: manufacturer
 execution:
  python:
--- a/src/langbot/pkg/provider/modelmgr/requesters/newapichatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/newapichatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import chatcmpl
+
+
+class NewAPIChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """New API ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'http://localhost:3000/v1',
+        'timeout': 120,
+    }
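`NewAPIChatCompletions` only overrides `default_config`. How LangBot layers those defaults under user-supplied provider settings is not shown in this diff; assuming a shallow merge in which user values win, the pattern looks like this (`BaseRequester` and `NewAPILike` are hypothetical stand-ins).

```python
from typing import Any, Optional


class BaseRequester:
    """Hypothetical stand-in; LangBot's ProviderAPIRequester is not shown here."""

    default_config: dict[str, Any] = {}

    def __init__(self, user_config: Optional[dict[str, Any]] = None):
        # Shallow merge into a fresh dict: user-supplied keys override class defaults.
        self.requester_cfg = {**self.default_config, **(user_config or {})}


class NewAPILike(BaseRequester):
    # Same defaults as NewAPIChatCompletions in the diff above.
    default_config: dict[str, Any] = {
        'base_url': 'http://localhost:3000/v1',
        'timeout': 120,
    }
```

Building a fresh dict per instance keeps the class-level `default_config` unchanged when several providers share the same requester class.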
--- a/src/langbot/pkg/provider/modelmgr/requesters/newapichatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/newapichatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: New API
  icon: newapi.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/ollamachat.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/ollamachat.py
@@ -0,0 +1,314 @@
+from __future__ import annotations
+
+import asyncio
+import os
+import typing
+from typing import Union, Mapping, Any, AsyncIterator
+import uuid
+import json
+
+import ollama
+import httpx
+
+from .. import errors, requester
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+REQUESTER_NAME: str = 'ollama-chat'
+
+
+class OllamaChatCompletions(requester.ProviderAPIRequester):
+    """Ollama platform ChatCompletion API requester"""
+
+    client: ollama.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'http://127.0.0.1:11434',
+        'timeout': 120,
+    }
+
+    async def initialize(self):
+        os.environ['OLLAMA_HOST'] = self.requester_cfg['base_url']
+        self.client = ollama.AsyncClient(timeout=self.requester_cfg['timeout'])
+
+    def _infer_model_type(self, model_id: str) -> str:
+        normalized_model_id = (model_id or '').lower()
+        embedding_keywords = ('embedding', 'embed', 'bge-', 'e5-', 'm3e', 'gte-', 'text-embedding')
+        return 'embedding' if any(keyword in normalized_model_id for keyword in embedding_keywords) else 'llm'
+
+    def _infer_model_abilities(self, item: dict[str, typing.Any], model_id: str) -> list[str]:
+        normalized_model_id = (model_id or '').lower()
+        abilities: set[str] = set()
+        details = item.get('details', {}) or {}
+        families = details.get('families', []) or []
+        tokens = [normalized_model_id, str(details.get('family', '')).lower()]
+        tokens.extend(str(family).lower() for family in families)
+
+        if any(keyword in token for token in tokens for keyword in ('vision', 'vl', 'omni', 'llava', 'ocr')):
+            abilities.add('vision')
+        if any(keyword in token for token in tokens for keyword in ('tool', 'function')):
+            abilities.add('func_call')
+        return sorted(abilities)
+
+    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
+        del api_key
+        models_url = f'{self.requester_cfg["base_url"].rstrip("/")}/api/tags'
+
+        async with httpx.AsyncClient(trust_env=True, timeout=self.requester_cfg['timeout']) as client:
+            response = await client.get(models_url)
+            response.raise_for_status()
+            payload = response.json()
+
+        models: list[dict[str, typing.Any]] = []
+        for item in payload.get('models', []):
+            model_id = item.get('model') or item.get('name')
+            if not model_id:
+                continue
+            models.append(
+                {
+                    'id': model_id,
+                    'name': item.get('name', model_id),
+                    'type': self._infer_model_type(model_id),
+                    'abilities': self._infer_model_abilities(item, model_id),
+                }
+            )
+
+        models.sort(key=lambda item: (item['type'] != 'llm', item['name'].lower()))
+        return {
+            'models': models,
+            'debug': {
+                'request': {
+                    'method': 'GET',
+                    'url': models_url,
+                },
+                'response': payload,
+            },
+        }
+
+    async def _req(
+        self,
+        args: dict,
+    ) -> Union[Mapping[str, Any], AsyncIterator[Mapping[str, Any]]]:
+        return await self.client.chat(**args)
+
+    async def _closure(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message:
+        args = extra_args.copy()
+        args['model'] = use_model.model_entity.name
+
+        messages: list[dict] = req_messages.copy()
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                text_content: list = []
+                image_urls: list = []
+                for me in msg['content']:
+                    if me['type'] == 'text':
+                        text_content.append(me['text'])
+                    elif me['type'] == 'image_base64':
+                        image_urls.append(me['image_base64'])
+
+                msg['content'] = '\n'.join(text_content)
+                msg['images'] = [url.split(',')[1] for url in image_urls]
+            if 'tool_calls' in msg:  # LangBot stores tool-call arguments as JSON strings internally; Ollama expects a dict
+                for tool_call in msg['tool_calls']:
+                    tool_call['function']['arguments'] = json.loads(tool_call['function']['arguments'])
+        args['messages'] = messages
+
+        args['tools'] = []
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+            if tools:
+                args['tools'] = tools
+
+        resp = await self._req(args)
+        message: provider_message.Message = await self._make_msg(resp)
+        return message
+
+    async def _make_msg(self, chat_completions: ollama.ChatResponse) -> provider_message.Message:
+        message: ollama.Message = chat_completions.message
+        if message is None:
+            raise ValueError("chat_completions must contain a 'message' field")
+
+        # Build the message unconditionally: a tool-call-only reply may carry no text,
+        # and assigning tool_calls to a None message would raise AttributeError.
+        ret_msg = provider_message.Message(role='assistant', content=message.content or '')
+
+        if message.tool_calls is not None and len(message.tool_calls) > 0:
+            tool_calls: list[provider_message.ToolCall] = []
+
+            for tool_call in message.tool_calls:
+                tool_calls.append(
+                    provider_message.ToolCall(
+                        id=uuid.uuid4().hex,
+                        type='function',
+                        function=provider_message.FunctionCall(
+                            name=tool_call.function.name,
+                            arguments=json.dumps(tool_call.function.arguments),
+                        ),
+                    )
+                )
+            ret_msg.tool_calls = tool_calls
+
+        return ret_msg
+
+    async def _prepare_messages(
+        self,
+        messages: typing.List[provider_message.Message],
+    ) -> list[dict]:
+        """Prepare messages for Ollama API request."""
+        req_messages: list = []
+        for m in messages:
+            msg_dict: dict = m.dict(exclude_none=True)
+            content: Any = msg_dict.get('content')
+            if isinstance(content, list):
+                if all(isinstance(part, dict) and part.get('type') == 'text' for part in content):
+                    msg_dict['content'] = '\n'.join(part['text'] for part in content)
+            req_messages.append(msg_dict)
+        return req_messages
+
+    async def invoke_llm(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message:
+        req_messages = await self._prepare_messages(messages)
+        try:
+            return await self._closure(
+                query=query,
+                req_messages=req_messages,
+                use_model=model,
+                use_funcs=funcs,
+                extra_args=extra_args,
+                remove_think=remove_think,
+            )
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('Request timed out')
+
+    async def invoke_llm_stream(
+        self,
+        query: pipeline_query.Query,
+        model: requester.RuntimeLLMModel,
+        messages: typing.List[provider_message.Message],
+        funcs: typing.List[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.MessageChunk:
+        req_messages = await self._prepare_messages(messages)
+
+        try:
+            args = extra_args.copy()
+            args['model'] = model.model_entity.name
+
+            # Process messages for Ollama format
+            msgs: list[dict] = req_messages.copy()
+            for msg in msgs:
+                if 'content' in msg and isinstance(msg['content'], list):
+                    text_content: list = []
+                    image_urls: list = []
+                    for me in msg['content']:
+                        if me['type'] == 'text':
+                            text_content.append(me['text'])
+                        elif me['type'] == 'image_base64':
+                            image_urls.append(me['image_base64'])
+                    msg['content'] = '\n'.join(text_content)
+                    msg['images'] = [url.split(',')[1] for url in image_urls]
+                if 'tool_calls' in msg:
+                    for tool_call in msg['tool_calls']:
+                        tool_call['function']['arguments'] = json.loads(tool_call['function']['arguments'])
+            args['messages'] = msgs
+
+            args['tools'] = []
+            if funcs:
+                tools = await self.ap.tool_mgr.generate_tools_for_openai(funcs)
+                if tools:
+                    args['tools'] = tools
+
+            args['stream'] = True
+
+            chunk_idx = 0
+            thinking_started = False
+            thinking_ended = False
+            role = 'assistant'
+
+            async for chunk in await self.client.chat(**args):
+                message: ollama.Message = chunk.message
+                done = chunk.done
+
+                delta_content = message.content or ''
+                reasoning_content = getattr(message, 'thinking', '') or ''
+
+                # Handle reasoning/thinking content
+                if reasoning_content:
+                    if remove_think:
+                        chunk_idx += 1
+                        continue
+
+                    if not thinking_started:
+                        thinking_started = True
+                        delta_content = '<think>\n' + reasoning_content
+                    else:
+                        delta_content = reasoning_content
+                elif thinking_started and not thinking_ended and delta_content:
+                    thinking_ended = True
+                    delta_content = '\n</think>\n' + delta_content
+
+                # Handle tool calls
+                tool_calls_data = None
+                if message.tool_calls:
+                    tool_calls_data = []
+                    for tc in message.tool_calls:
+                        tool_calls_data.append(
+                            {
+                                'id': uuid.uuid4().hex,
+                                'type': 'function',
+                                'function': {
+                                    'name': tc.function.name,
+                                    'arguments': json.dumps(tc.function.arguments),
+                                },
+                            }
+                        )
+
+                # Skip empty first chunk
+                if chunk_idx == 0 and not delta_content and not reasoning_content and not tool_calls_data:
+                    chunk_idx += 1
+                    continue
+
+                chunk_data = {
+                    'role': role,
+                    'content': delta_content if delta_content else None,
+                    'tool_calls': tool_calls_data,
+                    'is_final': bool(done),
+                }
+                chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+                yield provider_message.MessageChunk(**chunk_data)
+                chunk_idx += 1
+
+        except asyncio.TimeoutError:
+            raise errors.RequesterError('Request timed out')
+
+    async def invoke_embedding(
+        self,
+        model: requester.RuntimeEmbeddingModel,
+        input_text: list[str],
+        extra_args: dict[str, typing.Any] = {},
+    ) -> list[list[float]]:
+        return (
+            await self.client.embed(
+                model=model.model_entity.name,
+                input=input_text,
+                **extra_args,
+            )
+        ).embeddings
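Both `_closure` and `invoke_llm_stream` above reshape OpenAI-style message dicts for Ollama's `chat` call: text parts are joined with newlines, `image_base64` data URLs are cut down to the bare base64 payload after the comma, and tool-call arguments stored as JSON strings are decoded to dicts. The following is a minimal standalone sketch of that conversion, assuming the same part keys (`type`, `text`, `image_base64`) used in the diff; the helper name `to_ollama_messages` is hypothetical and not part of LangBot. Unlike the diffed code, which works on a shallow `list.copy()`, it deep-copies its input so the caller's dicts are left untouched.

```python
import copy
import json
from typing import Any


def to_ollama_messages(req_messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: convert OpenAI-style message dicts into Ollama's chat format."""
    # Deep copy so the caller's nested content parts and tool calls stay untouched.
    messages = copy.deepcopy(req_messages)
    for msg in messages:
        content = msg.get('content')
        if isinstance(content, list):
            texts = [part['text'] for part in content if part.get('type') == 'text']
            data_urls = [part['image_base64'] for part in content if part.get('type') == 'image_base64']
            msg['content'] = '\n'.join(texts)
            # 'data:image/png;base64,QUJD' -> 'QUJD': Ollama takes bare base64 strings.
            msg['images'] = [url.split(',', 1)[1] for url in data_urls]
        for tool_call in msg.get('tool_calls') or []:
            arguments = tool_call['function']['arguments']
            # Arguments are kept as a JSON string upstream; Ollama expects a mapping.
            if isinstance(arguments, str):
                tool_call['function']['arguments'] = json.loads(arguments)
    return messages
```

Factoring the conversion into one function like this would also remove the loop that is currently duplicated between the blocking and streaming paths.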
--- a/src/langbot/pkg/provider/modelmgr/requesters/ollamachat.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/ollamachat.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Ollama
  icon: ollama.svg
 spec:
-  litellm_provider: ollama
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/openrouterchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/openrouterchatcmpl.py
@@ -0,0 +1,25 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import modelscopechatcmpl
+
+
+class OpenRouterChatCompletions(modelscopechatcmpl.ModelScopeChatCompletions):
+    """OpenRouter ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://openrouter.ai/api/v1',
+        'timeout': 120,
+    }
+
+    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
+        original_base_url = self.requester_cfg.get('base_url', '')
+        self.requester_cfg['base_url'] = 'https://openrouter.ai/api/v1'
+        try:
+            return await super().scan_models(api_key)
+        finally:
+            self.requester_cfg['base_url'] = original_base_url
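`OpenRouterChatCompletions.scan_models` pins `base_url` to the public OpenRouter endpoint for the duration of the parent call and restores it in `finally`. A minimal sketch of the same pattern as a reusable context manager, assuming a plain dict config; `override_key` is a hypothetical helper, not an existing LangBot function. It also deletes the key on exit when it was absent beforehand, whereas the diffed code writes back `''` in that case.

```python
import contextlib
from typing import Any, Iterator

_MISSING = object()  # sentinel distinguishing "key absent" from a stored None


@contextlib.contextmanager
def override_key(cfg: dict[str, Any], key: str, value: Any) -> Iterator[None]:
    """Hypothetical helper: temporarily set cfg[key] and restore the prior state on exit."""
    previous = cfg.get(key, _MISSING)
    cfg[key] = value
    try:
        yield
    finally:
        if previous is _MISSING:
            # The key was absent before; remove it rather than leave a placeholder.
            cfg.pop(key, None)
        else:
            cfg[key] = previous
```

Inside `scan_models` the body would then read `with override_key(self.requester_cfg, 'base_url', 'https://openrouter.ai/api/v1'): return await super().scan_models(api_key)`. Like the original, this mutates shared state, so another coroutine reading the same config while the parent call is awaiting would see the overridden URL.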
--- a/src/langbot/pkg/provider/modelmgr/requesters/openrouterchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/openrouterchatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: OpenRouter
  icon: openrouter.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/ppiochatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/ppiochatcmpl.py
@@ -0,0 +1,208 @@
+from __future__ import annotations
+
+import openai
+import typing
+
+from . import chatcmpl
+from .. import requester
+import openai.types.chat.chat_completion as chat_completion
+import re
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+
+
+class PPIOChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """PPIO ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.ppinfra.com/v3/openai',
+        'timeout': 120,
+    }
+
+    is_think: bool = False
+
+    async def _make_msg(
+        self,
+        chat_completion: chat_completion.ChatCompletion,
+        remove_think: bool,
+    ) -> provider_message.Message:
+        chatcmpl_message = chat_completion.choices[0].message.model_dump()
+        # print(chatcmpl_message.keys(), chatcmpl_message.values())
+
+        # Ensure the role field exists and is not None
+        if 'role' not in chatcmpl_message or chatcmpl_message['role'] is None:
+            chatcmpl_message['role'] = 'assistant'
+
+        reasoning_content = chatcmpl_message['reasoning_content'] if 'reasoning_content' in chatcmpl_message else None
+
+        # Reasoning models (e.g. DeepSeek reasoner) return reasoning_content separately
+        chatcmpl_message['content'] = await self._process_thinking_content(
+            chatcmpl_message['content'], reasoning_content, remove_think
+        )
+
+        # Remove the reasoning_content field so it is not passed to Message
+        if 'reasoning_content' in chatcmpl_message:
+            del chatcmpl_message['reasoning_content']
+
+        message = provider_message.Message(**chatcmpl_message)
+
+        return message
+
+    async def _process_thinking_content(
+        self,
+        content: str | None,
+        reasoning_content: str | None = None,
+        remove_think: bool = False,
+    ) -> str | None:
+        """Process chain-of-thought (reasoning) content.
+
+        Args:
+            content: The original message content (may be None for tool-call-only replies)
+            reasoning_content: The value of the reasoning_content field, if any
+            remove_think: Whether to strip the chain of thought
+
+        Returns:
+            The processed content
+        """
+        if remove_think:
+            content = re.sub(r'<think>.*?</think>', '', content or '', flags=re.DOTALL)
+        else:
+            if reasoning_content is not None:
+                content = '<think>\n' + reasoning_content + '\n</think>\n' + (content or '')
+        return content
+
+    async def _make_msg_chunk(
+        self,
+        delta: dict[str, typing.Any],
+        idx: int,
+    ) -> provider_message.MessageChunk:
+        # Handle differences between streaming chunks and full responses
+        # print(chat_completion.choices[0])
+
+        # Ensure the role field exists and is not None
+        if 'role' not in delta or delta['role'] is None:
+            delta['role'] = 'assistant'
+
+        reasoning_content = delta['reasoning_content'] if 'reasoning_content' in delta else None
+
+        delta['content'] = '' if delta['content'] is None else delta['content']
+        # print(reasoning_content)
+
+        # Reasoning models (e.g. DeepSeek reasoner): append reasoning_content to the delta
+
+        if reasoning_content is not None:
+            delta['content'] += reasoning_content
+
+        message = provider_message.MessageChunk(**delta)
+
+        return message
+
+    async def _closure_stream(
+        self,
+        query: pipeline_query.Query,
+        req_messages: list[dict],
+        use_model: requester.RuntimeLLMModel,
+        use_funcs: list[resource_tool.LLMTool] = None,
+        extra_args: dict[str, typing.Any] = {},
+        remove_think: bool = False,
+    ) -> provider_message.Message | typing.AsyncGenerator[provider_message.MessageChunk, None]:
+        self.client.api_key = use_model.provider.token_mgr.get_token()
+
+        args = {}
+        args['model'] = use_model.model_entity.name
+
+        if use_funcs:
+            tools = await self.ap.tool_mgr.generate_tools_for_openai(use_funcs)
+
+            if tools:
+                args['tools'] = tools
+
+        # Set the messages for this request
+        messages = req_messages.copy()
+
+        # Vision: convert image_base64 parts into OpenAI-style image_url parts
+        for msg in messages:
+            if 'content' in msg and isinstance(msg['content'], list):
+                for me in msg['content']:
+                    if me['type'] == 'image_base64':
+                        me['image_url'] = {'url': me['image_base64']}
+                        me['type'] = 'image_url'
+                        del me['image_base64']
+
+        args['messages'] = messages
+        args['stream'] = True
+
+        tool_id, tool_name = '', ''  # last seen tool-call id/name, back-filled into later fragments
+        chunk_idx = 0
+        thinking_started = False
+        thinking_ended = False
+        role = 'assistant'  # default role
+        async for chunk in self._req_stream(args, extra_body=extra_args):
+            # Parse the chunk data
+            if hasattr(chunk, 'choices') and chunk.choices:
+                choice = chunk.choices[0]
+                delta = choice.delta.model_dump() if hasattr(choice, 'delta') else {}
+                finish_reason = getattr(choice, 'finish_reason', None)
+            else:
+                delta = {}
+                finish_reason = None
+
+            # Take the role from the first chunk and reuse it for later chunks
+            if 'role' in delta and delta['role']:
+                role = delta['role']
+
+            # Get the incremental content
+            delta_content = delta.get('content', '')
+            # reasoning_content = delta.get('reasoning_content', '')
+
+            if remove_think:
+                if delta.get('content') is not None:
+                    if '<think>' in delta['content'] and not thinking_started and not thinking_ended:
+                        thinking_started = True
+                        continue
+                    elif delta['content'] == r'</think>' and not thinking_ended:
+                        thinking_ended = True
+                        continue
+                    elif thinking_ended and delta['content'] == '\n\n' and thinking_started:
+                        thinking_started = False
+                        continue
+                    elif thinking_started and not thinking_ended:
+                        continue
+
+            # delta_tool_calls = None
+            if delta.get('tool_calls'):
+                for tool_call in delta['tool_calls']:
+                    if tool_call['id'] and tool_call['function']['name']:
+                        tool_id = tool_call['id']
+                        tool_name = tool_call['function']['name']
+
+                    if tool_call['id'] is None:
+                        tool_call['id'] = tool_id
+                    if tool_call['function']['name'] is None:
+                        tool_call['function']['name'] = tool_name
+                    if tool_call['function']['arguments'] is None:
+                        tool_call['function']['arguments'] = ''
+                    if tool_call['type'] is None:
+                        tool_call['type'] = 'function'
+
+            # Skip the empty first chunk (role only, no content)
+            if chunk_idx == 0 and not delta_content and not delta.get('tool_calls'):
+                chunk_idx += 1
+                continue
+
+            # Build the MessageChunk with only the incremental content
+            chunk_data = {
+                'role': role,
+                'content': delta_content if delta_content else None,
+                'tool_calls': delta.get('tool_calls'),
+                'is_final': bool(finish_reason),
+            }
+
+            # Drop None values
+            chunk_data = {k: v for k, v in chunk_data.items() if v is not None}
+
+            yield provider_message.MessageChunk(**chunk_data)
+            chunk_idx += 1
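In OpenAI-compatible streaming, a tool call usually arrives as several fragments: the first carries `id` and `function.name`, and later ones carry only pieces of `function.arguments` with the other fields set to `None`. `PPIOChatCompletions._closure_stream` back-fills those fields from the last complete fragment. A minimal sketch of that bookkeeping as a small stateful helper, assuming the fragment dicts come from `delta.model_dump()` as in the diff; `ToolCallBackfiller` is hypothetical. It keeps the last-seen id and name as initialised attributes, so a stream whose first fragment lacks them yields empty strings instead of an unbound-name error. Like the original, it tracks one call at a time and does not key fragments by `index` for parallel tool calls.

```python
from typing import Any


class ToolCallBackfiller:
    """Hypothetical helper: complete streamed tool-call fragments in place."""

    def __init__(self) -> None:
        # Initialised up front so a fragment arriving before any complete one
        # falls back to empty strings rather than an unbound variable.
        self.last_id = ''
        self.last_name = ''

    def fill(self, tool_call: dict[str, Any]) -> dict[str, Any]:
        function = tool_call.get('function') or {}
        tool_call['function'] = function

        # A fragment carrying both id and name starts (or restates) a call.
        if tool_call.get('id') and function.get('name'):
            self.last_id = tool_call['id']
            self.last_name = function['name']

        # Later fragments omit these fields; reuse the last seen values.
        if tool_call.get('id') is None:
            tool_call['id'] = self.last_id
        if function.get('name') is None:
            function['name'] = self.last_name
        if function.get('arguments') is None:
            function['arguments'] = ''
        if tool_call.get('type') is None:
            tool_call['type'] = 'function'
        return tool_call
```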
--- a/src/langbot/pkg/provider/modelmgr/requesters/ppiochatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/ppiochatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 派欧云
  icon: ppio.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/qhaigcchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/qhaigcchatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import openai
+import typing
+
+from . import chatcmpl
+
+
+class QHAIGCChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """QHAIGC (启航 AI) ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.qhaigc.com/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/qhaigcchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/qhaigcchatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 启航 AI
  icon: qhaigc.png
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/qiniuchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/qiniuchatcmpl.py
@@ -2,16 +2,19 @@ from __future__ import annotations

 import typing

-from . import litellmchat
+import openai
+
+from . import chatcmpl


-class QiniuChatCompletions(litellmchat.LiteLLMRequester):
+class QiniuChatCompletions(chatcmpl.OpenAIChatCompletions):
    """七牛云 ChatCompletion API 请求器"""

+    client: openai.AsyncClient
+
    default_config: dict[str, typing.Any] = {
        'base_url': 'https://api.qnaigc.com/v1',
        'timeout': 120,
-        'custom_llm_provider': 'openai',
    }

    async def scan_models(self, api_key: str | None = None) -> dict[str, typing.Any]:
--- a/src/langbot/pkg/provider/modelmgr/requesters/shengsuanyun.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/shengsuanyun.py
@@ -0,0 +1,32 @@
+from __future__ import annotations
+
+import openai
+import typing
+
+from . import chatcmpl
+import openai.types.chat.chat_completion as chat_completion
+
+
+class ShengSuanYunChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """ShengSuanYun (胜算云, ModelSpot.AI) ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://router.shengsuanyun.com/api/v1',
+        'timeout': 120,
+    }
+
+    async def _req(
+        self,
+        args: dict,
+        extra_body: dict = {},
+    ) -> chat_completion.ChatCompletion:
+        return await self.client.chat.completions.create(
+            **args,
+            extra_body=extra_body,
+            extra_headers={
+                'HTTP-Referer': 'https://langbot.app',
+                'X-Title': 'LangBot',
+            },
+        )
--- a/src/langbot/pkg/provider/modelmgr/requesters/shengsuanyun.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/shengsuanyun.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 胜算云
  icon: shengsuanyun.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/siliconflowchatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/siliconflowchatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import chatcmpl
+
+
+class SiliconFlowChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """SiliconFlow ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.siliconflow.cn/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/siliconflowchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/siliconflowchatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: 硅基流动
  icon: siliconflow.svg
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/spacechatcmpl.py
+++ b/src/langbot/pkg/provider/modelmgr/requesters/spacechatcmpl.py
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import typing
+import openai
+
+from . import chatcmpl
+
+
+class LangBotSpaceChatCompletions(chatcmpl.OpenAIChatCompletions):
+    """LangBot Space ChatCompletion API requester"""
+
+    client: openai.AsyncClient
+
+    default_config: dict[str, typing.Any] = {
+        'base_url': 'https://api.langbot.cloud/v1',
+        'timeout': 120,
+    }
--- a/src/langbot/pkg/provider/modelmgr/requesters/spacechatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/spacechatcmpl.yaml
@@ -7,7 +7,6 @@ metadata:
    zh_Hans: Space
  icon: space.webp
 spec:
-  litellm_provider: openai
  config:
  - name: base_url
    label:
--- a/src/langbot/pkg/provider/modelmgr/requesters/tencent.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/tencent.svg
@@ -1,5 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#0052D9"/>
-  <text x="30" y="28" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="white" text-anchor="middle">Tencent</text>
-  <text x="30" y="40" font-family="Arial, sans-serif" font-size="8" fill="white" text-anchor="middle">Hunyuan</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/tencentchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/tencentchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: tencent-chat-completions
-  label:
-    en_US: Tencent Hunyuan
-    zh_Hans: 腾讯混元
-  icon: tencent.svg
-spec:
-  litellm_provider: openai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://hunyuan.tencentcloudapi.com/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
--- a/src/langbot/pkg/provider/modelmgr/requesters/together.svg
+++ b/src/langbot/pkg/provider/modelmgr/requesters/together.svg
@@ -1,5 +0,0 @@
-<svg width="60" height="50" viewBox="0 0 60 50" xmlns="http://www.w3.org/2000/svg">
-  <rect width="60" height="50" rx="8" fill="#8B5CF6"/>
-  <text x="30" y="28" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="white" text-anchor="middle">Together</text>
-  <text x="30" y="40" font-family="Arial, sans-serif" font-size="8" fill="white" text-anchor="middle">AI</text>
-</svg>
--- a/src/langbot/pkg/provider/modelmgr/requesters/togetherchatcmpl.yaml
+++ b/src/langbot/pkg/provider/modelmgr/requesters/togetherchatcmpl.yaml
@@ -1,30 +0,0 @@
-apiVersion: v1
-kind: LLMAPIRequester
-metadata:
-  name: together-chat-completions
-  label:
-    en_US: Together AI
-    zh_Hans: Together AI
-  icon: together.svg
-spec:
-  litellm_provider: together_ai
-  config:
-  - name: base_url
-    label:
-      en_US: Base URL
-      zh_Hans: 基础 URL
-    type: string
-    required: true
-    default: https://api.together.xyz/v1
-  - name: timeout
-    label:
-      en_US: Timeout
-      zh_Hans: 超时时间
-    type: integer
-    required: true
-    default: 120
-  support_type:
-  - llm
-  - text-embedding
-  - rerank
-  provider_category: manufacturer
Author	SHA1	Message	Date
RockChinQ	882c9ae8f5	fix(box): trust Box-reported skill paths when filesystem is not shared In separated deployments (Docker Compose, k8s sidecar, --standalone-box, remote runtime.endpoint) the Box runtime owns its own filesystem, so the skill package_root it reports via list_skills is not resolvable on the LangBot side. LangBot's reload_skills and build_skill_extra_mounts validated those paths with os.path.isdir() against its own filesystem, which silently dropped every skill in such deployments — breaking the sandbox skill feature for the nsjail/SaaS backend. Add BoxService.shares_filesystem_with_box, derived from the connector transport (stdio = shared, WebSocket = separated), with an explicit override seam for tests/embedders. Gate both isdir() guards on it: keep local validation in shared-fs stdio mode, trust Box-reported paths otherwise. The Box runtime only reports skills found on its own filesystem, so those paths are valid there by construction. Adds topology-derivation tests (real connector, no mocks) and skill-retention tests for both shared and separated filesystems.	2026-06-07 12:46:52 -04:00
RockChinQ	47fe9bde03	docs(docker): move k8s deployment docs to wiki, drop README_K8S.md The Kubernetes deployment guide now lives only in the wiki (docs.langbot.app -> Installation -> Kubernetes). Remove the in-repo docker/README_K8S.md, repoint the README language variants and the docker-compose / kubernetes.yaml header comments to the wiki, and keep kubernetes.yaml self-describing via inline comments.	2026-06-07 11:36:39 -04:00
RockChinQ	5c3a619e2d	docs(docker): add Box sandbox runtime to k8s manifest and deploy guide The k8s manifest was missing the Box runtime that backs the sandbox tools, the activate skill tool, skill add/edit and stdio MCP. Add a langbot-box Deployment/Service (port 5410), wire langbot to it via BOX__RUNTIME__ENDPOINT (explicit Service name since the in-container default langbot_box uses an underscore, invalid for k8s DNS), and share the Box workspace root as a node hostPath pinned via podAffinity so the node Docker daemon resolves bind-mount paths consistently. Document the component, the shared-FS constraint, security implications and readiness checks in README_K8S.md (zh + en).	2026-06-07 11:18:27 -04:00
RockChinQ	e223edeb45	docs(agents): add --standalone-box flag and box config keys	2026-06-07 08:57:43 -04:00
RockChinQ	d2c3146334	docs(agents): refresh AGENTS.md for current architecture and runtime/box debugging	2026-06-07 08:43:30 -04:00
Haoxuan Xing	7d9c8e3065	Merge pull request #2231 from langbot-app/TyperBody-patch-1 Update key capabilities in README.md	2026-06-07 13:08:19 +08:00
Haoxuan Xing	f12ed81e1e	Update key capabilities in README.md Added links to Deerflow and Weknora in the capabilities section.	2026-06-07 13:05:46 +08:00
Haoxuan Xing	6d4d19b6d7	Merge pull request #2230 from langbot-app/feat/addweknoradeerflow Add DeerFlow LangGraph API as a Provider Runner	2026-06-07 12:22:55 +08:00
Typer_Body	07b90f12a2	ruff3	2026-06-07 02:38:05 +08:00
Typer_Body	fd896c6974	ruff2	2026-06-07 02:35:10 +08:00
Typer_Body	1fbfa868fb	ruff	2026-06-07 02:31:42 +08:00
Typer_Body	ad05819c2e	readme	2026-06-07 02:26:25 +08:00
Typer_Body	0c6f71738c	deerflow	2026-06-07 02:17:40 +08:00
Typer_Body	af451e7006	weknora2	2026-06-07 01:14:02 +08:00
Typer_Body	59f20bcc73	weknora	2026-06-07 01:08:39 +08:00
RockChinQ	7eca3cdfca	feat(web): show sub-entity name in document title on detail pages Detail pages (plugin / MCP / pipeline / knowledge base / skill) only showed the type in the tab title. Drive the /home document title from HomeLayout, which has the selected entity name via context: '<entity> · <type> · LangBot' when a sub-entity is open, '<type> · LangBot' otherwise. The top-level hook now skips /home and only handles login/register/reset-password/wizard. Type label falls back to a route-derived i18n key on direct page loads.	2026-06-06 12:12:08 -04:00
RockChinQ	c40354f838	feat(web): dynamic document title per route The browser tab title was hard-coded to 'LangBot' in index.html and never changed. Add a useDocumentTitle hook that maps the active route to an existing i18n key and sets document.title to '<page> · LangBot', driven by a new top-level RootLayout route element. Re-runs on navigation and on language change so the title stays localized. Falls back to the bare app name for unmapped routes.	2026-06-06 12:07:41 -04:00
RockChinQ	21a5b4658a	fix(plugin-market): keep fixed card width regardless of result count The result grid used auto-fit tracks, so a single search result stretched to fill the whole row. Switch to fixed responsive column counts (1/2/3/4 across breakpoints), matching langbot-space, so cards keep a consistent max width no matter how many results are shown.	2026-06-06 11:40:02 -04:00
RockChinQ	073acaa053	feat(plugin-market): move extension count into search box placeholder Mirror the langbot-space marketplace change: drop the '共 xxx 个扩展' stats line below the tag filter, surface the count in the search placeholder ('搜索 xxx 个扩展、能力或场景...') when no query is active, and show the total at the bottom via allLoadedCount when searching. Adds searchPlaceholderCount + allLoadedCount to all 8 locales.	2026-06-06 11:33:46 -04:00
RockChinQ	38759b229d	feat(plugin-market): show per-format extension counts in type filter Mirror the LangBot Space marketplace: the advanced-filter type options (plugin / MCP / skill) now display their live extension count, e.g. "插件 (74)". Counts are fetched on mount via three lightweight searchMarketplaceExtensions calls (page_size=1) reading total per type. The all-formats option intentionally shows no count.	2026-06-06 08:11:59 -04:00
RockChinQ	efe32e34ae	fix(deps): patch Dependabot vulnerability alerts (Python + web) Python (pyproject.toml + uv.lock): - aiohttp 3.13.5->3.14.0, langchain-core 1.3.2->1.4.1, langsmith 0.7.36->0.8.9, lxml 6.0.2->6.1.1, Mako 1.3.11->1.3.12, PyJWT 2.11.0->2.13.0, python-multipart 0.0.26->0.0.32, urllib3 2.6.3->2.7.0, Pygments 2.19.2->2.20.0, idna 3.11->3.18, pip 26.0->26.1.2, python-dotenv 1.2.1->1.2.2, requests 2.32.5->2.34.2, starlette 0.52.1->1.2.1, uv 0.11.7->0.11.19 web (package.json + both lockfiles): - axios ->1.17.0, postcss ->8.5.15, react-router(-dom) ->7.17.0 (direct) - overrides for transitive: flatted >=3.4.2, follow-redirects >=1.16.0, minimatch (3.1.3 / 9.0.7), picomatch (2.3.2 / 4.0.4) - regenerated both package-lock.json and pnpm-lock.yaml in sync Verified: uv sync + core imports OK; pnpm --frozen-lockfile + tsc + vite build pass. Not fixable (no upstream patch yet, tracked separately): - chromadb (critical, <=1.5.9 is latest) — awaiting upstream release - PyPDF2 (medium, deprecated; needs migration to pypdf, code change)	2026-06-06 06:06:59 -04:00
Junyan Chin	46db4de11a	Update QQ Group link in README_CN.md	2026-06-06 17:20:19 +08:00
RockChinQ	170a6756f4	fix(add-extension): load real icon in install confirm dialog from URL params When the install confirm dialog is opened via URL query params (e.g. from a marketplace deep link), installInfo carried no icon, so the icon fell back to the /resources/icon endpoint, which 404s for extensions whose icon is an external URL (simpleicons / iconify), showing a Package placeholder. Fetch the icon from the marketplace detail API (mcp/skill/plugin) after opening the dialog and inject it into installInfo, and reset the icon-failed state when the resolved URL changes so the <img> retries instead of sticking on the placeholder.	2026-06-06 04:45:46 -04:00
RockChinQ	7330732f62	fix(ci): bump migration head assertion to 0004, apply prettier - Update test_migrations / test_migrations_postgres head assertion from 0003 to 0004 after adding the mcp readme migration. - Reformat MCPForm.tsx / MCPReadme.tsx to satisfy prettier/prettier.	2026-06-06 03:56:14 -04:00
RockChinQ	b08e5ca09a	feat(mcp): add Docs/Tools tablist on detail page, tidy sidebar label Wrap the MCP detail right panel in a compact left-aligned Docs/Tools tablist (Docs first). Move the tool count into the Tools tab label and drop the redundant panel title/subtitle; connecting/failed states still render the status component. Shorten the sidebar 'Installed Extensions' entry to 'Installed' across all 8 locales, and add tabTools/tabDocs/noReadme strings.	2026-06-06 03:52:17 -04:00
RockChinQ	dff80a0c0a	fix(marketplace): use external icon URL when icon field is absolute Many MCP / skill records store their icon as an absolute external URL (simpleicons.org / iconify.design) rather than an uploaded file, so the /resources/icon endpoint 404s and the card icon breaks. Add resolveMarketplaceIconURL() which prefers an absolute http(s) icon field and otherwise falls back to the resources endpoint.	2026-06-06 03:52:09 -04:00
RockChinQ	f54ae4b91c	feat(mcp): persist and display marketplace README Capture the README markdown from LangBot Space when installing an MCP server and store it on the mcp_servers record (new readme column + alembic migration 0004). The detail page can then render docs offline, independent of the server's runtime/connection state.	2026-06-06 03:52:00 -04:00
RockChinQ	e5b3cced1f	feat(market): show 24 plugins per page	2026-06-05 11:33:02 -04:00