* feat(provider): add rerank model management as a core model type
* feat(provider): add rerank support to existing requesters and new rerank providers
* feat(web): add rerank model management UI and pipeline config
* fix(provider): correct rerank support_type after verification
- Add rerank to OpenRouter (confirmed /api/v1/rerank endpoint)
- Remove rerank from Ollama (no native support, PR #7219 unmerged)
- Remove rerank from JiekouAI (no rerank docs found, URL path mismatch)
* fix(provider): remove alru_cache from model getters and add rerank param hints
* fix: resolve lint errors
- Remove unused alru_cache import from modelmgr.py
- Remove unused error_message variable in invoke_rerank
- Fix prettier formatting in frontend files
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: remove unused exception variable
- Change `except Exception as e:` to `except Exception:` since e is not used
- Fix prettier formatting in ProviderCard.tsx
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: apply ruff format
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(template): add rerank config fields to default pipeline config
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: remove PR.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(ui): remove duplicate rerank model form in AddModelPopover
The form was being rendered twice: once in TabsContent manual mode
and again in a separate conditional block for rerank tab.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(models): add provider model scanning
* fix: double close button
* feat: update plugin module
* fix(monitoring): WeChat Work feedback recording bugs (#2108)
* fix(monitoring): fix WeChat Work feedback recording bugs
- Fix feedback events silently dropped when stream session expires:
dispatch feedback handlers regardless of session availability
- Fix IntegrityError on repeated feedback (like→dislike) for same
message: implement UPSERT logic in record_feedback()
- Fix cancel feedback (type=3) not removing records: add delete logic
- Fix inaccurate_reasons validation error: convert int reason codes
to strings before creating FeedbackEvent (Pydantic expects List[str])
- Fix feedback timestamps 8 hours off in frontend: use parseUTCTimestamp
instead of new Date() for UTC timestamp parsing
- Fix StreamSessionManager.cleanup missing _feedback_index cleanup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(monitoring): apply ruff format to wecom feedback files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: 6mvp6 <13727783693@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add feat for receive files in wecombot
* fix: ruff error
* fix: always show sidebar plus buttons on touch/mobile devices (#2115)
Agent-Logs-Url: https://github.com/langbot-app/LangBot/sessions/e27a4886-fbad-4a7a-8558-67a387852753
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* fix: SPA fallback for all frontend routes, not just /home/*
After migrating from Next.js to Vite SPA, routes like /auth/space/callback
returned 404 because the static file server only had SPA fallback for /home/*.
Now all non-API routes fall back to index.html for React Router to handle.
* style: ruff format main.py
* feat: add marketplace link when no parser available for file upload
Links to /home/market?category=Parser, same pattern as knowledge engine selector.
* fix: lint error
* fix(user): allow password login and password change for Space accounts with local password set
Previously, Space accounts were unconditionally blocked from password login
and password change based on account_type. Now the check verifies whether
the user actually has a local password set, allowing Space users who have
set a local password to authenticate and change it normally.
* feat: add edition field to telemetry payload
Sends constants.edition (community/saas) with each telemetry event
so Space can distinguish between community and SaaS instances.
* style: ruff format telemetry.py
* fix(dingtalk): use voice recognition text instead of raw audio binary
When DingTalk sends a voice message to the bot, the callback JSON contains
a 'recognition' field with the speech-to-text result (powered by Qwen).
Previously, LangBot only extracted the 'downloadCode' to download the raw
audio binary and passed it as 'file_base64' to LLM APIs, which caused
400 errors since most models don't support this content type.
This patch:
- Extracts the 'recognition' field from DingTalk audio message content
- Uses it as plain text input to the LLM instead of raw audio
- Falls back to audio binary only when no recognition text is available
- Fixes duplicate text issue for audio messages with recognition
Fixes voice messages returning 'Request failed' on all LLM models.
* feat: integrate Alembic for database migrations
Replace manual if-sqlite/if-postgres branching with Alembic:
- Add alembic dependency
- Create programmatic alembic env (no CLI/alembic.ini needed)
- Support async engines via run_sync passthrough
- render_as_batch=True for SQLite ALTER TABLE compatibility
- Auto-stamp baseline on first run (existing DB at version 25)
- Run alembic upgrade head after legacy migrations
- Include sample migration showing schema + data migration patterns
- Add alembic dir to package-data for distribution
* ci: add migration test workflow for SQLite and PostgreSQL
Tests alembic upgrade on both databases:
- Stamp baseline on existing schema
- Upgrade to head
- Idempotent re-upgrade
- Fresh DB upgrade from scratch
* feat: add autogenerate support and CLI entrypoint for alembic
- autogenerate: compare ORM models vs DB schema to generate migrations
- CLI: python -m langbot.pkg.persistence.alembic_runner <command>
- autogenerate, upgrade, stamp, current
- Reads data/config.yaml for DB connection
* fix: add filereader for dingtalk,lark (#2122)
* fix: add filereader for dingtalk
* feat: add lark
* feat: update uv.lock
* chore: update version to 4.9.6 in pyproject.toml, __init__.py, and uv.lock
* fix: update langbot-plugin version to 0.3.8
* fix: update langbot-plugin version to 0.3.8
* docs: update database migration instructions in AGENTS.md
* fix(dashscopeapi): fix null value check in reasoning content processing logic (#2128)
* fix(n8n-runner): fix output_key not applied when n8n returns plain JSON (#2119)
* fix: bump dependencies to resolve Dependabot security alerts (#2130)
* fix: bump dependencies to resolve Dependabot security alerts
Python:
- aiohttp: >=3.11.18 → >=3.13.4 (duplicate Host headers, header injection, redirect leak, multipart DoS)
- cryptography: >=44.0.3 → >=46.0.7 (buffer overflow with non-contiguous buffers)
- pillow: >=11.2.1 → >=12.2.0 (FITS GZIP decompression bomb, HIGH)
- langchain-text-splitters: >=0.0.1 → >=1.1.2 (SSRF redirect bypass)
- langchain-core: add >=1.2.28 (incomplete f-string validation)
- langsmith: add >=0.7.31 (streaming token redaction bypass)
- python-multipart: add >=0.0.26 (multipart DoS)
- Mako: add >=1.3.11 (path traversal)
- pytest: >=8.4.1 → >=9.0.3 (tmpdir handling)
- uv: >=0.7.11 → >=0.11.6 (arbitrary file deletion)
JavaScript (web/):
- vite: ^8.0.3 → ^8.0.5 (fs.deny bypass, WebSocket file read, path traversal, HIGH)
- axios: ^1.13.5 → ^1.15.0 (cloud metadata exfiltration)
- lodash: ^4.17.23 → ^4.18.0 (code injection via _.template, prototype pollution, HIGH)
* fix: update pnpm-lock.yaml for bumped dependencies
* feat(ci): add i18n key consistency check for frontend locales (#2133)
* feat(ci): add i18n key consistency check workflow
Agent-Logs-Url: https://github.com/langbot-app/LangBot/sessions/c7bf50da-189b-49a5-9671-dbe8e70ff9d0
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* feat(ci): replace eval with line-by-line parser, add permissions block
Agent-Logs-Url: https://github.com/langbot-app/LangBot/sessions/c7bf50da-189b-49a5-9671-dbe8e70ff9d0
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* feat(models): add provider model scanning
* feat(models): add 'select all' functionality and enrich model abilities
* fix:ruff
* fix:ruff
---------
Co-authored-by: WangCham <651122857@qq.com>
Co-authored-by: 6mvp6 <119733319+6mvp6@users.noreply.github.com>
Co-authored-by: 6mvp6 <13727783693@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Guanchao Wang <wangcham233@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
Co-authored-by: haiyangbg <zhouhaiyangaa@gmail.com>
Co-authored-by: Rock Chin <1010553892@qq.com>
Co-authored-by: Amadeus <115918672+AmadeusKurisu1@users.noreply.github.com>
Co-authored-by: hzhhong <hung.z.h916@gmail.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Add chromaembed.py using Chroma's DefaultEmbeddingFunction (all-MiniLM-L6-v2)
for local embedding generation via ONNX Runtime. Also simplify seekdbembed.py
and add ndarray-to-list conversion for JSON serialization compatibility.
Extract knowledge base UUID list into query.variables['_knowledge_base_uuids']
in PreProcessor so plugins can modify it during PromptPreProcessing. Runner now
reads from variables instead of pipeline_config. Also pass session_name,
bot_uuid, and sender_id to kb.retrieve() in the RETRIEVE_KNOWLEDGE_BASE handler
so knowledge engines receive proper session context.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [issue:1933] RAG engine plugin architecture (#1967)
* refactor: migrate RAG knowledge services to a plugin-oriented host service architecture.
* feat(rag): phase 2 core refactor with RPC Action handlers
* feat: 为 RAG 插件添加知识库创建和删除事件通知,并优化了 RAG 动作的参数传递和枚举使用。
* feat: 统一知识库管理为RAG引擎,支持动态配置并移除旧的外部知识库组件。
* refactor(rag): remove plugin_adapter, inline logic into RuntimeKnowledgeBase
BREAKING CHANGE: RAGPluginAdapter has been removed. All plugin
communication is now handled directly by RuntimeKnowledgeBase.
Architecture change:
- Before: RuntimeKnowledgeBase → RAGPluginAdapter → plugin_connector
- After: RuntimeKnowledgeBase → plugin_connector (direct)
Changes to kbmgr.py (RuntimeKnowledgeBase):
- Remove RAGPluginAdapter import and usage
- Inline plugin communication methods:
- _on_kb_create(): Notify plugin when KB is created
- _on_kb_delete(): Notify plugin when KB is deleted
- _ingest_document(): Call plugin for document ingestion
- _retrieve(): Call plugin for retrieval
- _delete_document(): Call plugin to delete document
- Simplify dispose(): Only notify plugin, no built-in VDB assumption
Changes to base.py (KnowledgeBaseInterface):
- Remove get_type() abstract method (outdated internal/external concept)
- Add get_rag_engine_plugin_id() abstract method
Changes to localagent.py:
- Remove get_type() call
- Simplify top_k retrieval from KB entity
Deleted files:
- pkg/rag/knowledge/plugin_adapter.py
Benefits:
- Reduced abstraction layer, simpler code
- Plugin communication logic centralized in RuntimeKnowledgeBase
- Easier to understand and maintain
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(api): remove ExternalKnowledgeBase infrastructure
BREAKING CHANGE: ExternalKnowledgeBase has been completely removed.
All knowledge bases are now unified under the single KnowledgeBase model,
differentiated by their rag_engine_plugin_id.
Deleted files:
- pkg/api/http/controller/groups/knowledge/external.py
(ExternalKBController with /external-bases routes)
- pkg/api/http/service/external_kb.py
(ExternalKnowledgeBaseService)
- pkg/rag/knowledge/external.py
(ExternalKnowledgeBase implementation)
Modified files:
- pkg/entity/persistence/rag.py:
Remove ExternalKnowledgeBase SQLAlchemy table definition
- pkg/core/app.py:
Remove external_kb_service attribute from LangBotApplication
- pkg/core/stages/build_app.py:
Remove external_kb_service initialization
Migration notes:
- Existing external knowledge base data should be migrated manually
- API consumers should use /api/v1/knowledge/bases for all KB operations
- Use /api/v1/knowledge/engines to discover available RAG engines
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(plugin): remove list_knowledge_retrievers from connector
Remove deprecated list_knowledge_retrievers functionality from the
plugin communication layer. This aligns with the SDK change that
removed the LIST_KNOWLEDGE_RETRIEVERS action.
Changes:
- connector.py: Remove list_knowledge_retrievers() method
- handler.py: Remove list_knowledge_retrievers() handler
The functionality is replaced by the new /api/v1/knowledge/engines
endpoint which lists available RAGEngine components with their
capabilities and configuration schemas.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(service): update knowledge service with capability-based checks
Replace type-based checks with capability-based checks for file
operations, aligning with the unified knowledge base architecture.
Changes to knowledge.py:
- store_file(): Replace get_type() check with doc_ingestion capability check
- delete_file(): Replace get_type() check with doc_ingestion capability check
- list_rag_engines(): Remove list_knowledge_retrievers call, simplify to
only list RAGEngine components (KnowledgeRetriever type removed)
Changes to pipelines.py:
- Minor cleanup related to knowledge base references
The capability-based approach allows RAG engines to declare their
supported features (doc_ingestion, chunking_config, rerank, hybrid_search)
and the system responds accordingly, rather than hardcoding behavior
based on internal/external type distinction.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat(web): unify knowledge base UI, remove external KB components
BREAKING CHANGE: The internal/external knowledge base distinction
has been removed from the frontend. All knowledge bases are now
displayed in a unified list, differentiated by their RAG engine.
Changes to page.tsx:
- Remove Tab component (内置/外置 tabs)
- Remove selectedKbType state
- Unified knowledge base list display
- Single "Create Knowledge Base" button for all types
Changes to KBDetailDialog.tsx:
- Remove kbType prop
- Simplify dialog logic for unified KB handling
- Documents menu item conditionally shown based on doc_ingestion capability
Changes to KBForm.tsx:
- Remove retriever type handling code
- Simplify form for unified KB creation
- Dynamic form rendering based on RAG engine's creation_schema
Changes to KBCardVO.ts:
- Remove 'type' field from KBCardVO interface
Changes to BackendClient.ts:
- Remove all external KB related methods:
- getExternalKnowledgeBases()
- getExternalKnowledgeBase()
- createExternalKnowledgeBase()
- updateExternalKnowledgeBase()
- deleteExternalKnowledgeBase()
- retrieveFromExternalKnowledgeBase()
Changes to api/index.ts:
- Remove ExternalKnowledgeBase interface definition
UI/UX improvements:
- Users no longer need to understand internal vs external distinction
- RAG engine selection is now the primary differentiator
- Documents panel visibility is capability-driven (doc_ingestion)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(plugin): code review improvements for RAG handlers
- Unify embed_model field naming to embedding_model_uuid only
- Add structured error responses with error_type for RAG actions
- Fix file_size and mime_type detection in _store_file_task
- Improve error handling with detailed error context (error_type, original_error)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(rag): refactor KB dynamic form and vector manager
- Frontend: Refactor Knowledge Base form using DynamicForm components.
- Frontend: Remove obsolete jsonSchemaConverter utility.
- Backend: Update VectorManager and PluginHandler to support new RAG architecture.
- Chore: Update dependencies in pyproject.toml.
* fix: code review fixes for RAG refactor
- Remove DEBUG stderr outputs in handler.py
- Move repeated `import json` to file top
- Add warning log for unimplemented delete_by_filter
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(rag): consolidate valid_fields into entity constants
Define MUTABLE_FIELDS, CREATE_FIELDS, ALL_DB_FIELDS as class
constants in KnowledgeBase entity to eliminate duplication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: 将知识库获取和RAG引擎信息丰富逻辑移至知识库管理器。
* refactor(rag): introduce RAGRuntimeService and clean up plugin handler
- Create RAGRuntimeService to encapsulate RAG capability implementation (Embedding, VectorOps).
- Refactor PluginHandler to delegate RAG actions to RAGRuntimeService.
- Move KnowledgeService enrichment and creation logic to RAGManager.
- Register RAGRuntimeService in Application and BuildAppStage.
- Clean up legacy code in KnowledgeService.
* refactor(rag): standardize logger and fix type hints
- Use self.ap.logger consistently in kbmgr.py and runtime.py, removing module-level loggers.
- Fix type hints for retrieve_knowledge in handler.py and connector.py to match implementation returning dict.
* refactor: 将引擎徽章的样式从 Tailwind CSS 类迁移到 CSS 模块。
* fix(web): resolve React rendering errors in plugins page
- Fix missing key prop in PluginComponentList by using ternary instead of Fragment
- Fix RAGEngine.name type to I18nObject and use extractI18nObject() for rendering
- Preserves multi-language support
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(rag): update runtime service and web components
* refactor: 优化知识库设置结构并增强前端距离显示健壮性。
* fix: 处理前端距离显示中的空值。
* fix(rag): document retrieve ui and kbmgr top_k validation
* 更新 uv.lock 中的 PyPI 镜像源为官方地址。
* fix: address code review issues for RAG engine plugin architecture
P0 fixes:
- Fix ALL_DB_FIELDS missing collection_id and emoji fields
- Move rag_engine_plugin_id to CREATE_FIELDS (immutable after creation)
- Fix creation_settings mutable default value (dict -> None)
- Rename vector delete method to delete_by_file_id for correct semantics
- Fix delete_by_filter to raise NotImplementedError instead of silent no-op
- Add database migration script (dbm019) for new columns and table cleanup
P1 fixes:
- Clean up design-hesitation comments in connector.py
- Add _parse_plugin_id() with format validation for all RAG methods
- Make _retrieve() raise exceptions instead of silently returning empty results
- Extract _make_rag_error_response() helper for clean error formatting
- Remove unused imports from handler.py
P2 fixes:
- Fix runtime.py indentation inconsistencies
- Simplify get_file_stream to use storage abstraction uniformly
- Reduce redundant DB queries in knowledge service (extract _check_doc_capability)
- Fix engines.py URL encoding: use <path:plugin_id> instead of __ replacement
- Add read-only mode for engine settings in KBForm edit mode
- Simplify page.tsx handleKBCardClick to pass only kbId string
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address code review findings for RAG plugin architecture
- Frontend: add retrieval_settings param to retrieveKnowledgeBase API call
- Backend: return {uuid} from PUT knowledge base to match frontend expectation
- Backend: validate query is non-empty in retrieve endpoint (400 on empty)
- Backend: rename vector_delete ids→file_ids for semantic clarity, keep
backward compat by accepting both 'file_ids' and 'ids' in RPC handler
- Backend: ensure rag_engine.name fallback is always I18nObject-compatible
dict, preventing frontend extractI18nObject from receiving plain strings
- Migration: fix misleading docstring about external_kb data migration
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update langbot-plugin version to 0.2.6
* chore: update required database version from 18 to 19
* refactor: remove unused polymorphic component framework
* chore: fix lint and format issues for python and frontend
* fix(plugin): remove legacy `ids` fallback in rag_vector_delete handler
SDK now sends `file_ids` directly, the `ids` backward-compat fallback
is no longer needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): deep review fixes for critical bugs, security and quality
Critical:
- Fix StorageMgr.load() -> storage_provider.load() (C1, AttributeError)
- Update required_database_version 18 -> 19 (C2, migration never runs)
Security:
- Add path traversal validation in get_file_stream (C11)
- Add vectors/ids/metadata length validation in rag_vector_upsert (C12)
Logic fixes:
- Legacy KBs: set capabilities to [] instead of ['doc_ingestion'] (C4)
- Fix store_file return type int -> str (C5)
- Fix retrieve_knowledge return [] -> {'results': []} when disabled (C6)
- Re-raise exception in _on_kb_create instead of silently swallowing (C7)
- Log warning when KB not found in memory during delete (C8)
API fixes:
- Catch ValueError as 400 in create_knowledge_base endpoint (C15)
- Validate plugin_id format in engines endpoints (C16)
Quality:
- Remove dead if/else in migration with identical branches (C17)
- Fix variable shadowing: rag_context -> rag_context_text (C18)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: remove unused os import to fix ruff lint
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(plugin): remove PolymorphicComponent sync from LangBot side
Remove sync_polymorphic_component_instances() from connector and handler,
and the post-connection sync call in initialize(). This dead code synced
an always-empty list of polymorphic instances that were never created.
Companion change to langbot-plugin-sdk PolymorphicComponent removal.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): fix vector_delete count bug and remove vestigial instance_id parameter
1. vector_delete: assign return value from delete_by_filter to count
instead of silently returning 0 for filter-based deletion.
2. Remove instance_id parameter from the entire retrieve_knowledge
call chain (kbmgr → connector → handler → runtime). This parameter
was a remnant of the PolymorphicComponent mechanism and is no longer
used — RAGEngine operates as a stateless singleton.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(web): 支持 creation_schema 字段级别的 editable 属性控制编辑模式可修改性
- IDynamicFormItemSchema 添加 editable 可选属性
- DynamicFormItemConfig 透传 editable 属性
- DynamicFormComponent 接收 isEditing prop,按字段 editable 值控制禁用
- KBForm 解析 editable 并传递 isEditing 给动态表单组件
- editable 未指定时默认可编辑,editable: false 时编辑模式下禁用该字段
* feat(storage): 添加 size() 抽象方法及 LocalStorage/S3 实现
支持获取存储对象大小,S3 使用 head_object 避免下载整个文件
* fix(migration): 删除 external_knowledge_bases 表前记录日志警告
- 迁移时如果表中存在数据,先 warning 日志记录避免无感数据丢失
- 添加 chunk 清理注释说明:仅对旧版非插件架构 KB 有效
* fix(web): 修复检索结果长文本撑大容器导致查询按钮不可见
KBDetailDialog 的 main 容器添加 min-w-0 overflow-x-hidden,
限制 flex-1 子容器宽度,防止 Dify RAG 长文本撑出 Dialog 边界
* fix(rag): address code review issues for plugin architecture PR
- Fix SQL injection in migration helpers by using bind parameters
- Move numpy import to module level in vector/mgr.py
- Improve path traversal validation using posixpath.normpath
- Add call_rag_retrieve to connector, eliminating duplicate plugin_id
parsing in kbmgr.py _retrieve
- Normalize typing style to modern dict/list/None syntax
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style(web): fix prettier formatting errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(rag): update embedding handling in RuntimeConnectionHandler
- Renamed RAG_EMBED_DOCUMENTS and RAG_EMBED_QUERY actions to INVOKE_EMBEDDING for clarity.
- Removed embed_documents and embed_query methods from RuntimeEmbeddingModel and RAGRuntimeService.
- Integrated embedding model retrieval directly in the invoke_embedding method, improving error handling for missing models.
- Updated the embedding invocation logic to streamline the process and enhance error reporting.
* refactor(web): replace KnowledgeRetriever with RAGEngine across frontend and tests
KnowledgeRetriever component type has been removed in favor of the new
RAGEngine architecture. Update all remaining references in i18n locales,
plugin component icon mappings, marketplace filter, and unit tests.
Addresses reviewer notes from RockChinQ on PR #1967.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): address critical bugs found in deep review
- Fix path traversal bypass in runtime.py (check all path components for '..')
- Use normalized path for file loading instead of raw user input
- Change knowledge_bases from list to dict for O(1) lookup and race safety
- Add rollback on KB creation failure (clean up DB + runtime on plugin error)
- Add null check after KB update in knowledge service
- Fix file extension parsing to use os.path.splitext instead of split('.')
(handles multi-dot filenames like 'report.v2.pdf' correctly)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): address remaining review issues across frontend and backend
Frontend:
- Fix KB delete: use async/await with error handling instead of fire-and-forget
- Fix capabilities null check: add optional chaining to prevent crash
- Add toast.error on KB info load failure instead of silent console.error
- Replace hard-coded Chinese validation message with i18n key
- Replace hard-coded English error messages in DynamicFormItemComponent with i18n
- Optimize document polling: stop when all documents reach terminal state
- Add i18n keys (fieldRequired, loadKnowledgeBaseFailed,
deleteKnowledgeBaseFailed, getKnowledgeBaseListError) to all 4 locales
Backend:
- Fix KB delete atomicity: delete from DB first, then notify plugin
- Add RAG engine plugin existence validation before creating KB
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style(rag): fix ruff formatting in kbmgr.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
* chore: bump langbot-plugin to 0.3.0 (#1992)
* chore: correct sdk version to 0.3.0a1
* feat: normalize rag related actions' names
* refactor(rag): align IngestionContext fields with SDK changes
Remove redundant `chunking_strategy` field and rename `custom_settings`
to `creation_settings` to match the updated SDK entity definitions
(langbot-plugin-sdk#36).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix ruff formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): enforce immutability of embedding_model_uuid and non-editable creation_settings fields
Remove embedding_model_uuid from MUTABLE_FIELDS to prevent post-creation
modification via API. Add backend validation for creation_settings to
preserve fields marked editable:false in the plugin's creation schema.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style(rag): fix ruff formatting in knowledge service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(rag): split settings into immutable creation_settings and mutable retrieval_settings
- Remove standalone embedding_model_uuid and top_k columns from KB entity
- Add retrieval_settings column; update MUTABLE_FIELDS/CREATE_FIELDS accordingly
- Merge migration logic into dbm019 (add retrieval_settings, migrate top_k
and embedding_model_uuid into JSON settings, drop old columns on PostgreSQL)
- Remove _filter_creation_settings and per-field editable concept
- Frontend: creation_settings fields are all disabled when editing,
retrieval_settings fields are always editable via a second DynamicFormComponent
- Remove editable from IDynamicFormItemSchema, DynamicFormItemConfig
- Clean up KBCardVO, KnowledgeBase API type, and localagent runner
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* bugfix: if ingest_document failed,not raise exep
* fix: ruff lint
* refactor(rag): remove unused _get_kb_entity method from RAGRuntimeService
* feat(vector): implement metadata filters for vector_search and vector_delete (#1997)
Add functional metadata filter support across all 5 VDB backends using
Chroma-style where syntax as the canonical format. Previously the filters
parameter existed throughout the stack but was entirely ignored.
- Add filter_utils.py with normalize_filter() and strip_unsupported_fields()
- Implement filter in search() and add delete_by_filter() for all backends:
Chroma/SeekDB (native passthrough), Qdrant (translated to models.Filter),
Milvus (translated to expr string), pgvector (translated to SQLAlchemy conditions)
- Milvus/pgvector limited to {text, file_id, chunk_uuid}; other fields logged and ignored
- Replace delete_by_filter() NotImplementedError with backend delegation in mgr.py
- Populate retrieval_context['filters'] from settings in kbmgr._retrieve()
- Pass search_type/query_text/documents through handler and runtime service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style(vector): fix ruff formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(vector): remove numpy dependency and fix SeekDB search modes
- Remove numpy array conversion for query vectors; all VDB backends
accept list[float] directly
- Remove redundant get_or_create_collection call from upsert; backends
handle collection creation internally in add_embeddings
- Fix SeekDB to raise ValueError when vector dimension is unknown
instead of defaulting to 384
- Use hybrid_search() for full-text and hybrid search modes in SeekDB,
since pyseekdb's query() always requires embeddings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(vector): escape single quotes in SeekDB documents and metadata
Document text containing apostrophes (e.g. "don't", "it's") causes
SQL syntax errors in OceanBase because single quotes were not in the
escape table. Add single-quote escaping and apply the escape table to
the documents parameter in add_embeddings(), not just metadata.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(vector): use standard SQL escaping for single quotes in SeekDB
Change single quote escaping from MySQL-style \' to standard SQL ''
(doubled quote). The backslash escape is not recognized by OceanBase
in NO_BACKSLASH_ESCAPES mode, causing SQL syntax errors when metadata
text contains apostrophes (e.g. O'Shea in academic citations).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): persist retrieval_settings on knowledge base creation
retrieval_settings was not being passed from the service layer to
RAGManager.create_knowledge_base(), causing retrieval schema fields
(e.g. query_rewrite) to be lost on initial KB creation. They only
took effect after a subsequent edit/update.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(web): add show_if conditional rendering for dynamic forms
Support conditional field visibility in plugin-defined forms via
show_if rules (eq, neq, in operators). Fields can depend on values
from the same form or cross-reference between creation and retrieval
settings via externalDependentValues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(rag): replace base64 with chunked file transfer for get_rag_file_stream
Use send_file() instead of base64 encoding for returning file content
in the GET_RAG_FILE_STREAM handler, avoiding memory issues with large files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(parser): add parser plugin integration and capability-aware upload UI (#2000)
* feat(parser): add parser plugin integration and capability-aware upload UI
Backend: add parser plugin API endpoints (list/invoke), connector and
handler support for parser actions, and KB manager passthrough.
Frontend: thread ragEngineCapabilities prop to FileUploadZone and use
doc_parsing capability to conditionally show the RAG engine option in
the parser selector. When no parser is available, show a warning
prompting users to install a parser plugin.
Update i18n: rename builtInParser to "Provided by RAG engine" and add
noParserAvailable warning message in all 4 locales.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(parser): replace base64 with chunked file transfer and remove stale cache
- Remove @alru_cache from list_parsers() and list_rag_engines()
- Replace inline base64 file content with send_file/read_local_file
chunked transfer pattern in parse_document and invoke_parser flows
- Remove unused base64 import from kbmgr.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(web): add Parser component kind to plugin market UI and i18n
Add Parser to kindIconMap, market filter toggle, and all 4 locale files
so parser plugins are properly displayed and filterable in the plugin
market, matching the existing RAGEngine treatment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style(web): fix prettier formatting from merge
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: rename RAGEngine to KnowledgeEngine across frontend and backend
* fix(web): fix I18nObject import path in FileUploadZone and KBDoc
* chore: format files involved in RAGEngine to KnowledgeEngine refactor
* refactor: change rag engine to knowledge engine
* fix: update langbot-plugin version to 0.3.0rc1
* chore: disable migration 20 for now
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
- Add the _extract_dify_text_output method to uniformly handle the parsing of Dify output content
- Modify the content extraction method for the answer node in workflow mode
- Add workflow mode detection logic to support the workflow_started event
- Handle error state checks upon completion of the workflow
- Improve the message chunking logic for both basic and workflow modes
- Add a mechanism to capture answer content upon completion of a workflow node
* perf: reduce memory usage by ~200MB+ at startup
Two key optimizations:
1. Use importlib.util.find_spec() instead of __import__() in dependency
checking. find_spec() only locates modules without executing them,
avoiding loading all 36 dependencies (~222MB) into memory at startup.
2. Introduce shared aiohttp.ClientSession via httpclient module.
Previously, every HTTP request created a new ClientSession, which
creates a new TCPConnector and SSL context, loading system root
certificates each time (~270MB total allocations observed via memray).
Now all HTTP client code reuses shared sessions.
- satori.py and coze_server_api/client.py are left unchanged as they
create one session per adapter lifecycle (not per-request).
Profiling data (memray):
- Peak memory: 403MB
- SSL context creation: 270MB / 6.7M allocations (67% of total)
- Dependency import: 222MB (55% of peak)
- Expected reduction: 150-350MB at startup
* fix: remove unused aiohttp imports (ruff F401)
* style: ruff format
- Updated create_llm_model method to include auto_set_to_default_pipeline parameter.
- Adjusted ModelManager to set auto_set_to_default_pipeline to False when creating models.
- Improved logic for setting the default pipeline model based on the new parameter.
* feat: add GitHub Actions workflow for linting with Ruff
* refactor: rename lint job and add formatting step to Ruff workflow
* chore: run ruff format
* chore: rename Ruff lint job to 'Lint' and add frontend linting workflow
* feat: add telemetry support for query execution tracking and configuration
* feat: integrate telemetry manager and enable telemetry data sending
* feat: integrate telemetry manager and enhance error handling for telemetry sending
* feat: update telemetry configuration to use 'space' instead of 'telemetry' and adjust related parameters
* feat: integrate telemetry manager and enable telemetry data sending
* feat: integrate telemetry manager and enhance error handling for telemetry sending
* feat: add instance id
* feat: enhance telemetry management with asynchronous task handling and improve model retrieval caching
---------
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
* feat: add SeekDB vector database support for knowledge bases
This commit adds complete integration of OceanBase's SeekDB as a vector
database option for LangBot's knowledge base feature.
## Changes
### Core Implementation
- Add SeekDB adapter implementing VectorDatabase interface
- Support both embedded and server deployment modes
- HNSW indexing with cosine similarity
- Async operations with error handling
- Comprehensive logging
### System Integration
- Register SeekDB in VectorDBManager
- Add pyseekdb>=0.1.0 dependency
- Add SeekDB configuration template
- Update README with vector database section
### Documentation
- Complete integration guide with platform compatibility warnings
- Configuration examples for all deployment modes
- Troubleshooting guide for common issues
- Code examples demonstrating usage patterns
- Comprehensive test reports and status documentation
## Testing
Architecture validated end-to-end using ChromaDB:
- File upload → parsing → chunking → embedding → storage
- 828 bytes → 3 chunks → 3 vectors stored successfully
- BGE-M3 model (384 dimensions)
- Status: Completed ✅
## Platform Compatibility
### Embedded Mode
- ✅ Linux: Fully supported
- ❌ macOS: Not supported (pylibseekdb is Linux-only)
- ❌ Windows: Not supported (pylibseekdb is Linux-only)
### Server Mode
- ✅ Linux: Fully supported
- ⚠️ macOS: Known issue (oceanbase/seekdb#36)
- ⚠️ Windows: Untested
### Remote Connection
- ✅ All platforms supported
## Known Issues
macOS Docker server mode affected by upstream bug:
https://github.com/oceanbase/seekdb/issues/36
Workaround: Use ChromaDB/Qdrant or connect to remote SeekDB server.
## Files Added
- src/langbot/pkg/vector/vdbs/seekdb.py
- docs/SEEKDB_INTEGRATION.md
- examples/seekdb_example.py
- SEEKDB_INTEGRATION_SUMMARY.md
- SEEKDB_INTEGRATION_COMPLETE.md
- SEEKDB_TEST_STATUS.md
- SEEKDB_FINAL_SUMMARY.md
- SEEKDB_INTEGRATION_DONE.md
- GITHUB_ISSUE_36_COMMENT.md
## Files Modified
- src/langbot/pkg/vector/mgr.py
- src/langbot/pkg/vector/vdbs/__init__.py
- pyproject.toml
- src/langbot/templates/config.yaml
- README.md
- README_EN.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
* chore: remove unused docs
* feature: minimal seekdb change (#1866)
* feat: add SeekDB embedding requester and configuration
This commit introduces a new SeekDB embedding requester, which utilizes the local embedding function from pyseekdb. It includes the necessary Python implementation and a corresponding YAML configuration file for integration. Additionally, a new SVG icon for SeekDB is added to enhance the visual representation in the UI.
* fix: update EmbeddingForm to conditionally render URL field based on model provider
This commit modifies the EmbeddingForm component to conditionally display the URL input field only when the current model provider is not 'seekdb-embedding'. Additionally, it updates the condition for rendering the API key field to exclude both 'ollama-chat' and 'seekdb-embedding' providers.
* chore: update Python version requirement in pyproject.toml to support Python 3.11
* fix: add config default value, when it makes fronted not show spec
* fix: seekdb.py clean metadata. change api
* fix: enhance error handling in SeekDB embedding initialization
This commit adds improved error handling to the SeekDB embedding function. It ensures that a RuntimeError is raised if the embedding function fails to initialize, and wraps the embedding call in a try-except block to catch and raise a RequesterError with a descriptive message in case of failure.
* refactor: update SeekDB database management to use AdminClient
This commit refactors the SeekDB database management logic to utilize the AdminClient for database operations. It replaces the previous temp_client with admin_client for listing and creating databases, ensuring a more robust interaction with the SeekDB API.
* refactor: update SeekDB embedding model initialization to use task manager
This commit refactors the SeekDB embedding model initialization by replacing the direct asyncio task creation with the task manager's create_task method. This change enhances task management and provides a clearer naming convention for the embedding model initialization task.
* perf: integration
* chore: remove unnecessary files
* fix: linter errors
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
Co-authored-by: 名为a的全局变量 <1051233107@qq.com>
* Expanded WeCom message parsing to capture msgtype, inline voice/video/file/link data, bounded base64 downloads, and richer mixed-message attachments (src/langbot/libs/wecom_ai_bot_api/api.py); added event accessors for new fields (src/langbot/libs/wecom_ai_bot_api/wecombotevent.py).
Converter now maps richer WeCom payloads (text, images, files, voice, video, links) into platform message chain with fallbacks when nothing parsable is present (src/langbot/pkg/platform/sources/wecombot.py).
Preprocessor now turns voice inputs into file URLs for downstream runners (src/langbot/pkg/pipeline/preproc/preproc.py).
Dify runner uploads all incoming files (images/audio/video/docs) after downloading or decoding data URLs, infers MIME types, and passes typed file descriptors into chat/workflow calls (src/langbot/pkg/provider/runners/difysvapi.py).
* Update src/langbot/pkg/platform/sources/wecombot.py
Fixed the issue of duplicate text in the comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/langbot/libs/wecom_ai_bot_api/api.py
Modify the way you approach challenges.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/langbot/pkg/platform/sources/wecombot.py
Changing the variable names makes more sense.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: use from_base64 for the voice file converting
---------
Co-authored-by: tabriswang <tabriswang@finecomn.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
* Initial plan
* Add backend support for external knowledge bases
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Add frontend support for external knowledge bases with tabs UI
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Add i18n translations for all languages (Traditional Chinese and Japanese)
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Update knowledge base tab list styling to match plugins page
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* perf: margin-top for kb page
* refactor: switch RetrievalResultEntry to langbot_plugin pkg ones
* feat: knowledge retriever listing and creating
* stash
* refactor: unify sync mechanism for polymorphic components
* feat: use unified retireval result struct in retrieval test page
* chore: remove unused methods
* feat: retriever icon displaying
* feat: localagent retrieval with external kbs
* chore: bump version of langbot-plugin to 0.2.0b1
* fix: i18n
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
* fix: fix n8n streaming support issue
Add streaming support detection and proper message type handling for
n8n service API runner. Previously, when streaming was enabled, n8n
integration would fail due to incorrect message type usage.
1. Added streaming capability detection by checking adapter's
is_stream_output_supported method
2. Implemented conditional message generation using MessageChunk for
streaming mode and Message for non-streaming mode
3. Added proper error handling for adapters that don't support streaming
detection
* fix: add n8n webhook streaming model ,Optimized the streaming output when calling n8n.
---------
Co-authored-by: Dong_master <2213070223@qq.com>
* Initial plan
* Add package structure and resource path utilities
- Created langbot/ package with __init__.py and __main__.py entry point
- Added paths utility to find frontend and resource files from package installation
- Updated config loading to use resource paths
- Updated frontend serving to use resource paths
- Added MANIFEST.in for package data inclusion
- Updated pyproject.toml with build system and entry points
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Add PyPI publishing workflow and update license
- Created GitHub Actions workflow to build frontend and publish to PyPI
- Added license field to pyproject.toml to fix deprecation warning
- Updated .gitignore to exclude build artifacts
- Tested package building successfully
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Add PyPI installation documentation
- Created PYPI_INSTALLATION.md with detailed installation and usage instructions
- Updated README.md to feature uvx/pip installation as recommended method
- Updated README_EN.md with same changes for English documentation
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Address code review feedback
- Made package-data configuration more specific to langbot package only
- Improved path detection with caching to avoid repeated file I/O
- Removed sys.path searching which was incorrect for package data
- Removed interactive input() call for non-interactive environment compatibility
- Simplified error messages for version check
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Fix code review issues
- Use specific exception types instead of bare except
- Fix misleading comments about directory levels
- Remove redundant existence check before makedirs with exist_ok=True
- Use context manager for file opening to ensure proper cleanup
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* Simplify package configuration and document behavioral differences
- Removed redundant package-data configuration, relying on MANIFEST.in
- Added documentation about behavioral differences between package and source installation
- Clarified that include-package-data=true uses MANIFEST.in for data files
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
* chore: update pyproject.toml
* chore: try pack templates in langbot/
* chore: update
* chore: update
* chore: update
* chore: update
* chore: update
* chore: adjust dir structure
* chore: fix imports
* fix: read default-pipeline-config.json
* fix: read default-pipeline-config.json
* fix: tests
* ci: publish pypi
* chore: bump version 4.6.0-beta.1 for testing
* chore: add templates/**
* fix: send adapters and requesters icons
* chore: bump version 4.6.0b2 for testing
* chore: add platform field for docker-compose.yaml
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: RockChinQ <45992437+RockChinQ@users.noreply.github.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>