feat: add SeekDB vector database support for knowledge bases (#1814)

* feat: add SeekDB vector database support for knowledge bases

This commit adds complete integration of OceanBase's SeekDB as a vector
database option for LangBot's knowledge base feature.

## Changes

### Core Implementation
- Add SeekDB adapter implementing VectorDatabase interface
  - Support both embedded and server deployment modes
  - HNSW indexing with cosine similarity
  - Async operations with error handling
  - Comprehensive logging

### System Integration
- Register SeekDB in VectorDBManager
- Add pyseekdb>=0.1.0 dependency
- Add SeekDB configuration template
- Update README with vector database section

### Documentation
- Complete integration guide with platform compatibility warnings
- Configuration examples for all deployment modes
- Troubleshooting guide for common issues
- Code examples demonstrating usage patterns
- Comprehensive test reports and status documentation

## Testing

Architecture validated end-to-end using ChromaDB:
- File upload → parsing → chunking → embedding → storage
- 828 bytes → 3 chunks → 3 vectors stored successfully
- BGE-M3 model (384 dimensions)
- Status: Completed 

## Platform Compatibility

### Embedded Mode
- ✅ Linux: Fully supported
- ❌ macOS: Not supported (pylibseekdb is Linux-only)
- ❌ Windows: Not supported (pylibseekdb is Linux-only)

### Server Mode
- ✅ Linux: Fully supported
- ⚠️ macOS: Known issue (oceanbase/seekdb#36)
- ⚠️ Windows: Untested

### Remote Connection
- ✅ All platforms supported

## Known Issues

macOS Docker server mode affected by upstream bug:
https://github.com/oceanbase/seekdb/issues/36

Workaround: Use ChromaDB/Qdrant or connect to remote SeekDB server.

## Files Added
- src/langbot/pkg/vector/vdbs/seekdb.py
- docs/SEEKDB_INTEGRATION.md
- examples/seekdb_example.py
- SEEKDB_INTEGRATION_SUMMARY.md
- SEEKDB_INTEGRATION_COMPLETE.md
- SEEKDB_TEST_STATUS.md
- SEEKDB_FINAL_SUMMARY.md
- SEEKDB_INTEGRATION_DONE.md
- GITHUB_ISSUE_36_COMMENT.md

## Files Modified
- src/langbot/pkg/vector/mgr.py
- src/langbot/pkg/vector/vdbs/__init__.py
- pyproject.toml
- src/langbot/templates/config.yaml
- README.md
- README_EN.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

* chore: remove unused docs

* feature: minimal seekdb change (#1866)

* feat: add SeekDB embedding requester and configuration

This commit introduces a new SeekDB embedding requester, which utilizes the local embedding function from pyseekdb. It includes the necessary Python implementation and a corresponding YAML configuration file for integration. Additionally, a new SVG icon for SeekDB is added to enhance the visual representation in the UI.

* fix: update EmbeddingForm to conditionally render URL field based on model provider

This commit modifies the EmbeddingForm component to conditionally display the URL input field only when the current model provider is not 'seekdb-embedding'. Additionally, it updates the condition for rendering the API key field to exclude both 'ollama-chat' and 'seekdb-embedding' providers.

* chore: update Python version requirement in pyproject.toml to support Python 3.11

* fix: add a default config value; without it the frontend did not show the spec

* fix: clean metadata in seekdb.py; change API

* fix: enhance error handling in SeekDB embedding initialization

This commit adds improved error handling to the SeekDB embedding function. It ensures that a RuntimeError is raised if the embedding function fails to initialize, and wraps the embedding call in a try-except block to catch and raise a RequesterError with a descriptive message in case of failure.

* refactor: update SeekDB database management to use AdminClient

This commit refactors the SeekDB database management logic to utilize the AdminClient for database operations. It replaces the previous temp_client with admin_client for listing and creating databases, ensuring a more robust interaction with the SeekDB API.

* refactor: update SeekDB embedding model initialization to use task manager

This commit refactors the SeekDB embedding model initialization by replacing the direct asyncio task creation with the task manager's create_task method. This change enhances task management and provides a clearer naming convention for the embedding model initialization task.

* perf: integration

* chore: remove unnecessary files

* fix: linter errors

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
Co-authored-by: 名为a的全局变量 <1051233107@qq.com>
Junyan Qin (Chin)
2025-12-20 23:40:30 +08:00
committed by GitHub
parent 854b291c5a
commit ce82f87e43
17 changed files with 6671 additions and 3415 deletions

docs/SEEKDB_INTEGRATION.md Normal file

@@ -0,0 +1,259 @@
# SeekDB Vector Database Integration
This document describes how to use OceanBase SeekDB as the vector database backend for LangBot's knowledge base feature.
## What is SeekDB?
**OceanBase SeekDB** is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows. It's developed by OceanBase and released under Apache 2.0 license.
### Key Features
- **Hybrid Search**: Combine vector search, full-text search and relational query in a single statement
- **Multi-Model Support**: Support relational, vector, text, JSON and GIS in a single engine
- **Lightweight**: Requires as little as 1 CPU core and 2 GB of memory
- **Multiple Deployment Modes**: Supports both embedded mode and client/server mode
- **MySQL Compatible**: Powered by OceanBase engine with full ACID compliance and MySQL compatibility
## Installation
SeekDB support is automatically included when you install LangBot. The required dependency `pyseekdb` is listed in `pyproject.toml`.
If you need to install it manually:
```bash
pip install pyseekdb
```
## ⚠️ Platform Compatibility
### Embedded Mode
| Platform | Status | Notes |
|----------|--------|-------|
| Linux | ✅ Supported | Full embedded mode support via `pylibseekdb` |
| macOS | ❌ Not Supported | `pylibseekdb` is Linux-only; use server mode instead |
| Windows | ❌ Not Supported | `pylibseekdb` is Linux-only; use server mode instead |
**Important**: Embedded mode requires the `pylibseekdb` library, which is only available on Linux. If you're on macOS or Windows, you must use server mode.
### Server Mode (Docker)
| Platform | Status | Notes |
|----------|--------|-------|
| Linux | ✅ Supported | Full Docker support |
| macOS | ⚠️ Known Issue | Docker container initialization failure - [See Issue #36](https://github.com/oceanbase/seekdb/issues/36) |
| Windows | ⚠️ Untested | Should work but not yet tested |
**macOS Users**: Currently, SeekDB Docker containers have an initialization issue on macOS ([oceanbase/seekdb#36](https://github.com/oceanbase/seekdb/issues/36)). Until this is resolved, we recommend:
- Using ChromaDB or Qdrant as alternatives
- Connecting to a remote SeekDB server on Linux if available
### Server Mode (Remote Connection)
| Platform | Status | Notes |
|----------|--------|-------|
| All Platforms | ✅ Supported | Connect to SeekDB running on a remote Linux server |
**Recommendation for macOS/Windows users**: Deploy SeekDB on a Linux server and connect via server mode configuration.
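The compatibility tables above reduce to a simple rule: embedded mode is only an option on Linux, and every other platform should use server mode. A minimal sketch of that rule follows; `pick_seekdb_mode` is a hypothetical helper for illustration and is not part of LangBot or pyseekdb.

```python
import platform
from typing import Optional


def pick_seekdb_mode(system: Optional[str] = None) -> str:
    """Return 'embedded' on Linux and 'server' elsewhere.

    Hypothetical helper: it only encodes the tables above
    (pylibseekdb, required for embedded mode, is Linux-only).
    """
    # platform.system() returns 'Linux', 'Darwin' or 'Windows'.
    system = system or platform.system()
    return 'embedded' if system == 'Linux' else 'server'


if __name__ == '__main__':
    print(pick_seekdb_mode())
```

The returned string matches the values accepted by the `mode` setting, so it could be used to pre-fill a configuration file.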
## Configuration
### Embedded Mode (Recommended for Development)
Embedded mode runs SeekDB directly within the LangBot process, storing data locally. This is the simplest setup and requires no external services.
Edit your `config.yaml`:
```yaml
vdb:
  use: seekdb
  seekdb:
    mode: embedded
    path: './data/seekdb'  # Path to store SeekDB data
    database: 'langbot'    # Database name
```
### Server Mode (For Production)
Server mode connects to a remote SeekDB server or OceanBase server. This is recommended for production deployments.
#### SeekDB Server
```yaml
vdb:
  use: seekdb
  seekdb:
    mode: server
    host: 'localhost'
    port: 2881
    database: 'langbot'
    user: 'root'
    password: ''  # Can also use SEEKDB_PASSWORD env var
```
#### OceanBase Server
If you're using OceanBase with seekdb capabilities:
```yaml
vdb:
  use: seekdb
  seekdb:
    mode: server
    host: 'localhost'
    port: 2881
    tenant: 'sys'  # OceanBase tenant name
    database: 'langbot'
    user: 'root'
    password: ''
```
## Configuration Parameters
| Parameter | Required | Default | Description |
|-----------|----------|--------------|-------------|
| `mode` | No | `embedded` | Deployment mode: `embedded` or `server` |
| `path` | No | `./data/seekdb` | Data directory for embedded mode |
| `database` | No | `langbot` | Database name |
| `host` | No | `localhost` | Server host (server mode only) |
| `port` | No | `2881` | Server port (server mode only) |
| `user` | No | `root` | Username (server mode only) |
| `password` | No | `''` | Password (server mode only) |
| `tenant` | No | None | OceanBase tenant (optional, server mode only) |
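The defaults in the table can be pictured as a dictionary that user settings are merged over. The sketch below is illustrative only: `SEEKDB_DEFAULTS` and `resolve_seekdb_config` are hypothetical names, and the adapter itself reads each key with `config.get(...)` rather than through a helper like this.

```python
from typing import Any, Dict

# Defaults copied from the parameter table above.
SEEKDB_DEFAULTS: Dict[str, Any] = {
    'mode': 'embedded',
    'path': './data/seekdb',
    'database': 'langbot',
    'host': 'localhost',
    'port': 2881,
    'user': 'root',
    'password': '',
    'tenant': None,
}


def resolve_seekdb_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user settings over the documented defaults (hypothetical helper)."""
    merged = {**SEEKDB_DEFAULTS, **user_config}
    if merged['mode'] not in ('embedded', 'server'):
        raise ValueError(f"Invalid SeekDB mode: {merged['mode']}")
    # YAML may deliver the port as a string, so normalise it to int.
    merged['port'] = int(merged['port'])
    return merged


if __name__ == '__main__':
    print(resolve_seekdb_config({'mode': 'server', 'host': '10.0.0.5'}))
```

Validating `mode` up front mirrors the adapter's behaviour of rejecting any value other than `embedded` or `server`.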
## Usage
Once configured, SeekDB will be used automatically for all knowledge base operations in LangBot:
1. **Creating Knowledge Bases**: Vectors will be stored in SeekDB collections
2. **Adding Documents**: Document embeddings will be indexed in SeekDB
3. **Searching**: Vector similarity search will use SeekDB's efficient indexing
4. **Deleting**: Document removal will delete vectors from SeekDB
No code changes are required - just update your configuration!
## Architecture Details
### Implementation
The SeekDB adapter is implemented in `src/langbot/pkg/vector/vdbs/seekdb.py` and follows the same `VectorDatabase` interface as Chroma and Qdrant adapters.
Key methods:
- `add_embeddings()`: Add vectors with metadata to a collection
- `search()`: Perform vector similarity search
- `delete_by_file_id()`: Delete vectors by file ID metadata
- `get_or_create_collection()`: Manage collections
- `delete_collection()`: Remove entire collections
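According to the comments in the adapter source, `search()` returns a dictionary with `ids`, `metadatas` and `distances` keys, each holding a list of lists (one inner list per query). A small sketch of turning that shape into one record per hit is shown below; `flatten_search_results` is a hypothetical helper and the sample data is invented for illustration.

```python
from typing import Any, Dict, List


def flatten_search_results(results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the nested result dict into one record per hit (hypothetical helper)."""
    # Only a single query vector is assumed, so take the first inner list.
    ids = results.get('ids', [[]])[0]
    metadatas = results.get('metadatas', [[]])[0]
    distances = results.get('distances', [[]])[0]
    return [
        {'id': i, 'metadata': m, 'distance': d}
        for i, m, d in zip(ids, metadatas, distances)
    ]


# Sample data invented for illustration.
sample = {
    'ids': [['chunk-1', 'chunk-2']],
    'metadatas': [[{'file_id': 'f1'}, {'file_id': 'f1'}]],
    'distances': [[0.12, 0.34]],
}
hits = flatten_search_results(sample)
```

An empty result (`{'ids': [[]], 'metadatas': [[]], 'distances': [[]]}`, which the adapter returns for a missing collection) flattens to an empty list.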
### Vector Storage
- Collections are created with HNSW (Hierarchical Navigable Small World) index
- Default distance metric: Cosine similarity
- Default vector dimension: 384 when no size is supplied; otherwise the dimension is taken from the first embedding when the collection is created
- Metadata is stored alongside vectors for filtering
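To make the distance metric concrete, here is a plain-Python sketch of cosine distance, assuming the common convention of distance = 1 - cosine similarity. Whether SeekDB reports exactly this value is an assumption; the function is illustrative and does not call pyseekdb.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same dimension')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError('cosine distance is undefined for zero vectors')
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == '__main__':
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Under this convention, identical directions give a distance near 0, orthogonal vectors give 1, and opposite vectors give 2, so smaller values mean closer matches.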
## Advantages Over Other Vector Databases
### vs. ChromaDB
- ✅ Better MySQL compatibility
- ✅ Hybrid search capabilities (vector + full-text + SQL)
- ✅ Production-grade distributed mode support
- ✅ Lightweight embedded mode
### vs. Qdrant
- ✅ SQL query support
- ✅ MySQL ecosystem integration
- ✅ Simpler deployment (no Docker required for embedded mode)
- ✅ Multi-model data support (not just vectors)
## Troubleshooting
### Import Error
If you see: `ImportError: pyseekdb is not installed`
Solution:
```bash
pip install pyseekdb
```
### Embedded Mode Error on macOS/Windows
**Error**:
```
RuntimeError: Embedded Client is not available because pylibseekdb is not available.
Please install pylibseekdb (Linux only) or use RemoteServerClient (host/port) instead.
```
**Cause**: `pylibseekdb` is only available on Linux platforms.
**Solution**: Use server mode instead:
1. Deploy SeekDB on a Linux server or VM
2. Configure LangBot to use server mode:
```yaml
vdb:
  use: seekdb
  seekdb:
    mode: server
    host: 'your-seekdb-server-ip'
    port: 2881
    database: 'langbot'
    user: 'root'
    password: ''
```
**Alternative**: Use ChromaDB or Qdrant, which work on all platforms:
```yaml
vdb:
  use: chroma  # or qdrant
```
### Docker Container Fails on macOS
**Symptoms**:
```bash
docker run -d -p 2881:2881 oceanbase/seekdb:latest
# Container exits immediately with code 30
```
**Error in logs**:
```
[ERROR] Code: Agent.SeekDB.Not.Exists
Message: initialize failed: init agent failed: SeekDB not exists in current directory.
```
**Cause**: This is a known issue with SeekDB Docker containers on macOS. See [oceanbase/seekdb#36](https://github.com/oceanbase/seekdb/issues/36).
**Status**: Under investigation by OceanBase team.
**Workaround Options**:
1. **Use alternatives**: ChromaDB or Qdrant work perfectly on macOS
2. **Remote server**: Deploy SeekDB on a Linux server and connect remotely
3. **Wait for fix**: Monitor the GitHub issue for updates
### Connection Error (Server Mode)
If SeekDB server is not reachable, check:
1. Server is running: `ps aux | grep observer`
2. Port is accessible: `nc -zv localhost 2881`
3. Credentials are correct in config
4. Firewall allows connections on port 2881
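The port check in step 2 can also be done from Python when `nc` is not available. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and a successful TCP connection only shows that something is listening, not that credentials are valid.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # 2881 is the default SeekDB port used throughout this document.
    print(port_is_open('localhost', 2881))
```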
### Performance Issues
For large datasets:
- Use server mode instead of embedded mode
- Ensure adequate memory allocation
- Consider using OceanBase distributed mode for very large scale
- Adjust HNSW index parameters if needed
## Resources
- SeekDB GitHub: https://github.com/oceanbase/seekdb
- pyseekdb SDK: https://github.com/oceanbase/pyseekdb
- OceanBase Documentation: https://oceanbase.ai
- LangBot Documentation: https://docs.langbot.app
## License
SeekDB is licensed under Apache License 2.0.


@@ -4,7 +4,7 @@ version = "4.6.4"
 description = "Easy-to-use global IM bot platform designed for LLM era"
 readme = "README.md"
 license-files = ["LICENSE"]
-requires-python = ">=3.10.1,<4.0"
+requires-python = ">=3.11,<4.0"
 dependencies = [
     "aiocqhttp>=1.4.4",
     "aiofiles>=24.1.0",
@@ -63,6 +63,7 @@ dependencies = [
     "langchain-text-splitters>=0.0.1",
     "chromadb>=0.4.24",
     "qdrant-client (>=1.15.1,<2.0.0)",
+    "pyseekdb>=0.1.0",
     "langbot-plugin==0.2.3",
     "asyncpg>=0.30.0",
     "line-bot-sdk>=3.19.0",


@@ -0,0 +1,8 @@
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<rect width="24" height="24" rx="5" fill="#1E3A5F"/>
<path d="M6 12C6 8.68629 8.68629 6 12 6C15.3137 6 18 8.68629 18 12" stroke="#4FC3F7" stroke-width="2" stroke-linecap="round"/>
<path d="M18 12C18 15.3137 15.3137 18 12 18C8.68629 18 6 15.3137 6 12" stroke="#81D4FA" stroke-width="2" stroke-linecap="round"/>
<circle cx="12" cy="12" r="2" fill="#4FC3F7"/>
<circle cx="6" cy="12" r="1.5" fill="#81D4FA"/>
<circle cx="18" cy="12" r="1.5" fill="#4FC3F7"/>
</svg>



@@ -0,0 +1,59 @@
from __future__ import annotations

import typing

from .. import requester

REQUESTER_NAME: str = 'seekdb-embedding'


class SeekDBEmbedding(requester.ProviderAPIRequester):
    """SeekDB built-in embedding requester.

    Uses pyseekdb's local embedding function (all-MiniLM-L6-v2).
    The base_url config is reserved for future remote embedding support.
    """

    default_config: dict[str, typing.Any] = {
        'base_url': '',
    }

    _embedding_function = None

    async def initialize(self):
        try:
            import pyseekdb
        except ImportError:
            raise ImportError('pyseekdb is not installed. Install it with: pip install pyseekdb')
        self._embedding_function = pyseekdb.get_default_embedding_function()

    async def invoke_llm(
        self,
        query,
        model: requester.RuntimeLLMModel,
        messages: typing.List,
        funcs: typing.List = None,
        extra_args: dict[str, typing.Any] = {},
        remove_think: bool = False,
    ):
        raise NotImplementedError('SeekDB embedding does not support LLM inference')

    async def invoke_embedding(
        self,
        model: requester.RuntimeEmbeddingModel,
        input_text: typing.List[str],
        extra_args: dict[str, typing.Any] = {},
    ) -> typing.List[typing.List[float]]:
        """Generate embeddings using SeekDB's built-in embedding function."""
        try:
            if self._embedding_function is None:
                await self.initialize()
            if self._embedding_function is None:
                raise RuntimeError("SeekDB embedding function initialization failed")
            return self._embedding_function(input_text)
        except Exception as e:
            from .. import errors

            raise errors.RequesterError(f'SeekDB embedding failed: {str(e)}')


@@ -0,0 +1,21 @@
apiVersion: v1
kind: LLMAPIRequester
metadata:
  name: seekdb-embedding
  label:
    en_US: SeekDB Embedding
    zh_Hans: SeekDB 嵌入
  description:
    en_US: SeekDB Python library built-in embedding model (all-MiniLM-L6-v2), it will take time to download the model file for the first time
    zh_Hans: 使用来自 SeekDB Python 库的内置嵌入模型 (all-MiniLM-L6-v2),首次使用时将会花费时间自动下载模型文件
    ja_JP: SeekDB Python ライブラリの組み込み埋め込みモデル (all-MiniLM-L6-v2) を使用します。初回使用時にモデルファイルのダウンロードに時間がかかります。
  icon: seekdb.svg
spec:
  config: []
  support_type:
    - text-embedding
  provider_category: builtin
execution:
  python:
    path: ./seekdbembed.py
    attr: SeekDBEmbedding


@@ -4,6 +4,7 @@ from ..core import app
 from .vdb import VectorDatabase
 from .vdbs.chroma import ChromaVectorDatabase
 from .vdbs.qdrant import QdrantVectorDatabase
+from .vdbs.seekdb import SeekDBVectorDatabase
 from .vdbs.milvus import MilvusVectorDatabase
 from .vdbs.pgvector_db import PgVectorDatabase
@@ -27,6 +28,9 @@ class VectorDBManager:
         elif vdb_type == 'qdrant':
             self.vector_db = QdrantVectorDatabase(self.ap)
             self.ap.logger.info('Initialized Qdrant vector database backend.')
+        elif vdb_type == 'seekdb':
+            self.vector_db = SeekDBVectorDatabase(self.ap)
+            self.ap.logger.info('Initialized SeekDB vector database backend.')
         elif vdb_type == 'milvus':
             # Get Milvus configuration


@@ -0,0 +1,7 @@
"""Vector database implementations for LangBot."""
from .chroma import ChromaVectorDatabase
from .qdrant import QdrantVectorDatabase
from .seekdb import SeekDBVectorDatabase
__all__ = ['ChromaVectorDatabase', 'QdrantVectorDatabase', 'SeekDBVectorDatabase']


@@ -0,0 +1,252 @@
from __future__ import annotations

import asyncio
from typing import Any, Dict, List

import sqlalchemy

from langbot.pkg.core import app
from langbot.pkg.entity.persistence import model as persistence_model
from langbot.pkg.vector.vdb import VectorDatabase

try:
    import pyseekdb
    from pyseekdb import HNSWConfiguration

    SEEKDB_AVAILABLE = True
except ImportError:
    SEEKDB_AVAILABLE = False

SEEKDB_EMBEDDING_MODEL_UUID = 'seekdb-builtin-embedding'
SEEKDB_EMBEDDING_REQUESTER = 'seekdb-embedding'


class SeekDBVectorDatabase(VectorDatabase):
    """SeekDB vector database adapter for LangBot.

    SeekDB is an AI-native search database by OceanBase that unifies
    relational, vector, text, JSON and GIS in a single engine.
    Supports both embedded mode and remote server mode.
    """

    def __init__(self, ap: app.Application):
        if not SEEKDB_AVAILABLE:
            raise ImportError('pyseekdb is not installed. Install it with: pip install pyseekdb')

        self.ap = ap
        config = self.ap.instance_config.data['vdb']['seekdb']

        # Determine connection mode based on config
        mode = config.get('mode', 'embedded')  # 'embedded' or 'server'

        if mode == 'embedded':
            # Embedded mode: local database
            path = config.get('path', './data/seekdb')
            database = config.get('database', 'langbot')

            # Use AdminClient for database management operations
            admin_client = pyseekdb.AdminClient(path=path)
            # Check if database exists using public API
            existing_dbs = [db.name for db in admin_client.list_databases()]
            if database not in existing_dbs:
                # Use public API to create database
                admin_client.create_database(database)
                self.ap.logger.info(f"Created SeekDB database '{database}'")

            self.client = pyseekdb.Client(path=path, database=database)
            self.ap.logger.info(f"Initialized SeekDB in embedded mode at '{path}', database '{database}'")
        elif mode == 'server':
            # Server mode: remote SeekDB or OceanBase server
            host = config.get('host', 'localhost')
            port = config.get('port', 2881)
            database = config.get('database', 'langbot')
            user = config.get('user', 'root')
            password = config.get('password', '')
            tenant = config.get('tenant', None)  # Optional, for OceanBase

            connection_params = {
                'host': host,
                'port': int(port),
                'database': database,
                'user': user,
                'password': password,
            }
            if tenant:
                connection_params['tenant'] = tenant

            self.client = pyseekdb.Client(**connection_params)
            self.ap.logger.info(
                f"Initialized SeekDB in server mode: {host}:{port}, database '{database}'"
                + (f", tenant '{tenant}'" if tenant else '')
            )
        else:
            raise ValueError(f"Invalid SeekDB mode: {mode}. Must be 'embedded' or 'server'")

        self._collections: Dict[str, Any] = {}
        self._collection_configs: Dict[str, HNSWConfiguration] = {}
        self._escape_table = str.maketrans({
            '\x00': '',
            '\\': '\\\\',
            '"': '\\"',
            '\n': '\\n',
            '\r': '\\r',
            '\t': '\\t',
        })

    async def _get_or_create_collection_internal(self, collection: str, vector_size: int = None) -> Any:
        """Internal method to get or create a collection with proper configuration."""
        if collection in self._collections:
            return self._collections[collection]

        # Check if collection exists
        if await asyncio.to_thread(self.client.has_collection, collection):
            # Collection exists, get it
            coll = await asyncio.to_thread(self.client.get_collection, collection, embedding_function=None)
            self._collections[collection] = coll
            self.ap.logger.info(f"SeekDB collection '{collection}' retrieved.")
            return coll

        # Collection doesn't exist, create it
        if vector_size is None:
            # Default dimension if not specified
            vector_size = 384

        # Create HNSW configuration
        config = HNSWConfiguration(dimension=vector_size, distance='cosine')
        self._collection_configs[collection] = config

        # Create collection without embedding function (we manage embeddings externally)
        coll = await asyncio.to_thread(
            self.client.create_collection,
            name=collection,
            configuration=config,
            embedding_function=None,  # Disable automatic embedding
        )
        self._collections[collection] = coll
        self.ap.logger.info(f"SeekDB collection '{collection}' created with dimension={vector_size}, distance='cosine'")
        return coll

    def _clean_metadata(self, meta: Dict[str, Any]) -> Dict[str, Any]:
        """SeekDB metadata doesn't support \\ and ", insert will error 3104"""
        return {
            k: v.translate(self._escape_table) if isinstance(v, str)
            else v if v is None or isinstance(v, (int, float, bool))
            else str(v)
            for k, v in meta.items()
            if v is not None
        }

    async def get_or_create_collection(self, collection: str):
        """Get or create collection (without vector size - will use default)."""
        return await self._get_or_create_collection_internal(collection)

    async def add_embeddings(
        self,
        collection: str,
        ids: List[str],
        embeddings_list: List[List[float]],
        metadatas: List[Dict[str, Any]]
    ) -> None:
        """Add vector embeddings to the specified collection.

        Args:
            collection: Collection name
            ids: List of document IDs
            embeddings_list: List of embedding vectors
            metadatas: List of metadata dictionaries
        """
        if not embeddings_list:
            return

        # Ensure collection exists with correct dimension
        vector_size = len(embeddings_list[0])
        coll = await self._get_or_create_collection_internal(collection, vector_size)

        cleaned_metadatas = [self._clean_metadata(meta) for meta in metadatas]
        await asyncio.to_thread(coll.add, ids=ids, embeddings=embeddings_list, metadatas=cleaned_metadatas)
        self.ap.logger.info(f"Added {len(ids)} embeddings to SeekDB collection '{collection}'")

    async def search(self, collection: str, query_embedding: List[float], k: int = 5) -> Dict[str, Any]:
        """Search for the most similar vectors in the specified collection.

        Args:
            collection: Collection name
            query_embedding: Query vector
            k: Number of results to return

        Returns:
            Dictionary with 'ids', 'metadatas', 'distances' keys
        """
        # Check if collection exists
        exists = await asyncio.to_thread(self.client.has_collection, collection)
        if not exists:
            return {'ids': [[]], 'metadatas': [[]], 'distances': [[]]}

        # Get collection
        if collection not in self._collections:
            coll = await asyncio.to_thread(self.client.get_collection, collection, embedding_function=None)
            self._collections[collection] = coll
        else:
            coll = self._collections[collection]

        # Perform query
        # SeekDB's query() returns: {'ids': [[...]], 'metadatas': [[...]], 'distances': [[...]]}
        results = await asyncio.to_thread(coll.query, query_embeddings=query_embedding, n_results=k)
        self.ap.logger.info(f"SeekDB search in '{collection}' returned {len(results.get('ids', [[]])[0])} results")
        return results

    async def delete_by_file_id(self, collection: str, file_id: str) -> None:
        """Delete vectors from the collection by file_id metadata.

        Args:
            collection: Collection name
            file_id: File ID to delete
        """
        # Check if collection exists
        exists = await asyncio.to_thread(self.client.has_collection, collection)
        if not exists:
            self.ap.logger.warning(f"SeekDB collection '{collection}' not found for deletion")
            return

        # Get collection
        if collection not in self._collections:
            coll = await asyncio.to_thread(self.client.get_collection, collection, embedding_function=None)
            self._collections[collection] = coll
        else:
            coll = self._collections[collection]

        # SeekDB's delete() expects a where clause for filtering
        # Delete all records where metadata['file_id'] == file_id
        await asyncio.to_thread(coll.delete, where={'file_id': file_id})
        self.ap.logger.info(f"Deleted embeddings from SeekDB collection '{collection}' with file_id: {file_id}")

    async def delete_collection(self, collection: str):
        """Delete the entire collection.

        Args:
            collection: Collection name
        """
        # Remove from cache
        if collection in self._collections:
            del self._collections[collection]
        if collection in self._collection_configs:
            del self._collection_configs[collection]

        # Check if collection exists
        exists = await asyncio.to_thread(self.client.has_collection, collection)
        if not exists:
            self.ap.logger.warning(f"SeekDB collection '{collection}' not found for deletion")
            return

        # Delete collection
        await asyncio.to_thread(self.client.delete_collection, collection)
        self.ap.logger.info(f"SeekDB collection '{collection}' deleted")


@@ -37,6 +37,17 @@ vdb:
     host: localhost
     port: 6333
     api_key: ''
+  seekdb:
+    mode: embedded  # 'embedded' or 'server'
+    # Embedded mode options:
+    path: './data/seekdb'
+    database: 'langbot'
+    # Server mode options (used when mode='server'):
+    host: 'localhost'
+    port: 2881
+    user: 'root'
+    password: ''
+    tenant: ''  # Optional, for OceanBase server
   milvus:
     uri: 'http://127.0.0.1:19530'
     token: ''

web/pnpm-lock.yaml (generated; diff suppressed because it is too large)


@@ -2,4 +2,5 @@ export interface IChooseRequesterEntity {
   label: string;
   value: string;
   provider_category?: string;
+  description?: string;
 }


@@ -33,19 +33,21 @@ export default function EmbeddingCard({ cardVO }: { cardVO: EmbeddingCardVO }) {
           </span>
         </div>
         {/* baseURL */}
-        <div className={`${styles.baseURLContainer}`}>
-          <svg
-            className={`${styles.baseURLIcon}`}
-            xmlns="http://www.w3.org/2000/svg"
-            viewBox="0 0 24 24"
-            width="36"
-            height="36"
-            fill="rgba(98,98,98,1)"
-          >
-            <path d="M13.0607 8.11097L14.4749 9.52518C17.2086 12.2589 17.2086 16.691 14.4749 19.4247L14.1214 19.7782C11.3877 22.5119 6.95555 22.5119 4.22188 19.7782C1.48821 17.0446 1.48821 12.6124 4.22188 9.87874L5.6361 11.293C3.68348 13.2456 3.68348 16.4114 5.6361 18.364C7.58872 20.3166 10.7545 20.3166 12.7072 18.364L13.0607 18.0105C15.0133 16.0578 15.0133 12.892 13.0607 10.9394L11.6465 9.52518L13.0607 8.11097ZM19.7782 14.1214L18.364 12.7072C20.3166 10.7545 20.3166 7.58872 18.364 5.6361C16.4114 3.68348 13.2456 3.68348 11.293 5.6361L10.9394 5.98965C8.98678 7.94227 8.98678 11.1081 10.9394 13.0607L12.3536 14.4749L10.9394 15.8891L9.52518 14.4749C6.79151 11.7413 6.79151 7.30911 9.52518 4.57544L9.87874 4.22188C12.6124 1.48821 17.0446 1.48821 19.7782 4.22188C22.5119 6.95555 22.5119 11.3877 19.7782 14.1214Z"></path>
-          </svg>
-          <span className={`${styles.baseURLText}`}>{cardVO.baseURL}</span>
-        </div>
+        {cardVO.baseURL && (
+          <div className={`${styles.baseURLContainer}`}>
+            <svg
+              className={`${styles.baseURLIcon}`}
+              xmlns="http://www.w3.org/2000/svg"
+              viewBox="0 0 24 24"
+              width="36"
+              height="36"
+              fill="rgba(98,98,98,1)"
+            >
+              <path d="M13.0607 8.11097L14.4749 9.52518C17.2086 12.2589 17.2086 16.691 14.4749 19.4247L14.1214 19.7782C11.3877 22.5119 6.95555 22.5119 4.22188 19.7782C1.48821 17.0446 1.48821 12.6124 4.22188 9.87874L5.6361 11.293C3.68348 13.2456 3.68348 16.4114 5.6361 18.364C7.58872 20.3166 10.7545 20.3166 12.7072 18.364L13.0607 18.0105C15.0133 16.0578 15.0133 12.892 13.0607 10.9394L11.6465 9.52518L13.0607 8.11097ZM19.7782 14.1214L18.364 12.7072C20.3166 10.7545 20.3166 7.58872 18.364 5.6361C16.4114 3.68348 13.2456 3.68348 11.293 5.6361L10.9394 5.98965C8.98678 7.94227 8.98678 11.1081 10.9394 13.0607L12.3536 14.4749L10.9394 15.8891L9.52518 14.4749C6.79151 11.7413 6.79151 7.30911 9.52518 4.57544L9.87874 4.22188C12.6124 1.48821 17.0446 1.48821 19.7782 4.22188C22.5119 6.95555 22.5119 11.3877 19.7782 14.1214Z"></path>
+            </svg>
+            <span className={`${styles.baseURLText}`}>{cardVO.baseURL}</span>
+          </div>
+        )}
       </div>
     </div>
   </div>


@@ -75,7 +75,7 @@ const getFormSchema = (t: (key: string) => string) =>
     model_provider: z
       .string()
       .min(1, { message: t('models.modelProviderRequired') }),
-    url: z.string().min(1, { message: t('models.requestURLRequired') }),
+    url: z.string().optional(),
     api_key: z.string().optional(),
     extra_args: z.array(getExtraArgSchema(t)).optional(),
   });
@@ -188,6 +188,7 @@ export default function EmbeddingForm({
         label: extractI18nObject(item.label),
         value: item.name,
         provider_category: item.spec.provider_category || 'manufacturer',
+        description: extractI18nObject(item.description) || undefined,
       };
     }),
   );
@@ -243,7 +244,7 @@ export default function EmbeddingForm({
       description: '',
       requester: value.model_provider,
       requester_config: {
-        base_url: value.url,
+        base_url: value.url || '',
         timeout: 120,
       },
       extra_args: extraArgsObj,
@@ -320,7 +321,7 @@ export default function EmbeddingForm({
       description: '',
       requester: form.getValues('model_provider'),
       requester_config: {
-        base_url: form.getValues('url'),
+        base_url: form.getValues('url') ?? '',
         timeout: 120,
       },
       api_keys: apiKey ? [apiKey] : [],
@@ -425,6 +426,18 @@ export default function EmbeddingForm({
           />
         </SelectTrigger>
         <SelectContent>
+          <SelectGroup>
+            <SelectLabel>{t('models.builtin')}</SelectLabel>
+            {requesterNameList
+              .filter(
+                (item) => item.provider_category === 'builtin',
+              )
+              .map((item) => (
+                <SelectItem key={item.value} value={item.value}>
+                  {item.label}
+                </SelectItem>
+              ))}
+          </SelectGroup>
           <SelectGroup>
             <SelectLabel>
               {t('models.modelManufacturer')}
@@ -468,29 +481,42 @@ export default function EmbeddingForm({
         </SelectContent>
       </Select>
     </FormControl>
+    {currentModelProvider &&
+      requesterNameList.find(
+        (item) => item.value === currentModelProvider,
+      )?.description && (
+        <FormDescription>
+          {
+            requesterNameList.find(
+              (item) => item.value === currentModelProvider,
+            )?.description
+          }
+        </FormDescription>
+      )}
     <FormMessage />
   </FormItem>
 )}
 />
-<FormField
-  control={form.control}
-  name="url"
-  render={({ field }) => (
-    <FormItem>
-      <FormLabel>
-        {t('models.requestURL')}
-        <span className="text-red-500">*</span>
-      </FormLabel>
-      <FormControl>
-        <Input {...field} />
-      </FormControl>
-      <FormMessage />
-    </FormItem>
-  )}
-/>
-{!['ollama-chat'].includes(currentModelProvider) && (
+{!['seekdb-embedding'].includes(currentModelProvider) && (
+  <FormField
+    control={form.control}
+    name="url"
+    render={({ field }) => (
+      <FormItem>
+        <FormLabel>{t('models.requestURL')}</FormLabel>
+        <FormControl>
+          <Input {...field} />
+        </FormControl>
+        <FormMessage />
+      </FormItem>
+    )}
+  />
+)}
+{!['ollama-chat', 'seekdb-embedding'].includes(
+  currentModelProvider,
+) && (
   <FormField
     control={form.control}
     name="api_key"


@@ -141,10 +141,11 @@ const enUS = {
     boolean: 'Boolean',
     selectModelProvider: 'Select Model Provider',
     modelProviderDescription:
-      'Please fill in the model name provided by the supplier',
+      'Please fill in the model name provided by the provider',
     modelManufacturer: 'Model Manufacturer',
     aggregationPlatform: 'Aggregation Platform',
     selfDeployed: 'Self-deployed',
+    builtin: 'Built-in',
     selectModel: 'Select Model',
     testSuccess: 'Test successful',
     testError: 'Test failed, please check your model configuration',


@@ -148,6 +148,7 @@ const jaJP = {
     modelManufacturer: 'モデルメーカー',
     aggregationPlatform: 'アグリゲーションプラットフォーム',
     selfDeployed: 'セルフデプロイ',
+    builtin: 'ビルトイン',
     selectModel: 'モデルを選択してください',
     testSuccess: 'テストに成功しました',
     testError: 'テストに失敗しました。モデル設定を確認してください',


@@ -142,6 +142,7 @@ const zhHans = {
     modelManufacturer: '模型厂商',
     aggregationPlatform: '中转平台',
     selfDeployed: '自部署',
+    builtin: '内置',
     selectModel: '请选择模型',
     testSuccess: '测试成功',
     testError: '测试失败,请检查模型配置',


@@ -142,6 +142,7 @@ const zhHant = {
     modelManufacturer: '模型廠商',
     aggregationPlatform: '中轉平台',
     selfDeployed: '自部署',
+    builtin: '內建',
     selectModel: '請選擇模型',
     testSuccess: '測試成功',
     testError: '測試失敗,請檢查模型設定',