mirror of
https://github.com/langbot-app/LangBot.git
synced 2026-06-11 00:06:04 +00:00
Feat/test build (#2174)

* fix(ci): update unit-test workflow paths to match current source layout

Replace stale pkg/** filter with src/langbot/** and add uv.lock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(tests): update README to reflect current test layout

- Fix stale paths: tests/pipeline → tests/unit_tests/pipeline
- Update CI Python versions: 3.11, 3.12, 3.13
- Add test directory structure for box, config, platform, plugin, provider, storage
- Document pytest markers and uv commands
- Mention planned E2E tests

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add shared test factories package

Create tests/factories/ with reusable test factories:
- FakeApp: mock application with all dependencies
- Message chains: text_chain, mention_chain, image_chain
- Query factories: text_query, group_text_query, command_query, etc.

No test changes - maintains backward compatibility.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add fake provider factory

Add tests/factories/provider.py with:
- FakeProvider: deterministic fake LLM provider
- Error simulation: timeout, auth, rate-limit, malformed
- Request capture for assertions
- fake_model: mock model with attached provider

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add fake platform factory

Add tests/factories/platform.py with:
- FakePlatform: simulated platform adapter
- Inbound message construction: friend/group/image
- Mention-bot flag simulation
- Outbound message capture for assertions
- Streaming output support simulation
- Send failure simulation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add comprehensive message/query factories

Extend tests/factories/message.py with:
- file_query: file attachment query
- unsupported_query: unknown message segment
- voice_query: audio/voice query
- at_all_query: group @All mention
- query_with_session: query with session object
- query_with_config: query with custom pipeline config

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add fake message flow smoke test

Create tests/smoke/test_fake_message_flow.py:
- TestFakeMessageFlow: factory verification tests
- TestMessageFlowIntegration: minimal flow smoke test
- Tests FakeApp, FakeProvider, FakePlatform, query factories
- Verifies LANGBOT_FAKE_PONG marker response
- Captures outbound messages for assertions

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add developer test-quick command

Add scripts/test-quick.sh and Makefile with:
- test-quick: runs ruff check + unit tests + smoke tests
- No real provider keys or platform accounts required
- Suitable for local branch self-test

Update tests/README.md:
- Document test-quick command
- Document test factories package
- Add smoke tests and factories directory structure

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): make test-quick reliable as developer gate

Fixes for D-001 acceptance issues:
1. test-quick.sh: use set -euo pipefail, uv run ruff, no tail pipe
2. Remove unused imports in factories (app.py, platform.py, provider.py)
3. Fix unused variable in smoke test
4. Add noqa: E402 to test_n8nsvapi.py lazy imports
5. Update smoke test docs: "minimal fake flow" not full pipeline

Now test-quick is a reliable gate: lint failures exit 1, test failures propagate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): add preproc and taskmgr unit tests

U-001: Pipeline Preprocessor tests
- Normal text message processing
- Empty message handling
- Image segment with/without vision model
- Model selection and fallback
- Variable extraction

U-004: Core Task Manager tests (pattern-based)
- Task creation and tracking patterns
- Task cancellation patterns
- Scope-based cancellation
- Task type filtering
- Pruning completed tasks
- Wait all tasks

Taskmgr tests use pattern-based approach to avoid circular import in source code (taskmgr → app → http_controller → migration → taskmgr).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): add config loader unit tests

U-005: Config Loader tests
- Valid YAML config loading
- Valid JSON config loading
- Invalid YAML/JSON error behavior
- Missing config file creation from template
- Template completion for missing keys
- ConfigManager load/dump operations
- Exists check for both YAML and JSON

All tests use tmp_path fixture, no real project config.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): add chat and command handler pattern tests

U-002: Chat Handler tests (pattern-based)
- Normal message event emission pattern
- prevent_default handling
- User message alteration pattern
- Runner selection pattern
- Streaming/non-streaming response patterns
- Exception handling modes (show-error, show-hint, hide)
- Message history update pattern
- Telemetry payload pattern

U-003: Command Handler tests (pattern-based)
- Command parsing and text extraction
- Event creation pattern
- Privilege/admin check pattern
- Command result handling (text, error, image)
- prevent_default handling
- String truncation helper

Uses pattern-based testing to avoid circular import issues in source code. Direct imports of handler modules trigger circular import chain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style: fix unused imports after ruff auto-fix

Remove unused imports in test files:
- test_config_loader.py: remove unused os
- test_taskmgr.py: remove unused Mock
- test_preproc.py: remove unused unsupported_query, image_chain

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): improve taskmgr tests to test real classes

U-004 improved: Tests now import and test actual classes:
- TaskContext: new(), trace(), to_dict(), placeholder()
- TaskWrapper: task creation, context, exception/result capture, cancel, to_dict
- AsyncTaskManager: create_task, create_user_task, cancel_task, cancel_by_scope
- Task pruning behavior

Uses pre-mocking technique:
- Mock langbot.pkg.core.app before import (breaks circular chain)
- Mock langbot.pkg.core.entities with proper Enum

All 24 tests now test real class behavior, not patterns. taskmgr.py coverage should improve significantly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(test): consolidate FakeApp and add sys.modules isolation utility

- Extract tests/utils/import_isolation.py with isolated_sys_modules context manager
- Extend tests/factories/app.py FakeApp with handler-specific attributes
- Refactor test_chat_handler.py to use centralized FakeApp and cached imports
- Refactor test_command_handler.py with mock_execute_factory fixture
- Refactor test_smoke.py to move import-time sys.modules manipulation into fixture
- Add SQLite migration integration tests (G-002)
- Add HTTP API smoke integration tests (G-005)
- Update CI workflow to call pytest for SQLite migrations (G-004)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add developer quality gate consolidation (G-007)

- Add scripts/test-integration-fast.sh for fast integration tests
- Add scripts/test-coverage.sh with 12% baseline threshold
- Update Makefile with test-integration-fast, test-coverage, test-all-local
- Update CI workflow with integration and coverage jobs
- Add smoke marker to pytest.ini
- Update tests/README.md with quality gate layers documentation
- Add tests/integration/pipeline/ for pipeline stage-chain tests

Quality gate layers:
- Quick: ruff + unit + smoke (~2 min)
- Fast Integration: SQLite/API/Pipeline (~3 min)
- Coverage: 12% threshold gate (~8 min)
- Full Local: all three combined

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add PostgreSQL migration slow integration tests (G-003)

- Add tests/integration/persistence/test_migrations_postgres.py
- All tests marked with @pytest.mark.slow
- Tests skip when TEST_POSTGRES_URL is not set (no local PostgreSQL)
- Database isolation via clean_tables and clean_alembic_version fixtures
- Update CI workflow to use pytest instead of inline Python script
- Remove TODO(G-003) comment
- Update tests/README.md with PostgreSQL test documentation

Covered scenarios:
- Baseline stamp sets revision
- Upgrade from baseline to head
- Upgrade idempotent
- Get current on unstamped DB returns None

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): Phase 1.5 coverage expansion - COV-001 to COV-013

Coverage baseline raised from 13.65% to 26% (+12.35%)
Gate raised from 12% to 18%

Tasks completed:
- COV-001: Command system unit tests (100% coverage)
- COV-002: API service unit tests batch 1 (user/apikey/model/provider)
- COV-003: Provider model manager unit tests
- COV-004: Pipeline remaining stage tests (aggregator/cntfilter/longtext/msgtrun)
- COV-005: Storage and utils coverage pass
- COV-006: Gate ratchet 12%→15%
- COV-007: Gate ratchet 15%→18%
- COV-008: API service batch 2 (bot/pipeline/webhook/space/maintenance/mcp)
- COV-009: Blocked - API controller circular import issue documented
- COV-010: Plugin runtime unit tests (+0.08%)
- COV-011: RAG and vector unit tests (+0.68%)
- COV-012: Core boot and migration unit tests
- COV-013: Provider requester logic unit tests (+0.62%)

Key additions:
- tests/utils/import_isolation.py: sys.modules isolation for circular imports
- Provider requester mock tests: proved HTTP-dependent code can be tested locally
- Vector filter utilities: 100% coverage on pure functions
- API services: fake persistence pattern for unit testing

Blocked issue COV-009 documented in langbot-test-plan/1.5/issues/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(phase1): add unit tests for telemetry, plugin, rag, persistence

Add initial unit tests for Phase 1 of test coverage improvement:
- telemetry: test initialization, payload sanitization, early returns (14.3% → 62.9%)
- plugin: test _parse_plugin_id static method
- rag: test _to_i18n_name static method
- persistence: test serialize_model with datetime handling

Overall core coverage: 41.9% → 42.2%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(phase2): add unit tests for core, persistence, plugin, utils

- Add test_handler_helpers.py for plugin handler helpers (7 tests)
- Add test_mgr_methods.py for persistence manager (5 tests)
- Add test_app_config_validation.py for core app config (12 tests)
- Add test_knowledge_service.py for API knowledge service (22 tests)
- Add test_kbmgr.py for RAG knowledge base manager (39 tests)
- Add test_survey_manager.py for survey manager (22 tests)
- Add test_connector_methods.py for plugin connector (24 tests)
- Add test_funcschema.py for utils function schema (9 tests)
- Add test_platform.py for utils platform detection (7 tests)
- Add test_extract_deps.py for plugin deps extraction (7 tests)
- Add test_database_decorator.py for persistence decorator (7 tests)
- Add test_load_config.py for core config loading (19 tests)
- Add COVERAGE_EXCLUSIONS.md documenting external adapter exclusions
- Fix test_chat_session_limit.py path for portability

Coverage: core 28% → 30%, persistence 24% → 24.4%, plugin 27% → 28%
Total: 1082 tests passed, core module coverage 45.5%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(integration): add API controller integration tests

- Add test_pipelines.py (10 tests) covering pipelines CRUD operations
  - GET/POST/PUT/DELETE on /api/v1/pipelines
  - Extensions endpoint
  - Metadata endpoint
  - Coverage: pipelines controller 27% → 80%
- Add test_providers.py (10 tests) covering provider/model management
  - Provider CRUD with model counts
  - LLM model CRUD
  - Coverage: providers controller 23% → 81%, models 29% → 45%

Tests use Quart TestClient with mocked services for real HTTP behavior without external dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(integration): add knowledge, bots, and model endpoints tests

- Add test_knowledge.py (10 tests) covering knowledge base management
  - CRUD operations on /api/v1/knowledge/bases
  - Files management endpoints
  - Retrieve endpoint with validation
  - Coverage: knowledge/base.py 26% → 91%
- Add test_bots.py (9 tests) covering bot management
  - CRUD operations on /api/v1/platform/bots
  - Logs endpoint
  - Send message endpoint with validation
  - Coverage: platform/bots.py 24% → 87%
- Extend test_providers.py (+4 tests) for embedding/rerank models
  - Embedding models CRUD
  - Rerank models CRUD
  - Coverage: provider/models.py 29% → 60%

Total integration tests: 53 (smoke 12 + pipelines 10 + providers 14 + knowledge 10 + bots 9)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(integration): add embed and monitoring endpoint tests

Add integration tests for embed widget and monitoring API endpoints:
- test_embed.py: 15 tests for widget.js, logo, turnstile, messages, reset, feedback
- test_monitoring.py: 15 tests for overview, messages, llm-calls, sessions, errors, export

Coverage improvements:
- embed.py: 17% → 56%
- monitoring.py: 17% → 93%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(e2e): add minimal startup E2E tests

Add E2E tests for LangBot startup flow:
- tests/e2e/utils/config_factory.py: minimal config generation
- tests/e2e/utils/process_manager.py: LangBot subprocess management
- tests/e2e/conftest.py: E2E fixtures (session-scoped process)
- tests/e2e/test_startup.py: 12 tests for startup verification

Tests verify:
- boot.py + stages execution
- database initialization (SQLite)
- API availability
- migrations applied

Uses embedded databases (SQLite, Chroma) - no external dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(quality): fix fake tests and add missing coverage

P0 fixes:
- telemetry: rewrite fake tests with real behavior verification (25 tests)
- config: delete copied-source tests, use proper imports (2 deleted)
- persistence: fix try-except pass to verify specific errors

P1 fixes:
- pipeline: add real FixedWindowAlgo tests instead of mocks (12 tests)
- provider: add SessionManager and ToolManager tests (25 tests)
- storage: add S3StorageProvider tests with moto mock (16 tests)
- plugin: add handler action tests for setting inheritance (15 tests)
- rag: add file storage and ZIP processing tests (21 tests)
- vector: add VDB filter conversion tests (30 tests)

P2 fixes:
- pipeline/msgtrun: strengthen assertions for exact message count
- api: add response structure validation in integration tests

New test files:
- provider/test_session_manager.py
- provider/test_tool_manager.py
- storage/test_s3storage.py
- plugin/test_handler_actions.py
- rag/test_file_storage.py
- vector/test_vdb_filter_conversion.py

Source code bugs documented:
- provider: TokenManager.next_token() ZeroDivisionError
- telemetry: send_tasks class variable shared state
- command: empty command IndexError, unused parameters
- utils: funcschema KeyError
- entity: vector.py independent declarative_base

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(test): update coverage stats and test structure

- Update coverage from 22% to 30%
- Add new test files to structure:
  - provider: session_manager, tool_manager
  - storage: s3storage
  - plugin: handler_actions
  - rag: file_storage
  - vector: vdb_filter_conversion
  - telemetry: rewritten tests
- Update module coverage percentages

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: add 105 new unit tests for untested core functionality

Add comprehensive tests for B-class issues (core functionality untested):

Pipeline:
- test_pool.py: QueryPool ID generation, caching, async context (12 tests)
- test_ratelimit.py: Fixed timing-sensitive test tolerance
- test_pipelinemgr.py: Use real Pydantic StageProcessResult instead of Mock

Utils:
- test_version.py: Version comparison functions (20 tests)
- test_logcache.py: Log page management and retrieval (18 tests)
- test_httpclient.py: HTTP session pool management (10 tests)
- test_proxy.py: Proxy configuration from env and config (10 tests)
- test_image.py: URL parsing and base64 extraction (12 tests)
- test_pkgmgr.py: Pip command generation (8 tests)

Discover:
- test_engine.py: I18nString, Metadata, Component manifest (15 tests)

Test count: 1193 → 1298 (+105 tests)

Note: Some B-class issues cannot be tested due to circular import bugs filed as GitHub issues #2175 (pipeline) and #2176 (persistence).

* test: tighten phase 1 coverage contracts

* test: align ci integration isolation

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
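The commit message describes breaking circular imports by pre-mocking modules such as langbot.pkg.core.app, and extracting that trick into an isolated_sys_modules context manager in tests/utils/import_isolation.py. That file is not part of this diff, so what follows is only a hypothetical sketch of how such a helper can work. It assumes the helper takes a mapping of module names to stand-in objects, which is how test_mgr.py calls it; the project's real implementation may differ.

```python
import sys
from collections.abc import Iterator, Mapping
from contextlib import contextmanager
from types import ModuleType
from typing import Any


@contextmanager
def isolated_sys_modules(overrides: Mapping[str, Any]) -> Iterator[None]:
    """Temporarily install stand-in modules in sys.modules (hypothetical sketch).

    Each entry of ``overrides`` is what ``import`` statements inside the
    ``with`` block will receive. On exit, sys.modules is restored to its exact
    pre-block contents, so a module first imported inside the block (for
    example the module under test) is imported fresh the next time.
    """
    snapshot = dict(sys.modules)  # shallow copy of the current module table
    sys.modules.update(overrides)
    try:
        yield
    finally:
        # Forget every module that did not exist before the block started...
        for name in list(sys.modules):
            if name not in snapshot:
                del sys.modules[name]
        # ...and put back any real modules that the overrides shadowed.
        sys.modules.update(snapshot)


if __name__ == '__main__':
    fake = ModuleType('demo_fake_module')
    fake.ANSWER = 42
    with isolated_sys_modules({'demo_fake_module': fake}):
        import demo_fake_module  # found in sys.modules, never loaded from disk

        print(demo_fake_module.ANSWER)
    print('demo_fake_module' in sys.modules)
```

One caveat of this approach: when a submodule is imported, Python also binds it as an attribute on its parent package, and restoring sys.modules does not undo that binding on a parent package that was already imported before the block.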
0
tests/unit_tests/vector/__init__.py
Normal file
210
tests/unit_tests/vector/test_filter_utils.py
Normal file
@@ -0,0 +1,210 @@
"""Tests for vector filter utilities."""

from __future__ import annotations

import pytest

from langbot.pkg.vector.filter_utils import (
    SUPPORTED_OPS,
    normalize_filter,
    strip_unsupported_fields,
)


class TestNormalizeFilter:
    """Tests for normalize_filter function."""

    def test_normalize_filter_empty_dict(self):
        """Empty dict returns empty list."""
        result = normalize_filter({})
        assert result == []

    def test_normalize_filter_none(self):
        """None returns empty list."""
        result = normalize_filter(None)
        assert result == []

    def test_normalize_filter_implicit_eq(self):
        """Bare value becomes implicit $eq."""
        result = normalize_filter({'file_id': 'abc123'})

        assert len(result) == 1
        assert result[0] == ('file_id', '$eq', 'abc123')

    def test_normalize_filter_explicit_eq(self):
        """Explicit $eq operator."""
        result = normalize_filter({'file_id': {'$eq': 'abc123'}})

        assert len(result) == 1
        assert result[0] == ('file_id', '$eq', 'abc123')

    def test_normalize_filter_comparison_operators(self):
        """$gte comparison; $gt, $lt and $lte are exercised in test_normalize_filter_all_supported_ops."""
        result = normalize_filter({'created_at': {'$gte': 1700000000}})

        assert len(result) == 1
        assert result[0] == ('created_at', '$gte', 1700000000)

    def test_normalize_filter_ne_operator(self):
        """Test $ne operator."""
        result = normalize_filter({'status': {'$ne': 'deleted'}})

        assert len(result) == 1
        assert result[0] == ('status', '$ne', 'deleted')

    def test_normalize_filter_in_operator(self):
        """Test $in operator with list value."""
        result = normalize_filter({'file_type': {'$in': ['pdf', 'docx', 'txt']}})

        assert len(result) == 1
        assert result[0] == ('file_type', '$in', ['pdf', 'docx', 'txt'])

    def test_normalize_filter_nin_operator(self):
        """Test $nin operator."""
        result = normalize_filter({'status': {'$nin': ['deleted', 'archived']}})

        assert len(result) == 1
        assert result[0] == ('status', '$nin', ['deleted', 'archived'])

    def test_normalize_filter_multiple_conditions(self):
        """Multiple top-level keys are AND-ed (returned as multiple triples)."""
        result = normalize_filter({
            'file_id': 'abc',
            'status': {'$ne': 'deleted'},
            'created_at': {'$gte': 1700000000}
        })

        assert len(result) == 3
        # Membership checks only; these asserts do not depend on triple order
        field_ops = [(field, op) for field, op, _ in result]
        assert ('file_id', '$eq') in field_ops
        assert ('status', '$ne') in field_ops
        assert ('created_at', '$gte') in field_ops

    def test_normalize_filter_unsupported_operator_raises(self):
        """Unsupported operator raises ValueError."""
        with pytest.raises(ValueError, match='Unsupported filter operator'):
            normalize_filter({'field': {'$regex': 'pattern'}})

    def test_normalize_filter_all_supported_ops(self):
        """Test all supported operators are recognized."""
        for op in SUPPORTED_OPS:
            if op in ('$in', '$nin'):
                filter_dict = {'field': {op: ['value1', 'value2']}}
            else:
                filter_dict = {'field': {op: 'value'}}

            result = normalize_filter(filter_dict)
            assert len(result) == 1
            assert result[0][1] == op


class TestStripUnsupportedFields:
    """Tests for strip_unsupported_fields function."""

    def test_strip_keeps_supported_fields(self):
        """Fields in supported_fields are kept."""
        triples = [
            ('file_id', '$eq', 'abc'),
            ('chunk_uuid', '$ne', 'def'),
        ]

        result = strip_unsupported_fields(triples, {'file_id', 'chunk_uuid'})

        assert len(result) == 2
        assert result == triples

    def test_strip_removes_unsupported_fields(self):
        """Fields not in supported_fields are removed."""
        triples = [
            ('file_id', '$eq', 'abc'),
            ('unknown_field', '$ne', 'def'),
        ]

        result = strip_unsupported_fields(triples, {'file_id'})

        assert len(result) == 1
        assert result[0] == ('file_id', '$eq', 'abc')

    def test_strip_empty_triples(self):
        """Empty triples list returns empty list."""
        result = strip_unsupported_fields([], {'file_id'})
        assert result == []

    def test_strip_all_unsupported(self):
        """All fields unsupported returns empty list."""
        triples = [
            ('unknown1', '$eq', 'a'),
            ('unknown2', '$eq', 'b'),
        ]

        result = strip_unsupported_fields(triples, {'file_id'})

        assert result == []

    def test_strip_with_field_aliases(self):
        """Field aliases are resolved before checking support."""
        triples = [
            ('uuid', '$eq', 'abc'),  # alias for chunk_uuid
            ('file_id', '$eq', 'def'),
        ]

        result = strip_unsupported_fields(
            triples,
            {'file_id', 'chunk_uuid'},
            field_aliases={'uuid': 'chunk_uuid'}
        )

        assert len(result) == 2
        # 'uuid' should be resolved to 'chunk_uuid'
        assert result[0] == ('chunk_uuid', '$eq', 'abc')
        assert result[1] == ('file_id', '$eq', 'def')

    def test_strip_alias_not_in_supported(self):
        """Alias resolved but still not in supported_fields is dropped."""
        triples = [
            ('uuid', '$eq', 'abc'),  # alias for chunk_uuid, but not supported
        ]

        result = strip_unsupported_fields(
            triples,
            {'file_id'},  # chunk_uuid not supported
            field_aliases={'uuid': 'chunk_uuid'}
        )

        assert result == []

    def test_strip_preserves_operator_and_value(self):
        """Strip only affects field name, not operator or value."""
        triples = [
            ('file_id', '$in', ['a', 'b', 'c']),
        ]

        result = strip_unsupported_fields(triples, {'file_id'})

        assert result[0] == ('file_id', '$in', ['a', 'b', 'c'])

    def test_strip_none_aliases(self):
        """None field_aliases is treated as empty dict."""
        triples = [
            ('file_id', '$eq', 'abc'),
        ]

        result = strip_unsupported_fields(triples, {'file_id'}, field_aliases=None)

        assert len(result) == 1
        assert result[0] == ('file_id', '$eq', 'abc')


class TestSupportedOpsConstant:
    """Tests for SUPPORTED_OPS constant."""

    def test_supported_ops_contains_expected(self):
        """SUPPORTED_OPS contains all expected operators."""
        expected = {'$eq', '$ne', '$gt', '$gte', '$lt', '$lte', '$in', '$nin'}
        assert SUPPORTED_OPS == expected

    def test_supported_ops_is_frozenset(self):
        """SUPPORTED_OPS is set-like (any collections.abc.Set, such as set or frozenset)."""
        from collections.abc import Set
        assert isinstance(SUPPORTED_OPS, Set)
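Read together, these tests pin down a small contract: a Mongo-style filter dict is flattened into (field, operator, value) triples, a bare value means $eq, only the eight listed operators are accepted, and strip_unsupported_fields resolves aliases before dropping fields a backend cannot filter on. The sketch below is a hypothetical reimplementation written only from those tests, not LangBot's actual langbot.pkg.vector.filter_utils. Emitting one triple per operator when a field carries several operators (for example $gte and $lt) is an assumption the tests do not cover.

```python
from __future__ import annotations

from typing import Any

# Operator whitelist mirroring the set asserted in TestSupportedOpsConstant.
SUPPORTED_OPS: frozenset[str] = frozenset({'$eq', '$ne', '$gt', '$gte', '$lt', '$lte', '$in', '$nin'})

Triple = tuple[str, str, Any]  # (field, operator, value)


def normalize_filter(filter_dict: dict[str, Any] | None) -> list[Triple]:
    """Flatten a Mongo-style filter into AND-ed (field, op, value) triples."""
    if not filter_dict:
        return []  # None and {} both mean "no filter"
    triples: list[Triple] = []
    for field, condition in filter_dict.items():
        if isinstance(condition, dict):
            # Assumption: one triple per operator when a field has several.
            for op, value in condition.items():
                if op not in SUPPORTED_OPS:
                    raise ValueError(f'Unsupported filter operator: {op}')
                triples.append((field, op, value))
        else:
            # A bare value is shorthand for equality.
            triples.append((field, '$eq', condition))
    return triples


def strip_unsupported_fields(
    triples: list[Triple],
    supported_fields: set[str],
    field_aliases: dict[str, str] | None = None,
) -> list[Triple]:
    """Resolve aliases, then keep only triples whose field the backend supports."""
    aliases = field_aliases or {}
    kept: list[Triple] = []
    for field, op, value in triples:
        resolved = aliases.get(field, field)
        if resolved in supported_fields:
            kept.append((resolved, op, value))  # operator and value untouched
    return kept
```

Normalizing to flat triples first means each vector-store adapter only has to translate one shape, and the separate strip step lets an adapter silently drop conditions on metadata it does not index, which is the behavior test_strip_removes_unsupported_fields asserts.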
338
tests/unit_tests/vector/test_mgr.py
Normal file
@@ -0,0 +1,338 @@
|
||||
"""Tests for VectorDBManager provider selection logic.
|
||||
|
||||
Tests the initialization logic that selects the appropriate VDB backend
|
||||
based on configuration, without actually creating real VDB instances.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
from tests.utils.import_isolation import isolated_sys_modules
|
||||
|
||||
|
||||
class TestVectorDBManagerInitialization:
|
||||
"""Tests for VectorDBManager.initialize provider selection."""
|
||||
|
||||
def _create_mock_app(self, vdb_config: dict | None):
|
||||
"""Create mock app with vdb configuration."""
|
||||
mock_app = MagicMock()
|
||||
mock_app.instance_config = MagicMock()
|
||||
mock_app.instance_config.data = MagicMock()
|
||||
mock_app.instance_config.data.get = MagicMock(return_value=vdb_config)
|
||||
mock_app.logger = MagicMock()
|
||||
mock_app.logger.info = MagicMock()
|
||||
mock_app.logger.warning = MagicMock()
|
||||
return mock_app
|
||||
|
||||
def _make_vector_import_mocks(self):
|
||||
"""Create mocks for VDB backends to prevent real imports."""
|
||||
mocks = {}
|
||||
|
||||
# Mock core.app to break circular import
|
||||
mocks['langbot.pkg.core.app'] = MagicMock()
|
||||
|
||||
# Mock all VDB backend implementations
|
||||
for backend in ['chroma', 'qdrant', 'seekdb', 'milvus', 'pgvector_db']:
|
||||
mocks[f'langbot.pkg.vector.vdbs.{backend}'] = MagicMock()
|
||||
|
||||
return mocks
|
||||
|
||||
def test_initialize_no_config_defaults_to_chroma(self):
|
||||
"""No vdb config defaults to Chroma."""
|
||||
mock_app = self._create_mock_app(None)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
# Create mock Chroma class
|
||||
mock_chroma_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.chroma'].ChromaVectorDatabase = mock_chroma_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
# Import after mocking
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
# Run initialize synchronously for test
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
# Chroma should be instantiated
|
||||
mock_chroma_class.assert_called_once_with(mock_app)
|
||||
mock_app.logger.warning.assert_called()
|
||||
|
||||
def test_initialize_chroma_backend(self):
|
||||
"""Explicit chroma config uses Chroma backend."""
|
||||
vdb_config = {'use': 'chroma'}
|
||||
mock_app = self._create_mock_app(vdb_config)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
mock_chroma_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.chroma'].ChromaVectorDatabase = mock_chroma_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
mock_chroma_class.assert_called_once_with(mock_app)
|
||||
mock_app.logger.info.assert_called()
|
||||
|
||||
def test_initialize_qdrant_backend(self):
|
||||
"""Qdrant config uses Qdrant backend."""
|
||||
vdb_config = {'use': 'qdrant'}
|
||||
mock_app = self._create_mock_app(vdb_config)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
mock_qdrant_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.qdrant'].QdrantVectorDatabase = mock_qdrant_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
mock_qdrant_class.assert_called_once_with(mock_app)
|
||||
|
||||
def test_initialize_seekdb_backend(self):
|
||||
"""SeekDB config uses SeekDB backend."""
|
||||
vdb_config = {'use': 'seekdb'}
|
||||
mock_app = self._create_mock_app(vdb_config)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
mock_seekdb_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.seekdb'].SeekDBVectorDatabase = mock_seekdb_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
mock_seekdb_class.assert_called_once_with(mock_app)
|
||||
|
||||
def test_initialize_milvus_backend_with_uri(self):
|
||||
"""Milvus config with custom URI."""
|
||||
vdb_config = {
|
||||
'use': 'milvus',
|
||||
'milvus': {
|
||||
'uri': 'http://localhost:19530',
|
||||
'token': 'root:Milvus',
|
||||
'db_name': 'langbot_db'
|
||||
}
|
||||
}
|
||||
mock_app = self._create_mock_app(vdb_config)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
mock_milvus_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.milvus'].MilvusVectorDatabase = mock_milvus_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
mock_milvus_class.assert_called_once_with(
|
||||
mock_app,
|
||||
uri='http://localhost:19530',
|
||||
token='root:Milvus',
|
||||
db_name='langbot_db'
|
||||
)
|
||||
|
||||
def test_initialize_milvus_backend_defaults(self):
|
||||
"""Milvus defaults when config not fully specified."""
|
||||
vdb_config = {'use': 'milvus'}
|
||||
mock_app = self._create_mock_app(vdb_config)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
mock_milvus_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.milvus'].MilvusVectorDatabase = mock_milvus_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
# Should use default values
|
||||
mock_milvus_class.assert_called_once_with(
|
||||
mock_app,
|
||||
uri='./data/milvus.db',
|
||||
token=None,
|
||||
db_name='default'
|
||||
)
|
||||
|
||||
def test_initialize_pgvector_with_connection_string(self):
|
||||
"""pgvector with connection string."""
|
||||
vdb_config = {
|
||||
'use': 'pgvector',
|
||||
'pgvector': {
|
||||
'connection_string': 'postgresql://user:pass@host:5432/langbot'
|
||||
}
|
||||
}
|
||||
mock_app = self._create_mock_app(vdb_config)
|
||||
|
||||
mocks = self._make_vector_import_mocks()
|
||||
mock_pgvector_class = MagicMock()
|
||||
mocks['langbot.pkg.vector.vdbs.pgvector_db'].PgVectorDatabase = mock_pgvector_class
|
||||
|
||||
with isolated_sys_modules(mocks):
|
||||
from langbot.pkg.vector.mgr import VectorDBManager
|
||||
|
||||
mgr = VectorDBManager(mock_app)
|
||||
|
||||
import asyncio
|
||||
asyncio.get_event_loop().run_until_complete(mgr.initialize())
|
||||
|
||||
mock_pgvector_class.assert_called_once_with(
|
||||
mock_app,
|
||||
connection_string='postgresql://user:pass@host:5432/langbot'
|
||||
)
|
||||
|
||||
    def test_initialize_pgvector_with_individual_params(self):
        """pgvector with individual connection parameters."""
        vdb_config = {
            'use': 'pgvector',
            'pgvector': {
                'host': 'db.example.com',
                'port': 5433,
                'database': 'vectordb',
                'user': 'admin',
                'password': 'secret'
            }
        }
        mock_app = self._create_mock_app(vdb_config)

        mocks = self._make_vector_import_mocks()
        mock_pgvector_class = MagicMock()
        mocks['langbot.pkg.vector.vdbs.pgvector_db'].PgVectorDatabase = mock_pgvector_class

        with isolated_sys_modules(mocks):
            from langbot.pkg.vector.mgr import VectorDBManager

            mgr = VectorDBManager(mock_app)

            import asyncio
            asyncio.run(mgr.initialize())

            mock_pgvector_class.assert_called_once_with(
                mock_app,
                host='db.example.com',
                port=5433,
                database='vectordb',
                user='admin',
                password='secret'
            )

    def test_initialize_pgvector_defaults(self):
        """pgvector falls back to default connection parameters when none are configured."""
        vdb_config = {'use': 'pgvector'}
        mock_app = self._create_mock_app(vdb_config)

        mocks = self._make_vector_import_mocks()
        mock_pgvector_class = MagicMock()
        mocks['langbot.pkg.vector.vdbs.pgvector_db'].PgVectorDatabase = mock_pgvector_class

        with isolated_sys_modules(mocks):
            from langbot.pkg.vector.mgr import VectorDBManager

            mgr = VectorDBManager(mock_app)

            import asyncio
            asyncio.run(mgr.initialize())

            mock_pgvector_class.assert_called_once_with(
                mock_app,
                host='localhost',
                port=5432,
                database='langbot',
                user='postgres',
                password='postgres'
            )

    def test_initialize_unknown_backend_defaults_to_chroma(self):
        """An unknown vdb type falls back to Chroma and logs a warning."""
        vdb_config = {'use': 'unknown_backend'}
        mock_app = self._create_mock_app(vdb_config)

        mocks = self._make_vector_import_mocks()
        mock_chroma_class = MagicMock()
        mocks['langbot.pkg.vector.vdbs.chroma'].ChromaVectorDatabase = mock_chroma_class

        with isolated_sys_modules(mocks):
            from langbot.pkg.vector.mgr import VectorDBManager

            mgr = VectorDBManager(mock_app)

            import asyncio
            asyncio.run(mgr.initialize())

            mock_chroma_class.assert_called_once_with(mock_app)
            mock_app.logger.warning.assert_called()
            # Should warn about no valid backend
            warning_msg = mock_app.logger.warning.call_args[0][0]
            assert 'No valid' in warning_msg or 'defaulting' in warning_msg
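

# Sketch (hypothetical, not LangBot's VectorDBManager.initialize): the
# backend-selection rules asserted by the tests above, collected into one pure
# function so they can be read in one place. Assumptions: the function name,
# the (backend, kwargs) return shape, the optional ``logger`` argument and the
# 'milvus' config section key are illustrative only. qdrant and seekdb
# constructor arguments are not shown in this section, so the sketch does not
# model them and returns them without kwargs.
_SKETCH_MILVUS_DEFAULTS = {'uri': './data/milvus.db', 'token': None, 'db_name': 'default'}
_SKETCH_PGVECTOR_DEFAULTS = {
    'host': 'localhost',
    'port': 5432,
    'database': 'langbot',
    'user': 'postgres',
    'password': 'postgres',
}


def sketch_resolve_vdb_backend(vdb_config: dict, logger=None) -> tuple[str, dict]:
    """Return (backend_name, constructor_kwargs) for a ``vdb`` config section."""
    use = vdb_config.get('use')
    if use == 'milvus':
        section = vdb_config.get('milvus') or {}
        # Missing keys fall back to the defaults asserted in the Milvus test.
        return 'milvus', {key: section.get(key, default) for key, default in _SKETCH_MILVUS_DEFAULTS.items()}
    if use == 'pgvector':
        section = vdb_config.get('pgvector') or {}
        if section.get('connection_string'):
            # A full connection string is passed through on its own.
            return 'pgvector', {'connection_string': section['connection_string']}
        return 'pgvector', {key: section.get(key, default) for key, default in _SKETCH_PGVECTOR_DEFAULTS.items()}
    if use in ('chroma', 'qdrant', 'seekdb'):
        return use, {}
    # Anything else falls back to Chroma with a warning, as the last test expects.
    if logger is not None:
        logger.warning(f'No valid vector database backend configured ({use!r}), defaulting to Chroma')
    return 'chroma', {}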


class TestVectorDBManagerProxies:
    """Tests for VectorDBManager proxy methods."""

    def test_get_supported_search_types_no_vector_db(self):
        """get_supported_search_types returns ['vector'] when vector_db is None."""
        mock_app = MagicMock()
        mock_app.instance_config = MagicMock()
        mock_app.instance_config.data = MagicMock()
        mock_app.instance_config.data.get = MagicMock(return_value=None)
        mock_app.logger = MagicMock()

        mocks = {'langbot.pkg.core.app': MagicMock()}
        for backend in ['chroma', 'qdrant', 'seekdb', 'milvus', 'pgvector_db']:
            mocks[f'langbot.pkg.vector.vdbs.{backend}'] = MagicMock()

        with isolated_sys_modules(mocks):
            from langbot.pkg.vector.mgr import VectorDBManager

            mgr = VectorDBManager(mock_app)
            mgr.vector_db = None  # Explicitly None

            result = mgr.get_supported_search_types()
            assert result == ['vector']

    def test_get_supported_search_types_with_vector_db(self):
        """get_supported_search_types delegates to vector_db."""
        mock_app = MagicMock()

        # Create mock vector_db with supported_search_types
        mock_vector_db = MagicMock()
        mock_vector_db.supported_search_types = MagicMock(
            return_value=[
                MagicMock(value='vector'),
                MagicMock(value='full_text'),
            ]
        )

        mocks = {'langbot.pkg.core.app': MagicMock()}
        for backend in ['chroma', 'qdrant', 'seekdb', 'milvus', 'pgvector_db']:
            mocks[f'langbot.pkg.vector.vdbs.{backend}'] = MagicMock()

        with isolated_sys_modules(mocks):
            from langbot.pkg.vector.mgr import VectorDBManager

            mgr = VectorDBManager(mock_app)
            mgr.vector_db = mock_vector_db

            result = mgr.get_supported_search_types()
            assert result == ['vector', 'full_text']
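

# Sketch (hypothetical, inferred only from the two assertions above, not
# copied from langbot.pkg.vector.mgr): the proxy reports plain vector search
# when no backend is attached, and otherwise converts each search-type member
# returned by the backend into its string ``.value``. _SketchVectorDBManager
# is an illustrative stand-in for VectorDBManager.
class _SketchVectorDBManager:
    def __init__(self, vector_db=None):
        self.vector_db = vector_db

    def get_supported_search_types(self) -> list[str]:
        if self.vector_db is None:
            # No backend yet: only plain vector search is advertised.
            return ['vector']
        # Each member exposes ``.value`` ('vector', 'full_text', ...).
        return [search_type.value for search_type in self.vector_db.supported_search_types()]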
173
tests/unit_tests/vector/test_vdb_base.py
Normal file
@@ -0,0 +1,173 @@
"""Tests for VectorDatabase base class and SearchType enum."""

from __future__ import annotations

from unittest.mock import AsyncMock
import pytest

from langbot.pkg.vector.vdb import SearchType, VectorDatabase


class TestSearchType:
    """Tests for SearchType enum."""

    def test_search_type_values(self):
        """Test SearchType enum values."""
        assert SearchType.VECTOR.value == 'vector'
        assert SearchType.FULL_TEXT.value == 'full_text'
        assert SearchType.HYBRID.value == 'hybrid'

    def test_search_type_is_string_enum(self):
        """SearchType is a string enum."""
        assert isinstance(SearchType.VECTOR, str)
        assert SearchType.VECTOR == 'vector'

    def test_search_type_from_string(self):
        """Can create SearchType from string."""
        assert SearchType('vector') == SearchType.VECTOR
        assert SearchType('full_text') == SearchType.FULL_TEXT
        assert SearchType('hybrid') == SearchType.HYBRID
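

# Sketch (assumption): the three tests above hold for an enum that mixes in
# ``str``, as below. Members are then instances of ``str``, compare equal to
# plain strings, and ``_SketchSearchType('hybrid')`` looks a member up by its
# value. The real SearchType lives in langbot.pkg.vector.vdb and is not shown
# in this diff; _SketchSearchType is an illustrative stand-in.
from enum import Enum  # noqa: E402


class _SketchSearchType(str, Enum):
    VECTOR = 'vector'
    FULL_TEXT = 'full_text'
    HYBRID = 'hybrid'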


class TestVectorDatabaseAbstractMethods:
    """Tests for VectorDatabase abstract methods."""

    def test_vector_database_is_abstract(self):
        """VectorDatabase is abstract and cannot be instantiated directly."""
        with pytest.raises(TypeError):
            VectorDatabase()

    def test_abstract_methods_required(self):
        """Subclass must implement all abstract methods."""
        class IncompleteVectorDB(VectorDatabase):
            pass

        with pytest.raises(TypeError):
            IncompleteVectorDB()

    def test_supported_search_types_default(self):
        """Default supported_search_types returns [VECTOR]."""
        class MinimalVectorDB(VectorDatabase):
            async def add_embeddings(self, collection, ids, embeddings_list, metadatas, documents=None):
                pass

            async def search(self, collection, query_embedding, k=5, search_type='vector', query_text='', filter=None, vector_weight=None):
                pass

            async def delete_by_file_id(self, collection, file_id):
                pass

            async def delete_by_filter(self, collection, filter):
                pass

            async def get_or_create_collection(self, collection):
                pass

            async def delete_collection(self, collection):
                pass

        db = MinimalVectorDB()
        assert db.supported_search_types() == [SearchType.VECTOR]

    def test_list_by_filter_default_implementation(self):
        """list_by_filter has a default implementation that returns no results."""
        class MinimalVectorDB(VectorDatabase):
            async def add_embeddings(self, collection, ids, embeddings_list, metadatas, documents=None):
                pass

            async def search(self, collection, query_embedding, k=5, search_type='vector', query_text='', filter=None, vector_weight=None):
                pass

            async def delete_by_file_id(self, collection, file_id):
                pass

            async def delete_by_filter(self, collection, filter):
                pass

            async def get_or_create_collection(self, collection):
                pass

            async def delete_collection(self, collection):
                pass

        db = MinimalVectorDB()
        # The default list_by_filter returns an empty list and -1 as the total
        import asyncio
        result = asyncio.run(db.list_by_filter('test_collection'))
        assert result == ([], -1)
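

# Sketch (assumption, not the real VectorDatabase): how an abstract base can
# force subclasses to implement core methods while shipping the concrete
# defaults that the two tests above rely on. Only ``search`` is abstract here
# to keep the sketch short (the tests above implement six methods to obtain a
# concrete subclass), and the list_by_filter parameters after ``collection``
# are hypothetical. The real default returns [SearchType.VECTOR]; a plain
# string is used here so the sketch is self-contained.
import abc  # noqa: E402


class _SketchVectorDatabase(abc.ABC):
    @abc.abstractmethod
    async def search(self, collection, query_embedding, k=5):
        """Concrete backends must implement search."""

    def supported_search_types(self) -> list[str]:
        # Default: backends only advertise plain vector search.
        return ['vector']

    async def list_by_filter(self, collection, filter=None, limit=None, offset=0):
        # Default: no rows and -1 as the total, matching the test above.
        return [], -1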


class TestVectorDatabaseInterface:
    """Tests for VectorDatabase interface contracts."""

    @pytest.fixture
    def mock_vector_db(self):
        """Create a minimal mock VectorDatabase for testing."""
        class MockVectorDB(VectorDatabase):
            def __init__(self):
                # Instance attributes shadow the stub methods below,
                # so every call in the tests goes to an AsyncMock.
                self.add_embeddings = AsyncMock()
                self.search = AsyncMock(return_value={
                    'ids': [['id1', 'id2']],
                    'distances': [[0.1, 0.2]],
                    'metadatas': [[{'key': 'val1'}, {'key': 'val2'}]]
                })
                self.delete_by_file_id = AsyncMock()
                self.delete_by_filter = AsyncMock(return_value=5)
                self.get_or_create_collection = AsyncMock()
                self.delete_collection = AsyncMock()

            async def add_embeddings(self, collection, ids, embeddings_list, metadatas, documents=None):
                pass

            async def search(self, collection, query_embedding, k=5, search_type='vector', query_text='', filter=None, vector_weight=None):
                pass

            async def delete_by_file_id(self, collection, file_id):
                pass

            async def delete_by_filter(self, collection, filter):
                pass

            async def get_or_create_collection(self, collection):
                pass

            async def delete_collection(self, collection):
                pass

        return MockVectorDB()

    @pytest.mark.asyncio
    async def test_add_embeddings_signature(self, mock_vector_db):
        """add_embeddings has expected signature."""
        await mock_vector_db.add_embeddings(
            collection='test',
            ids=['id1', 'id2'],
            embeddings_list=[[0.1, 0.2], [0.3, 0.4]],
            metadatas=[{'a': 1}, {'b': 2}],
            documents=['doc1', 'doc2']
        )
        mock_vector_db.add_embeddings.assert_called_once()

    @pytest.mark.asyncio
    async def test_search_signature(self, mock_vector_db):
        """search has expected signature with all optional params."""
        import numpy as np

        await mock_vector_db.search(
            collection='test',
            query_embedding=np.array([0.1, 0.2]),
            k=10,
            search_type='hybrid',
            query_text='search text',
            filter={'file_id': 'abc'},
            vector_weight=0.7
        )
        mock_vector_db.search.assert_called_once()

    @pytest.mark.asyncio
    async def test_delete_by_filter_returns_int(self, mock_vector_db):
        """delete_by_filter returns int count."""
        result = await mock_vector_db.delete_by_filter('test', {'file_id': 'abc'})
        assert isinstance(result, int)
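

# The AsyncMock-based tests above only confirm that a call was awaited:
# AsyncMock accepts any arguments, so renaming or removing a parameter on the
# real VectorDatabase method would not make them fail. A stricter check can
# bind the same arguments against the declared signature with
# inspect.signature(...).bind, which raises TypeError on a mismatch. The
# helper below is a hypothetical sketch; for example,
# sketch_signature_accepts(VectorDatabase.search, None, collection='test', ...)
# passes None in place of ``self``.
import inspect  # noqa: E402


def sketch_signature_accepts(func, *args, **kwargs) -> bool:
    """Return True if ``func``'s declared signature accepts these arguments."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True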
359
tests/unit_tests/vector/test_vdb_filter_conversion.py
Normal file
@@ -0,0 +1,359 @@
"""Tests for VDB backend filter conversion functions.

Tests cover:
- _build_qdrant_filter: Qdrant models.Filter conversion
- _build_milvus_expr: Milvus boolean expression string conversion
- _build_pg_conditions: PostgreSQL SQLAlchemy conditions conversion
"""
from __future__ import annotations

from importlib import import_module


def get_qdrant_module():
    """Lazy import qdrant module."""
    return import_module('langbot.pkg.vector.vdbs.qdrant')


def get_milvus_module():
    """Lazy import milvus module."""
    return import_module('langbot.pkg.vector.vdbs.milvus')


def get_pgvector_module():
    """Lazy import pgvector module."""
    return import_module('langbot.pkg.vector.vdbs.pgvector_db')


class TestQdrantFilterConversion:
    """Tests for _build_qdrant_filter function."""

    def test_empty_filter_returns_empty_must(self):
        """Empty filter dict returns Filter with None must/must_not."""
        qdrant_module = get_qdrant_module()

        result = qdrant_module._build_qdrant_filter({})
        assert result.must is None
        assert result.must_not is None

    def test_eq_operator_creates_must_condition(self):
        """$eq operator creates FieldCondition in must list."""
        qdrant_module = get_qdrant_module()
        from qdrant_client import models

        result = qdrant_module._build_qdrant_filter({'file_id': 'abc'})

        assert result.must is not None
        assert len(result.must) == 1
        condition = result.must[0]
        assert condition.key == 'file_id'
        assert isinstance(condition.match, models.MatchValue)
        assert condition.match.value == 'abc'

    def test_ne_operator_creates_must_not_condition(self):
        """$ne operator creates FieldCondition in must_not list."""
        qdrant_module = get_qdrant_module()
        from qdrant_client import models

        result = qdrant_module._build_qdrant_filter({'status': {'$ne': 'deleted'}})

        assert result.must_not is not None
        assert len(result.must_not) == 1
        condition = result.must_not[0]
        assert condition.key == 'status'
        assert isinstance(condition.match, models.MatchValue)
        assert condition.match.value == 'deleted'

    def test_in_operator_creates_match_any(self):
        """$in operator creates MatchAny condition."""
        qdrant_module = get_qdrant_module()
        from qdrant_client import models

        result = qdrant_module._build_qdrant_filter({'file_type': {'$in': ['pdf', 'docx']}})

        assert result.must is not None
        assert len(result.must) == 1
        condition = result.must[0]
        assert condition.key == 'file_type'
        assert isinstance(condition.match, models.MatchAny)
        assert condition.match.any == ['pdf', 'docx']

    def test_nin_operator_creates_must_not_match_any(self):
        """$nin operator creates MatchAny in must_not."""
        qdrant_module = get_qdrant_module()
        from qdrant_client import models

        result = qdrant_module._build_qdrant_filter({'status': {'$nin': ['deleted', 'archived']}})

        assert result.must_not is not None
        assert len(result.must_not) == 1
        condition = result.must_not[0]
        assert condition.key == 'status'
        assert isinstance(condition.match, models.MatchAny)
        assert condition.match.any == ['deleted', 'archived']

    def test_range_operators_create_range_condition(self):
        """$gt, $gte, $lt, $lte create Range conditions."""
        qdrant_module = get_qdrant_module()
        from qdrant_client import models

        # Test $gt
        result = qdrant_module._build_qdrant_filter({'created_at': {'$gt': 100}})
        condition = result.must[0]
        assert isinstance(condition.range, models.Range)
        assert condition.range.gt == 100

        # Test $gte
        result = qdrant_module._build_qdrant_filter({'created_at': {'$gte': 100}})
        condition = result.must[0]
        assert condition.range.gte == 100

        # Test $lt
        result = qdrant_module._build_qdrant_filter({'created_at': {'$lt': 100}})
        condition = result.must[0]
        assert condition.range.lt == 100

        # Test $lte
        result = qdrant_module._build_qdrant_filter({'created_at': {'$lte': 100}})
        condition = result.must[0]
        assert condition.range.lte == 100

    def test_multiple_conditions_combined(self):
        """Multiple conditions are combined in must/must_not."""
        qdrant_module = get_qdrant_module()

        result = qdrant_module._build_qdrant_filter({
            'file_id': 'abc',
            'status': {'$ne': 'deleted'},
            'created_at': {'$gte': 100},
        })

        assert len(result.must) == 2  # file_id eq + created_at gte
        assert len(result.must_not) == 1  # status ne

    def test_implicit_eq_handled(self):
        """Implicit $eq (bare value) is correctly handled."""
        qdrant_module = get_qdrant_module()
        from qdrant_client import models

        result = qdrant_module._build_qdrant_filter({'field': 'value'})

        assert result.must is not None
        condition = result.must[0]
        assert isinstance(condition.match, models.MatchValue)
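

# Sketch (hypothetical helper, not the real _build_qdrant_filter): the
# bucketing rule the Qdrant tests above assert, using plain tuples instead of
# qdrant_client models so it runs without the client installed. A bare value
# means $eq; $ne and $nin are stored in their positive form ($eq / $in) under
# must_not, which is how the tests find MatchValue / MatchAny inside
# Filter.must_not. For an empty filter the real function leaves must and
# must_not as None, whereas this sketch returns empty lists.
_SKETCH_QDRANT_NEGATIONS = {'$ne': '$eq', '$nin': '$in'}


def sketch_split_qdrant_filter(filter_dict: dict) -> dict[str, list[tuple[str, str, object]]]:
    buckets: dict[str, list[tuple[str, str, object]]] = {'must': [], 'must_not': []}
    for field, spec in filter_dict.items():
        # Normalise the shorthand {'field': value} to {'field': {'$eq': value}}.
        operators = spec if isinstance(spec, dict) else {'$eq': spec}
        for op, value in operators.items():
            if op in _SKETCH_QDRANT_NEGATIONS:
                buckets['must_not'].append((field, _SKETCH_QDRANT_NEGATIONS[op], value))
            else:
                # $eq, $in and the range operators ($gt, $gte, $lt, $lte) go to must.
                buckets['must'].append((field, op, value))
    return buckets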


class TestMilvusFilterConversion:
    """Tests for _build_milvus_expr function.

    NOTE: The Milvus backend only filters on the 'text', 'file_id' and
    'chunk_uuid' fields, so these tests use only those fields.
    """

    def test_empty_filter_returns_empty_string(self):
        """Empty filter dict returns empty string."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({})
        assert result == ''

    def test_eq_operator_expression(self):
        """$eq operator creates == expression."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'file_id': 'abc'})
        assert result == 'file_id == "abc"'

    def test_ne_operator_expression(self):
        """$ne operator creates != expression."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'file_id': {'$ne': 'deleted'}})
        assert result == 'file_id != "deleted"'

    def test_comparison_operators(self):
        """$gt, $gte, $lt, $lte create comparison expressions."""
        milvus_module = get_milvus_module()

        assert milvus_module._build_milvus_expr({'chunk_uuid': {'$gt': 'uuid_100'}}) == 'chunk_uuid > "uuid_100"'
        assert milvus_module._build_milvus_expr({'chunk_uuid': {'$gte': 'uuid_100'}}) == 'chunk_uuid >= "uuid_100"'
        assert milvus_module._build_milvus_expr({'chunk_uuid': {'$lt': 'uuid_100'}}) == 'chunk_uuid < "uuid_100"'
        assert milvus_module._build_milvus_expr({'chunk_uuid': {'$lte': 'uuid_100'}}) == 'chunk_uuid <= "uuid_100"'

    def test_in_operator_expression(self):
        """$in operator creates in [...] expression."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'file_id': {'$in': ['pdf', 'docx']}})
        assert result == 'file_id in ["pdf", "docx"]'

    def test_nin_operator_expression(self):
        """$nin operator creates not in [...] expression."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'file_id': {'$nin': ['deleted', 'archived']}})
        assert result == 'file_id not in ["deleted", "archived"]'

    def test_multiple_conditions_joined_with_and(self):
        """Multiple conditions are joined with 'and'."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({
            'file_id': 'abc',
            'chunk_uuid': {'$ne': 'def'},
        })
        assert 'and' in result
        assert 'file_id == "abc"' in result
        assert 'chunk_uuid != "def"' in result

    def test_string_value_escaped(self):
        """String values are properly escaped."""
        milvus_module = get_milvus_module()

        # Test backslash escape
        result = milvus_module._build_milvus_expr({'file_id': 'C:\\Users\\test'})
        assert '\\\\' in result

        # Test quote escape
        result = milvus_module._build_milvus_expr({'file_id': 'test "quoted"'})
        assert '\\"' in result

    def test_text_field_supported(self):
        """text field is supported."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'text': 'some text'})
        assert result == 'text == "some text"'

    def test_milvus_literal_function(self):
        """Test _milvus_literal helper."""
        milvus_module = get_milvus_module()

        assert milvus_module._milvus_literal('string') == '"string"'
        assert milvus_module._milvus_literal(42) == '42'
        assert milvus_module._milvus_literal(3.14) == '3.14'

    def test_unsupported_field_dropped(self):
        """Unsupported fields are dropped (not in _MILVUS_SUPPORTED_FIELDS)."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'unknown_field': 'value'})
        assert result == ''

    def test_uuid_alias_resolved(self):
        """'uuid' alias is resolved to 'chunk_uuid'."""
        milvus_module = get_milvus_module()

        result = milvus_module._build_milvus_expr({'uuid': 'abc'})
        assert result.startswith('chunk_uuid')
        # 'uuid' is a substring of 'chunk_uuid', so only the resolved prefix is checked
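

# Sketch (hypothetical names; the real _build_milvus_expr and _milvus_literal
# live in langbot.pkg.vector.vdbs.milvus and may differ): a builder that
# satisfies every assertion in TestMilvusFilterConversion. The 'uuid' alias is
# resolved before the field whitelist is applied, string literals are
# double-quoted with backslashes and quotes escaped, lists become [...]
# literals, unknown operators are skipped, and clauses are joined with ' and '.
_SKETCH_MILVUS_FIELDS = {'text', 'file_id', 'chunk_uuid'}
_SKETCH_MILVUS_ALIASES = {'uuid': 'chunk_uuid'}
_SKETCH_MILVUS_OPS = {
    '$eq': '==', '$ne': '!=', '$gt': '>', '$gte': '>=',
    '$lt': '<', '$lte': '<=', '$in': 'in', '$nin': 'not in',
}


def sketch_milvus_literal(value) -> str:
    if isinstance(value, str):
        # Escape backslashes before quotes so the backslash added for a quote is not doubled.
        escaped = value.replace('\\', '\\\\').replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, (list, tuple)):
        return '[' + ', '.join(sketch_milvus_literal(item) for item in value) + ']'
    return str(value)


def sketch_build_milvus_expr(filter_dict: dict) -> str:
    clauses = []
    for field, spec in filter_dict.items():
        field = _SKETCH_MILVUS_ALIASES.get(field, field)
        if field not in _SKETCH_MILVUS_FIELDS:
            continue  # unsupported fields are dropped, not rejected
        operators = spec if isinstance(spec, dict) else {'$eq': spec}
        for op, value in operators.items():
            symbol = _SKETCH_MILVUS_OPS.get(op)
            if symbol is not None:
                clauses.append(f'{field} {symbol} {sketch_milvus_literal(value)}')
    return ' and '.join(clauses)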


class TestPgVectorFilterConversion:
    """Tests for _build_pg_conditions function.

    NOTE: The pgvector backend only filters on the 'text', 'file_id' and
    'chunk_uuid' columns, so these tests use only those fields.
    """

    def test_empty_filter_returns_empty_list(self):
        """Empty filter dict returns empty list."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({})
        assert result == []

    def test_eq_operator_creates_equality_condition(self):
        """$eq operator creates SQLAlchemy == condition."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({'file_id': 'abc'})

        assert len(result) == 1
        # Verify it's a SQLAlchemy BinaryExpression
        from sqlalchemy.sql.expression import BinaryExpression
        assert isinstance(result[0], BinaryExpression)

    def test_ne_operator_creates_inequality_condition(self):
        """$ne operator creates SQLAlchemy != condition."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({'file_id': {'$ne': 'deleted'}})

        assert len(result) == 1
        # Operator should be ne (not equals)
        assert '!=' in str(result[0]) or 'ne' in str(result[0].operator)

    def test_comparison_operators(self):
        """$gt, $gte, $lt, $lte create comparison conditions."""
        pgvector_module = get_pgvector_module()

        # Test all comparison operators with supported field
        for op, expected_op in [
            ('$gt', '>'),
            ('$gte', '>='),
            ('$lt', '<'),
            ('$lte', '<='),
        ]:
            result = pgvector_module._build_pg_conditions({'chunk_uuid': {op: 'uuid_100'}})
            assert len(result) == 1
            assert expected_op in str(result[0])

    def test_in_operator_creates_in_condition(self):
        """$in operator creates SQLAlchemy in_ condition."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({'file_id': {'$in': ['a', 'b', 'c']}})

        assert len(result) == 1
        assert 'IN' in str(result[0]).upper()

    def test_nin_operator_creates_notin_condition(self):
        """$nin operator creates SQLAlchemy notin_ condition."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({'file_id': {'$nin': ['a', 'b']}})

        assert len(result) == 1
        assert 'NOT IN' in str(result[0]).upper()

    def test_multiple_conditions_list(self):
        """Multiple conditions return list of conditions."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({
            'file_id': 'abc',
            'chunk_uuid': {'$ne': 'def'},
        })

        assert len(result) == 2

    def test_unsupported_field_dropped(self):
        """Unsupported fields are dropped (not in _PG_SUPPORTED_FIELDS)."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({'unknown_field': 'value'})
        assert result == []

    def test_uuid_alias_resolved(self):
        """'uuid' alias is resolved to 'chunk_uuid'."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({'uuid': 'abc'})

        assert len(result) == 1
        # Should reference chunk_uuid column
        assert 'chunk_uuid' in str(result[0])

    def test_supported_fields_only(self):
        """Only supported fields (text, file_id, chunk_uuid) are kept."""
        pgvector_module = get_pgvector_module()

        result = pgvector_module._build_pg_conditions({
            'text': {'$ne': ''},
            'file_id': 'abc',
            'chunk_uuid': {'$in': ['x', 'y']},
            'unsupported': 'value',
        })

        assert len(result) == 3  # Only supported fields
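

# Sketch (hypothetical, and deliberately different from the real code, which
# builds SQLAlchemy column expressions): the same field whitelist, 'uuid'
# alias and operator mapping rendered as a parametrised SQL WHERE fragment
# with psycopg-style %s placeholders. The whitelist matters most here because
# column names cannot be bound as query parameters; only the values are.
_SKETCH_PG_COLUMNS = {'text', 'file_id', 'chunk_uuid'}
_SKETCH_PG_ALIASES = {'uuid': 'chunk_uuid'}
_SKETCH_PG_COMPARISONS = {'$eq': '=', '$ne': '!=', '$gt': '>', '$gte': '>=', '$lt': '<', '$lte': '<='}


def sketch_pg_where(filter_dict: dict) -> tuple[list[str], list[object]]:
    clauses: list[str] = []
    params: list[object] = []
    for field, spec in filter_dict.items():
        column = _SKETCH_PG_ALIASES.get(field, field)
        if column not in _SKETCH_PG_COLUMNS:
            continue  # never interpolate an unknown identifier into SQL
        operators = spec if isinstance(spec, dict) else {'$eq': spec}
        for op, value in operators.items():
            if op in ('$in', '$nin'):
                if not value:
                    # "col IN ()" is invalid SQL: an empty $in matches nothing, an empty $nin everything.
                    clauses.append('FALSE' if op == '$in' else 'TRUE')
                    continue
                placeholders = ', '.join(['%s'] * len(value))
                keyword = 'IN' if op == '$in' else 'NOT IN'
                clauses.append(f'{column} {keyword} ({placeholders})')
                params.extend(value)
            elif op in _SKETCH_PG_COMPARISONS:
                clauses.append(f'{column} {_SKETCH_PG_COMPARISONS[op]} %s')
                params.append(value)
    return clauses, params

A caller would join the returned clauses with ' AND ' and pass the params list to the driver separately.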