Database Abstraction Layer Design¶
Status: Current Implementation Last Updated: January 5, 2026 Target Audience: Senior/Staff Engineers, Architecture Review
Table of Contents¶
- Architecture Overview
- Design Patterns
- Abstract Base Classes
- Implementations
- Factory Pattern
- Design Rationale
- Code Organization
Architecture Overview¶
The MCP Gateway Registry implements a repository pattern with abstract base classes to provide a clean separation between business logic and data storage. This design enables seamless switching between file-based storage (legacy, backwards compatible) and DocumentDB/MongoDB (recommended, production-ready) without modifying application code.
System Diagram¶
┌─────────────────────────────────────────────────────────────────┐
│ Application Layer │
│ (Services, API Endpoints, Business Logic) │
└────────────────────┬────────────────────────────────────────────┘
│ depends on
▼
┌─────────────────────────────────────────────────────────────────┐
│ Repository Factory Layer │
│ get_server_repository() │
│ get_agent_repository() │
│ get_scope_repository() │
│ get_security_scan_repository() │
│ get_search_repository() │
│ get_federation_config_repository() │
└────────┬──────────────────────────────┬────────────────────────┘
│ │
│ creates (based on │ creates (based on
│ STORAGE_BACKEND) │ STORAGE_BACKEND)
▼ ▼
┌──────────────────────┐ ┌──────────────────────────┐
│ File-Based Impl │ │ DocumentDB Impl │
├──────────────────────┤ ├──────────────────────────┤
│ FileServerRepo │ │ DocumentDBServerRepo │
│ FileAgentRepo │ │ DocumentDBAgentRepo │
│ FileScopeRepo │ │ DocumentDBScopeRepo │
│ FileSecurityScanRepo │ │ DocumentDBSecurityRepo │
│ FaissSearchRepo │ │ DocumentDBSearchRepo │
│ FileFederationRepo │ │ DocumentDBFederationRepo │
└──────────┬───────────┘ └──────────┬───────────────┘
│ │
▼ ▼
Local File System DocumentDB/MongoDB Cluster
(JSON files, FAISS) (Collections, Vector Search)
Key Principles¶
- Abstraction: All repositories implement abstract base classes defining strict contracts
- Polymorphism: Multiple storage backend implementations (file, DocumentDB/MongoDB)
- Dependency Injection: Factory pattern provides single point of repository creation
- Consistency: All implementations provide identical behavior regardless of backend
- Testability: Mock implementations easy to create; reset_repositories() enables test isolation
Design Patterns¶
Repository Pattern¶
The repository pattern provides a data access abstraction layer that:
- Isolates business logic from storage implementation details
- Defines clear, intent-driven interfaces (e.g.,
get(),create(),update(),delete()) - Makes switching storage backends transparent to consuming code
- Enables comprehensive testing without actual storage dependencies
Factory Pattern¶
The factory pattern manages repository instantiation by:
- Creating repositories lazily on first access (singleton pattern)
- Selecting implementation based on environment configuration (
STORAGE_BACKENDsetting) - Centralizing backend selection logic in one place (
factory.py) - Providing
reset_repositories()for test isolation
Strategy Pattern¶
Different storage backends are strategies implementing the same interface:
- File Strategy: YAML/JSON files + FAISS for search
- DocumentDB/MongoDB Strategy: Distributed document database with vector search and aggregation pipelines
Abstract Base Classes¶
All repository implementations inherit from abstract base classes defined in registry/repositories/interfaces.py. These define the contract that ALL implementations must follow.
ServerRepositoryBase¶
Location: registry/repositories/interfaces.py (lines 14-76)
Purpose: Data access for MCP server definitions and lifecycle management
Key Methods:
# Lifecycle operations
async def get(path: str) -> Optional[Dict[str, Any]]
async def list_all() -> Dict[str, Dict[str, Any]]
async def create(server_info: Dict[str, Any]) -> bool
async def update(path: str, server_info: Dict[str, Any]) -> bool
async def delete(path: str) -> bool
# State management (enabled/disabled)
async def get_state(path: str) -> bool
async def set_state(path: str, enabled: bool) -> bool
# Lifecycle
async def load_all() -> None # Load/reload from storage at startup
Implementations:
registry/repositories/file/server_repository.py- File-basedregistry/repositories/documentdb/server_repository.py- DocumentDB/MongoDB
AgentRepositoryBase¶
Location: registry/repositories/interfaces.py (lines 78-140)
Purpose: Data access for A2A (Agent-to-Agent) agent definitions
Key Methods:
async def get(path: str) -> Optional[AgentCard]
async def list_all() -> List[AgentCard]
async def create(agent: AgentCard) -> AgentCard
async def update(path: str, updates: Dict[str, Any]) -> AgentCard
async def delete(path: str) -> bool
async def get_state(path: str) -> bool
async def set_state(path: str, enabled: bool) -> bool
async def load_all() -> None # Load/reload from storage
Implementations:
registry/repositories/file/agent_repository.py- File-basedregistry/repositories/documentdb/agent_repository.py- DocumentDB/MongoDB
ScopeRepositoryBase¶
Location: registry/repositories/interfaces.py (lines 142-506)
Purpose: Data access for authorization scopes (RBAC - Role-Based Access Control)
Overview:
The Scope Repository manages a complex authorization model with three primary data structures:
- UI-Scopes - Permissions for UI features grouped by Keycloak group
- Server Scopes - Server access rules (which servers, methods, tools)
- Group Mappings - Mapping between Keycloak groups and scopes
Key Methods:
# UI permission queries
async def get_ui_scopes(group_name: str) -> Dict[str, Any]
async def get_group_mappings(keycloak_group: str) -> List[str]
# Server access rules
async def get_server_scopes(scope_name: str) -> List[Dict[str, Any]]
# Group management
async def get_group(group_name: str) -> Dict[str, Any]
async def list_groups() -> Dict[str, Any]
async def group_exists(group_name: str) -> bool
async def create_group(group_name: str, description: str = "") -> bool
async def delete_group(group_name: str, remove_from_mappings: bool = True) -> bool
# Server scope assignment
async def add_server_scope(
server_path: str,
scope_name: str,
methods: List[str],
tools: Optional[List[str]] = None
) -> bool
async def remove_server_scope(server_path: str, scope_name: str) -> bool
async def remove_server_from_all_scopes(server_path: str) -> bool
# UI visibility management
async def add_server_to_ui_scopes(group_name: str, server_name: str) -> bool
async def remove_server_from_ui_scopes(group_name: str, server_name: str) -> bool
# Scope mapping management
async def add_group_mapping(group_name: str, scope_name: str) -> bool
async def remove_group_mapping(group_name: str, scope_name: str) -> bool
async def get_all_group_mappings() -> Dict[str, List[str]]
# Bulk operations
async def add_server_to_multiple_scopes(
server_path: str,
scope_names: List[str],
methods: List[str],
tools: List[str]
) -> bool
async def load_all() -> None # Load/reload from storage
Implementations:
registry/repositories/file/scope_repository.py- File-based (YAML)registry/repositories/documentdb/scope_repository.py- DocumentDB/MongoDB
SecurityScanRepositoryBase¶
Location: registry/repositories/interfaces.py (lines 508-598)
Purpose: Data access for security scanning results (both MCP servers and agents)
Key Methods:
# CRUD operations
async def get(server_path: str) -> Optional[Dict[str, Any]]
async def list_all() -> List[Dict[str, Any]]
async def create(scan_result: Dict[str, Any]) -> bool
# Querying
async def get_latest(server_path: str) -> Optional[Dict[str, Any]]
async def query_by_status(status: str) -> List[Dict[str, Any]]
async def load_all() -> None # Load/reload from storage
Implementations:
registry/repositories/file/security_scan_repository.py- File-basedregistry/repositories/documentdb/security_scan_repository.py- DocumentDB/MongoDB
SearchRepositoryBase¶
Location: registry/repositories/interfaces.py (lines 600-645)
Purpose: Data access for semantic/hybrid search functionality
Key Methods:
async def initialize() -> None # Initialize search service
# Indexing
async def index_server(
path: str,
server_info: Dict[str, Any],
is_enabled: bool = False
) -> None
async def index_agent(
path: str,
agent_card: AgentCard,
is_enabled: bool = False
) -> None
# Query
async def search(
query: str,
entity_types: Optional[List[str]] = None,
max_results: int = 10
) -> Dict[str, List[Dict[str, Any]]]
async def remove_entity(path: str) -> None
Implementations:
registry/repositories/file/search_repository.py- FAISS (Facebook AI Similarity Search)registry/repositories/documentdb/search_repository.py- DocumentDB/MongoDB Hybrid Search (BM25 + k-NN)
FederationConfigRepositoryBase¶
Location: registry/repositories/interfaces.py (lines 647-709)
Purpose: Data access for federation configuration (multi-registry federation)
Key Methods:
# CRUD operations
async def get_config(config_id: str = "default") -> Optional[FederationConfig]
async def save_config(
config: FederationConfig,
config_id: str = "default"
) -> FederationConfig
async def delete_config(config_id: str = "default") -> bool
async def list_configs() -> List[Dict[str, Any]]
Implementations:
registry/repositories/file/federation_config_repository.py- File-based (JSON)registry/repositories/documentdb/federation_config_repository.py- DocumentDB/MongoDB
Implementations¶
File-Based Backend (Legacy, Backwards Compatible)¶
Purpose: Local file system storage for development, testing, and backwards compatibility
Characteristics:
- Simplicity: JSON/YAML files, no external dependencies (except FAISS for search)
- Isolation: No network calls required
- State Management: Separate state files for enabled/disabled status
- Search: FAISS (Facebook AI Similarity Search) for vector similarity
Storage Structure:
registry/
├── servers/
│ ├── server1.json
│ ├── server2.json
│ └── server_state.json # Persistence for enabled/disabled state
├── agents/
│ ├── agent1_agent.json
│ ├── agent2_agent.json
│ └── agent_state.json
├── config/
│ ├── scopes.yml # Authorization scopes configuration
│ └── federation/
│ └── default.json # Federation configuration
├── security_scans/
│ ├── scan_result_1.json
│ └── scan_result_2.json
├── models/
│ └── all-MiniLM-L6-v2/ # Embedding model weights
└── service_index.faiss # FAISS vector index
Implementation Classes:
FileServerRepositoryFileAgentRepositoryFileScopeRepositoryFileSecurityScanRepositoryFaissSearchRepositoryFileFederationConfigRepository
Advantages:
- No infrastructure setup needed
- Good for development and testing
- Human-readable file formats
- Git-friendly for version control
Limitations:
- Single-node only (no distributed deployment)
- Limited query capabilities
- File locking issues in concurrent scenarios
- Not suitable for production at scale
- FAISS requires model file storage
DocumentDB/MongoDB Backend (Recommended for Production)¶
Purpose: Distributed document database for production deployments
Characteristics:
- Scalability: Clustered deployment for high availability (DocumentDB) or replica sets (MongoDB)
- Query Capabilities: Rich aggregation pipelines, complex filtering, projections
- Vector Search: Native vector search (DocumentDB) or application-level (MongoDB CE)
- Collection Management: Automatic index creation with proper field mappings
- Async Support: Full async/await implementation for non-blocking I/O
- Strong Consistency: ACID transactions and strong consistency guarantees
Collection Structure:
DocumentDB/MongoDB Cluster
├── mcp_servers_{namespace} # Server definitions
├── mcp_agents_{namespace} # Agent definitions
├── mcp_scopes_{namespace} # Authorization scopes
├── mcp_security_scans_{namespace} # Security scan results
├── mcp_embeddings_1536_{namespace} # Vector embeddings
└── mcp_federation_config_{namespace} # Federation configurations
Implementation Classes:
DocumentDBServerRepositoryDocumentDBAgentRepositoryDocumentDBScopeRepositoryDocumentDBSecurityScanRepositoryDocumentDBSearchRepositoryDocumentDBFederationConfigRepository
DocumentDB Client: registry/repositories/documentdb/client.py
Provides:
- Async MongoDB client singleton
- Connection pooling and authentication
- Index name management with namespace support
- Client lifecycle management (initialization and cleanup)
Advantages:
- Distributed, highly available
- Rich query capabilities
- Hybrid search (BM25 + k-NN) for semantic understanding
- Native vector support for embeddings
- Scales to millions of documents
- Production-ready
Limitations:
- Requires DocumentDB cluster or MongoDB CE instance
- Higher operational complexity
- More network I/O
- Requires proper index tuning for performance
Hybrid Search Explanation:
DocumentDB/MongoDB vector search provides semantic similarity:
- BM25 (Best Matching 25): Full-text relevance scoring based on term frequency
- Good for exact keyword matches
- Fast and traditional IR approach
-
Default web search behavior
-
k-NN (k-Nearest Neighbors): Vector similarity search
- Semantic understanding
- Captures meaning, not just keywords
- Uses neural embeddings (sentence-transformers or Bedrock Titan)
Default Weighting:
This gives 60% weight to semantic search (vector similarity) and 40% to keyword matching, optimized for finding semantically relevant servers/agents.
Factory Pattern¶
Factory Implementation¶
Location: registry/repositories/factory.py
The factory provides singleton repository instances with lazy initialization:
def get_server_repository() -> ServerRepositoryBase:
"""Get server repository singleton."""
global _server_repo
if _server_repo is not None:
return _server_repo
backend = settings.storage_backend
logger.info(f"Creating server repository with backend: {backend}")
if backend == "documentdb":
from .documentdb.server_repository import DocumentDBServerRepository
_server_repo = DocumentDBServerRepository()
else:
from .file.server_repository import FileServerRepository
_server_repo = FileServerRepository()
return _server_repo
Similar factory functions exist for:
get_agent_repository()get_scope_repository()get_security_scan_repository()get_search_repository()get_federation_config_repository()
Backend Selection Logic¶
Backend selection is controlled by the storage_backend setting in registry/core/config.py:
How it works:
- At startup: Application reads
STORAGE_BACKENDenvironment variable (or defaults to "file") - On first access: Factory function checks backend setting
- Instance creation: Creates appropriate implementation (file vs DocumentDB/MongoDB)
- Caching: Returns cached singleton on subsequent calls
Dependency Injection Pattern¶
Services consume repositories through factory functions:
# In any service module
from registry.repositories.factory import get_server_repository
async def list_servers():
repo = get_server_repository() # Gets file or DocumentDB/MongoDB impl
servers = await repo.list_all()
return servers
Benefits:
- No hardcoded dependencies
- Easy to swap implementations
- Test isolation via
reset_repositories()
Test Isolation¶
For testing, repositories can be reset:
from registry.repositories.factory import reset_repositories
async def test_server_creation():
reset_repositories() # Clear all singletons
# Now fresh repositories are created
repo = get_server_repository()
# ... test code ...
Design Rationale¶
Why Repository Pattern?¶
The repository pattern addresses several critical concerns:
- Separation of Concerns: Business logic doesn't know about storage details
- Controllers/services call
repo.get(), not filesystem or database directly -
Easier to test services in isolation
-
Multiple Storage Backends: Single interface, multiple implementations
- Same code works with file or DocumentDB/MongoDB backend
- Switching backends requires only env var change
-
No code changes needed
-
Consistency: All implementations follow same contract
- Same method signatures and behavior
- Predictable error handling
-
Consistent data transformations
-
Abstraction: Storage complexity hidden behind simple interface
repo.get(path)is simple whether backed by files or DocumentDB/MongoDB- Consumers don't care about implementation details
Benefits of Repository Abstraction¶
For Development:
- Develop with file backend (fast, no setup)
- Test with mocks (no I/O overhead)
- Easy to add new storage backends
For Deployment:
- Production uses DocumentDB/MongoDB (scalable)
- Backwards compatible with file storage
- Can migrate gradually
For Testing:
- Mock repositories easily
- Test services without actual storage
- Test data isolation via
reset_repositories()
For Maintenance:
- Storage logic centralized in repository classes
- Changes to storage don't affect business logic
- Clear interfaces make code changes safer
Interface Consistency¶
All repositories implement abstract base classes with:
# All repos have these patterns
async def load_all() -> None # Startup initialization
async def get(id: str) -> Optional # Single entity retrieval
async def list_all() -> List/Dict # Bulk retrieval
async def create(entity) -> bool/Entity # Create new entity
async def update(id, updates) -> bool/Entity # Modify existing
async def delete(id) -> bool # Remove entity
This consistent interface means:
- Easier onboarding for engineers
- Less mental overhead switching between repositories
- Standardized error handling
Data Consistency Strategy¶
File Backend:
- Atomic file operations
- Separate state file for enabled/disabled status
- No transactions (file-by-file consistency)
- Best effort on concurrent writes
DocumentDB/MongoDB Backend:
- Document-level consistency
- Index operations with refresh flags
- No distributed transactions
- Eventual consistency model
- Suitable for high concurrency
Code Organization¶
Directory Structure¶
registry/repositories/
├── __init__.py # Package initialization
├── interfaces.py # Abstract base classes (6 repos)
├── factory.py # Repository factory
│
├── file/ # File-based implementations
│ ├── __init__.py
│ ├── server_repository.py
│ ├── agent_repository.py
│ ├── scope_repository.py
│ ├── security_scan_repository.py
│ ├── search_repository.py # FAISS-based
│ └── federation_config_repository.py
│
└── documentdb/ # DocumentDB/MongoDB implementations
├── __init__.py
├── client.py # MongoDB client management
├── server_repository.py
├── agent_repository.py
├── scope_repository.py
├── security_scan_repository.py
├── search_repository.py # Vector search (native or app-level)
└── federation_config_repository.py
Naming Conventions¶
Abstract Base Classes (in interfaces.py):
ServerRepositoryBaseAgentRepositoryBaseScopeRepositoryBaseSecurityScanRepositoryBaseSearchRepositoryBaseFederationConfigRepositoryBase
File Implementations (in file/):
FileServerRepositoryFileAgentRepositoryFileScopeRepositoryFileSecurityScanRepositoryFaissSearchRepository(special: uses FAISS, not files)FileFederationConfigRepository
DocumentDB/MongoDB Implementations (in documentdb/):
DocumentDBServerRepositoryDocumentDBAgentRepositoryDocumentDBScopeRepositoryDocumentDBSecurityScanRepositoryDocumentDBSearchRepositoryDocumentDBFederationConfigRepository
Configuration¶
Backend selection and DocumentDB/MongoDB settings in registry/core/config.py:
# Backend selection
storage_backend: str = "file" # "file", "mongodb", or "documentdb"
# DocumentDB/MongoDB connection
documentdb_endpoint: str = "localhost"
documentdb_port: int = 27017
documentdb_username: Optional[str] = None
documentdb_password: Optional[str] = None
documentdb_database: str = "mcp_gateway"
documentdb_use_tls: bool = False
documentdb_tls_ca_file: Optional[str] = None
# Multi-tenancy support
documentdb_namespace: str = "default"
# Collection names (with namespace suffix)
documentdb_collection_servers: str = "mcp_servers"
documentdb_collection_agents: str = "mcp_agents"
documentdb_collection_scopes: str = "mcp_scopes"
documentdb_collection_embeddings: str = "mcp_embeddings_1536"
documentdb_collection_security_scans: str = "mcp_security_scans"
documentdb_collection_federation_config: str = "mcp_federation_config"
# Vector search configuration
documentdb_vector_dimension: int = 1536 # For embeddings
Advanced Topics¶
Namespace Support (Multi-Tenancy)¶
DocumentDB/MongoDB implementation supports multi-tenancy via namespaces:
# In config.py
documentdb_namespace: str = "default"
# Collection names are automatically namespaced
# mcp_servers_{namespace}
# mcp_agents_{namespace}
# etc.
Use Cases:
- Multiple registry instances on shared DocumentDB/MongoDB cluster
- Isolated test environments
- Customer separation in SaaS deployments
Implementation: registry/repositories/documentdb/client.py
def get_index_name(base_name: str) -> str:
"""Get full index name with namespace."""
return f"{base_name}-{settings.documentdb_namespace}"
Error Handling¶
Both implementations handle errors gracefully:
File Backend:
- Missing files → return None or empty list
- Parse errors → log and skip
- I/O errors → raised to caller
DocumentDB/MongoDB Backend:
- Connection errors → log and attempt retry
- Index not found → initialize index
- Query errors → log and raise
Performance Considerations¶
File Backend:
- Fast for small datasets (<1000 entities)
- No network latency
- FAISS search: O(n) linear scan (not scalable)
- Not suitable for high concurrency
DocumentDB/MongoDB Backend:
- Scalable to millions of entities
- Network latency (typically <100ms)
- BM25 + k-NN: O(log n) with proper indexing
- Built for high concurrency (lock-free reads)
Recommendations:
- Use file backend for development only
- Use DocumentDB/MongoDB for production
- Monitor DocumentDB/MongoDB query latency
- Tune index refresh intervals for throughput vs. latency tradeoff
Testing Strategy¶
Unit Tests¶
Mock repositories for unit testing:
class MockServerRepository(ServerRepositoryBase):
"""Mock implementation for testing."""
def __init__(self):
self.servers = {}
async def get(self, path: str) -> Optional[Dict]:
return self.servers.get(path)
async def list_all(self) -> Dict:
return self.servers.copy()
# ... implement other methods ...
Integration Tests¶
Use file backend for integration tests:
@pytest.fixture
def reset_repos():
"""Reset repositories between tests."""
yield
reset_repositories()
async def test_server_lifecycle(reset_repos):
repo = get_server_repository()
# File backend used by default
server = {"path": "/test", "name": "test_server"}
await repo.create(server)
result = await repo.get("/test")
assert result is not None
End-to-End Tests¶
Test against MongoDB CE in Docker:
# docker-compose.yml
services:
mongodb:
image: mongo:8.2
command: ["--replSet", "rs0", "--bind_ip_all"]
ports:
- "27017:27017"
Set STORAGE_BACKEND=mongodb and run full integration tests.
Summary¶
The database abstraction layer provides a clean, extensible architecture for data storage in the MCP Gateway Registry:
| Aspect | File | DocumentDB/MongoDB |
|---|---|---|
| Use Case | Development, testing | Production |
| Scalability | ~1000 entities | Millions |
| Dependencies | None (+ FAISS) | DocumentDB/MongoDB cluster |
| Query Power | Basic | Advanced (aggregation pipelines) |
| Concurrency | Limited | High |
| Setup Complexity | None | Moderate |
| Cost | Free | Infrastructure cost |
The repository pattern with factory ensures:
- Clean separation of concerns
- Easy backend switching
- Comprehensive testability
- Consistent interfaces
- Future extensibility
All implementations maintain identical behavior, making backend selection purely an operational decision rather than a code architecture choice.