# MCP Gateway & Registry - High-Level Summary

This project provides an enterprise-ready gateway and registry for Model Context Protocol (MCP) servers, enabling centralized management, secure access, and dynamic tool discovery for AI agents and development teams. The core goal is to transform the chaos of managing hundreds of individual MCP server connections into a unified, governed platform with comprehensive authentication, fine-grained access control, and intelligent tool discovery capabilities.

The repository provides:

1. **Centralized Gateway & Registry**: A unified platform for managing and accessing MCP servers across an organization
2. **Enterprise Authentication**: Multi-provider OAuth 2.0 support with Keycloak, Amazon Cognito, Microsoft Entra ID, and custom identity providers
3. **Fine-Grained Access Control**: Scope-based authorization at server, method, and individual tool levels
4. **Dynamic Tool Discovery**: AI-powered hybrid search (BM25 + vector k-NN) with flexible embedding providers (local, OpenAI, LiteLLM, Bedrock) for autonomous tool discovery
5. **Comprehensive Observability**: Dual-path metrics collection with SQLite and OpenTelemetry for detailed analytics
6. **Production-Ready Deployment**: Docker-based deployment with support for EC2, EKS, and container orchestration

Key features include: centralized server management, OAuth 2.0 (2LO/3LO) authentication flows, token vending service, automated token refresh, dynamic tool discovery and invocation, real-time health monitoring, Grafana dashboards, security scanning integration, and Anthropic MCP Registry compatibility.

---

# MCP Gateway & Registry

## 1.
Overview

- **Project Name:** MCP Gateway & Registry
- **Purpose:** Enterprise-ready platform for managing, securing, and accessing Model Context Protocol (MCP) servers at scale
- **Core Goal:** Transform scattered MCP server connections into a centralized, governed platform with unified authentication and intelligent tool discovery
- **Communication:** Uses MCP protocol over SSE (Server-Sent Events) and Streamable HTTP
- **Key Components:** Gateway (Nginx reverse proxy), Registry (Web UI & API), Auth Server (OAuth/JWT), MCP Servers, Metrics Service, Token Refresh Service

### 1.5. Repository Structure

**Top-Level Directories:**

| Directory | Purpose | Key Files |
|-----------|---------|-----------|
| `registry/` | **Core Registry Application** - FastAPI backend, repositories, services | `main.py` (FastAPI app), `core/config.py` (settings) |
| `auth_server/` | **OAuth Authentication Server** - Keycloak/Cognito/Entra ID integration | `server.py` (Flask auth server), `scopes.yml` (group mappings) |
| `frontend/` | **Web UI** - React/TypeScript admin dashboard | `src/App.tsx` (main app), `src/pages/` (UI pages) |
| `servers/` | **Example MCP Servers** - Reference implementations | `currenttime/`, `mcpgw/` (MCP Gateway server) |
| `terraform/` | **Infrastructure as Code** - AWS deployment automation | `aws-ecs/` (ECS Fargate), `modules/` (reusable modules) |
| `tests/` | **Test Suite** - pytest unit, integration, E2E tests | `conftest.py`, `unit/`, `integration/` |
| `docs/` | **Documentation** - Architecture, guides, design docs | `llms.txt` (this file), `design/` (architecture) |
| `cli/` | **Command-Line Tools** - MCP client, registry management | `mcp_client.py`, `registry_cli_wrapper.py` |
| `keycloak/` | **Keycloak Setup** - Docker configs, initialization scripts | `docker-compose.yml`, `setup/` (init scripts) |
| `docker/` | **Docker Configurations** - Dockerfiles for all services | `Dockerfile.registry`, `Dockerfile.auth` |
| `scripts/` | **Automation Scripts** - Deployment, testing, utilities | `test.py` (test runner), `publish-containers.sh` |
| `config/` | **Configuration Templates** - Nginx, environment examples | `nginx.conf.template`, `.env.example` |
| `metrics-service/` | **Metrics Collection** - OpenTelemetry metrics service | `app/main.py` (FastAPI metrics API) |
| `charts/` | **Helm Charts** - Kubernetes deployment manifests | `mcp-gateway/` (Helm chart) |
| `credentials-provider/` | **Credential Management** - OAuth token handling | `token_refresher.py`, `generate_creds.sh` |
| `agents/` | **AI Agent Implementation** - Reference agent with A2A protocol support | `agent.py`, `client.py`, `a2a/` (A2A protocol) |
| `api/` | **Legacy API** - Deprecated standalone API (use `registry/api/`) | - |
| `release-notes/` | **Release History** - Version release notes | Markdown files per version |

**Registry Application Structure (`registry/`):**

| Subdirectory | Purpose | Key Files |
|--------------|---------|-----------|
| `api/` | **API Routes** - FastAPI endpoint definitions | `server_routes.py`, `agent_routes.py`, `search_routes.py`, `federation_routes.py`, `management_routes.py`, `registry_routes.py`, `wellknown_routes.py` |
| `services/` | **Business Logic** - Service layer between routes and repositories | `server_service.py`, `agent_service.py`, `rating_service.py`, `security_scanner.py`, `agent_scanner.py`, `federation_service.py`, `transform_service.py`, `scope_service.py` |
| `repositories/` | **Data Access Layer** - Abstract repositories with multiple backends | `interfaces.py` (abstract base classes), `factory.py`, `documentdb/`, `file/` |
| `repositories/documentdb/` | **DocumentDB/MongoDB Implementation** - Production and development storage backend | `server_repository.py`, `agent_repository.py`, `scope_repository.py`, `search_repository.py`, `security_scan_repository.py`, `federation_config_repository.py`, `client.py` |
| `repositories/file/` | **File Implementation** - Legacy storage backend (DEPRECATED) | `server_repository.py`, `agent_repository.py`, `scope_repository.py`, `search_repository.py`, `security_scan_repository.py`, `federation_config_repository.py` |
| `schemas/` | **Pydantic Models** - Request/response validation | `server.py`, `agent.py`, `auth.py`, `search.py`, `security.py`, `rating.py` |
| `auth/` | **Authentication Logic** - JWT validation, session management | `dependencies.py` (FastAPI auth dependencies), `session.py` |
| `core/` | **Core Infrastructure** - Configuration, startup logic, MCP client | `config.py` (Settings class), `dependencies.py`, `mcp_client.py`, `nginx_service.py`, `task_manager.py`, `schemas.py` |
| `embeddings/` | **Embedding Providers** - Unified embeddings client abstraction | `client.py` (EmbeddingsClient ABC, SentenceTransformersClient, LiteLLMClient), `README.md` |
| `search/` | **Search Implementation** - Hybrid search (BM25 + vector) | `service.py` (hybrid search implementation) |
| `health/` | **Health Monitoring** - Server health checks, status tracking | `service.py`, `routes.py` |
| `utils/` | **Utilities** - Helper functions, logging, validation | `scopes_manager.py` (scope CRUD) |
| `services/federation/` | **Federation** - External registry synchronization | `anthropic_client.py`, `asor_client.py` (Workday ASOR) |
| `static/` | **Static Assets** - CSS, JavaScript, images for web UI | - |
| `templates/` | **Jinja2 Templates** - HTML templates for web UI | `pages/`, `components/` |
| `scripts/` | **Python Scripts** - Database utilities | `inspect-documentdb.py` |

**Important Root Files:**

| File | Purpose |
|------|---------|
| `pyproject.toml` | Python package configuration, dependencies, pytest settings |
| `docker-compose.yml` | Local development Docker Compose configuration |
| `docker-compose.prod.yml` | Production Docker Compose configuration |
| `.env.example` | Environment variable template with all settings |
| `README.md` | Project overview, quick start guide |
| `CLAUDE.md` | Coding standards and best practices |
| `TEAM.md` | Team roles and personas for development |
| `MAINTENANCE.md` | Maintenance procedures and troubleshooting |
| `WRITING_TESTS.md` | Test writing guidelines and patterns |

**Test Structure (`tests/`):**

| Subdirectory | Purpose | Key Files |
|--------------|---------|-----------|
| `unit/` | **Unit Tests** - Fast, isolated component tests | `api/`, `services/`, `repositories/`, `auth/` |
| `integration/` | **Integration Tests** - Multi-component workflow tests | `test_server_lifecycle.py`, `test_search_integration.py`, `conftest.py` |
| `fixtures/` | **Test Fixtures** - Mock data, factories | `factories.py` (Factory Boy), `mocks/` |
| `reporting/` | **Test Reports** - HTML coverage reports, test results | - |

**Terraform Structure (`terraform/`):**

| Subdirectory | Purpose | Key Files |
|--------------|---------|-----------|
| `aws-ecs/` | **AWS ECS Fargate Deployment** - Production-ready IaC | `main.tf`, `variables.tf`, `outputs.tf`, `ecs.tf` |
| `modules/` | **Reusable Terraform Modules** - Shared infrastructure components | `ecs-service/`, `alb/`, `networking/` |

**Important Configuration Files:**

| File/Directory | Purpose | Location |
|----------------|---------|----------|
| `oauth2_providers.yml` | OAuth provider configurations (Keycloak, Cognito, Entra ID) | `auth_server/` |
| `scopes.yml` | Group-to-scope mappings, UI permissions | `auth_server/` |
| `nginx.conf.template` | Nginx reverse proxy configuration | `config/` |
| `global-bundle.pem` | AWS DocumentDB TLS certificate | Root directory |
| `.env.example` | Environment variables for all services | Root directory |

**Key Entry Points:**

| Component | Entry Point | Purpose |
|-----------|-------------|---------|
| Registry API | `registry/main.py` | FastAPI application for registry and MCP gateway |
| Auth Server | `auth_server/server.py` | Flask OAuth server for authentication |
| Frontend | `frontend/src/App.tsx` | React web UI for administration |
| MCP Client | `cli/mcp_client.py` | CLI tool for calling MCP servers |
| Test Runner | `scripts/test.py` | Unified test execution script |
| Metrics Service | `metrics-service/app/main.py` | OpenTelemetry metrics collection |

## 2. Core Problem Solved

**Transform this chaos:**

- AI agents require separate connections to each MCP server
- Each developer configures VS Code, Cursor, Claude Code individually
- Developers must install and manage MCP servers locally
- No standard authentication flow for enterprise tools
- Scattered API keys and credentials across tools
- No visibility into what tools teams are using
- Security risks from unmanaged tool sprawl
- No dynamic tool discovery for autonomous agents

**Into this organized approach:**

- AI agents connect to one gateway, access multiple MCP servers
- Single configuration point for VS Code, Cursor, Claude Code
- Central IT manages cloud-hosted MCP infrastructure
- Developers use standard OAuth 2LO/3LO flows
- Centralized credential management with secure vault integration
- Complete visibility and audit trail for all tool usage
- Enterprise-grade security with governed tool access
- Dynamic tool discovery and invocation for autonomous workflows

## 3. Architecture Overview

### 3.1.
Core Architectural Decision: Reverse Proxy Pattern

The MCP Gateway uses a **reverse proxy architecture** (Nginx-based) rather than an application-layer gateway.

**Key Benefits:**

- **Performance**: Direct proxy routing with minimal overhead (~1-2ms)
- **Protocol Independence**: Can proxy any protocol (HTTP, WebSocket, SSE, gRPC)
- **Scalability**: Each MCP server scales independently
- **Implementation**: Allows Python development while Nginx handles message routing
- **Future-Proof**: Supports A2A (Agent-to-Agent) and other protocols without gateway changes

**Architecture Flow:**

```
AI Agent / Coding Assistant
        ↓ Multiple Endpoints
┌─────────────────┐
│  Nginx Gateway  │ ──auth_request──> Auth Server
│   /fininfo/     │ <──auth_headers──────┘
│   /mcpgw/       │
│   /currenttime/ │
└─────────────────┘
        │
        ├─── localhost:8001 (fininfo)
        ├─── localhost:8002 (mcpgw)
        └─── localhost:8003 (currenttime)
```

**Alternative Considered:**

- Tools Gateway Pattern: Single endpoint with tool aggregation
- Trade-offs: Better developer experience but requires Go/Rust for performance and adds complexity

### 3.2.
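The `auth_request` hook in the flow above delegates every proxied request to the Auth Server, which answers the subrequest with 2xx (allow) or 401/403 (deny) and can surface identity headers for the upstream MCP server. A minimal sketch of that contract in Python — the function, the `known_tokens` stand-in, and the response headers are illustrative assumptions, not the project's actual Auth Server code (only `X-Authorization` is a header the docs name):

```python
from typing import Mapping, Tuple

def validate_request(headers: Mapping[str, str],
                     known_tokens: set) -> Tuple[int, dict]:
    """Decide whether Nginx should forward the proxied request.

    Returns an HTTP status for the auth_request subrequest, plus headers
    that Nginx could pass upstream via auth_request_set.
    """
    auth = headers.get("X-Authorization", "")
    if not auth.startswith("Bearer "):
        return 401, {}                     # no credentials presented
    token = auth.removeprefix("Bearer ")
    if token not in known_tokens:          # stand-in for real JWT validation
        return 403, {}
    # On success, surface identity/scope info for the upstream MCP server
    # (hypothetical header names).
    return 200, {"X-User": "agent-1",
                 "X-Scopes": "mcp-servers-restricted/read"}
```

In the real deployment, the token check would be full JWT validation against Keycloak/Cognito rather than a set lookup; the point is only the status-code contract Nginx's `auth_request` expects.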
High-Level Component Architecture

```
┌─────────────────────────────────────┐
│      Human Users / AI Agents        │
└──────────────┬──────────────────────┘
               ↓
┌──────────────────────────────────────┐
│  Identity Provider (Keycloak/        │
│  Cognito/Entra ID) - OAuth 2.0       │
└──────────────┬───────────────────────┘
               ↓
┌──────────────────────────────────────┐
│  MCP Gateway & Registry (EC2/EKS)    │
│  ┌────────────────────────────────┐  │
│  │  NGINX Reverse Proxy Router    │  │
│  └──────┬─────────────────────────┘  │
│  ┌──────┴─────────┬────────────┐     │
│  │  Auth Server   │  Registry  │     │
│  │  (Dual Auth)   │  Web UI    │     │
│  └────────────────┴────────────┘     │
│  ┌──────────────────────────────┐    │
│  │  Local MCP Servers           │    │
│  │  - MCP Server 1, 2, ...N     │    │
│  └──────────────────────────────┘    │
└──────────────┬───────────────────────┘
               ↓
┌──────────────────────────────────────┐
│  External Systems & Data Sources     │
│  - EKS/EC2 Cluster MCP Servers       │
│  - API Gateway + Lambda Functions    │
│  - Databases, External APIs          │
└──────────────────────────────────────┘
```

### 3.3.
Key Architectural Components

**Gateway Layer:**

- **Nginx Reverse Proxy**: Path-based routing, SSL termination, load balancing
- **Auth Server**: Dual authentication (Keycloak/Cognito), token validation, scope enforcement
- **Registry Web UI**: Server management, health monitoring, user administration
- **Registry MCP Server**: Dynamic tool discovery, intelligent tool finder

**Identity & Access:**

- **Keycloak/Cognito/Entra ID**: Primary identity provider (choose one or multi-provider)
- **OAuth 2.0 (2LO/3LO)**: User authentication and authorization
- **JWT Tokens**: Secure, stateless authentication
- **Fine-Grained Access Control**: Scope-based permissions at server, method, and tool levels
- **Enterprise SSO**: SAML/OIDC integration with Microsoft Entra ID for Microsoft 365 environments

**MCP Server Layer:**

- **Local MCP Servers**: Co-located with gateway (SSE transport)
- **Remote MCP Servers**: EKS/EC2 clusters (SSE/Streamable HTTP)
- **Serverless MCP**: API Gateway + Lambda functions

**Observability:**

- **Metrics Service**: Dual-path collection (SQLite + OpenTelemetry)
- **Prometheus**: Time-series metrics storage
- **Grafana**: Real-time dashboards and alerting
- **CloudWatch/Datadog**: Cloud-native monitoring integration

### 3.4. Storage Backend Architecture

**IMPORTANT:** The MCP Gateway & Registry uses a **repository pattern** with multiple storage backends. File-based storage is **LEGACY and DEPRECATED** - use DocumentDB or MongoDB for production deployments.

**Repository Pattern:**

```
Routes → Services → Repositories → Storage Backends
```

**Three Storage Backends:**

1. **File-Based Storage (LEGACY - DEPRECATED)**
   - Status: Maintained for backward compatibility only, will be removed
   - Use Case: Local development and testing ONLY
   - Vector Search: FAISS with in-memory indexing
   - Limitations: Not suitable for production, no high availability, file corruption risks
   - Location: `registry/repositories/file/`

2.
**MongoDB Community Edition (Development)**
   - Status: Recommended for local development and testing
   - Use Case: Local Docker development, CI/CD testing
   - Vector Search: Application-level k-NN with BM25 hybrid search
   - Configuration: `STORAGE_BACKEND=mongodb-ce`
   - Connection: `DOCUMENTDB_HOST=localhost:27017`
   - Location: `registry/repositories/documentdb/` (shared with DocumentDB)

3. **Amazon DocumentDB (Production - RECOMMENDED)**
   - Status: Production-ready, enterprise-grade
   - Use Case: AWS production deployments with HA requirements
   - Vector Search: Native HNSW vector search with BM25 hybrid search
   - Configuration: `STORAGE_BACKEND=documentdb`
   - Features: Multi-AZ replication, automatic failover, point-in-time recovery
   - Namespace Support: Multi-tenancy via `DOCUMENTDB_NAMESPACE`
   - Location: `registry/repositories/documentdb/`

**IMPORTANT - Driver Migration (January 2026):**

- **Motor is deprecated** (end-of-life May 2026) and has been removed from this project
- **PyMongo 4.15+** now includes built-in `AsyncMongoClient` for async operations
- No separate `motor` package is required - async support is native to `pymongo>=4.15.0`
- All repository implementations use `AsyncMongoClient` directly from pymongo

**Repository Interfaces (Abstract Base Classes):**

- `ServerRepository`: Server registration, listing, metadata management
- `AgentRepository`: A2A agent card management
- `ScopeRepository`: Group and scope CRUD operations
- `SearchRepository`: Hybrid search (BM25 + vector k-NN)

**Factory Pattern:**

```python
from registry.repositories.factory import (
    get_server_repository,
    get_agent_repository,
    get_scope_repository,
    get_search_repository,
)

# Automatically selects backend based on STORAGE_BACKEND env var
server_repo = await get_server_repository()
```

**Key Architectural Principles:**

1. **Never access repositories directly from API routes** - Always use service layer
2. **All backends provide identical behavior** - Polymorphism via abstract base classes
3.
**Backend switching is transparent** - Factory pattern handles instantiation
4. **Use DocumentDB for production** - File-based storage is deprecated

**Configuration:**

```bash
# Production (DocumentDB)
STORAGE_BACKEND=documentdb
DOCUMENTDB_HOST=docdb-cluster.cluster-xxx.us-east-1.docdb.amazonaws.com
DOCUMENTDB_PORT=27017
DOCUMENTDB_DATABASE=mcp_registry
DOCUMENTDB_NAMESPACE=prod   # For multi-tenancy
DOCUMENTDB_USE_TLS=true
DOCUMENTDB_USE_IAM=true

# Development (MongoDB CE)
STORAGE_BACKEND=mongodb-ce
DOCUMENTDB_HOST=localhost
DOCUMENTDB_PORT=27017

# Legacy (File - DEPRECATED)
STORAGE_BACKEND=file   # NOT RECOMMENDED
```

**Vector Search Comparison:**

- **File Backend (Legacy)**: FAISS IndexFlatIP (cosine similarity)
- **MongoDB CE**: Application-level k-NN with score normalization
- **DocumentDB**: Native HNSW with optimized indexing

**Hybrid Search Strategy:**

All backends support hybrid search combining:

1. **BM25 Text Search**: Keyword matching on server/tool names and descriptions
2. **Vector k-NN Search**: Semantic similarity using embeddings (384-1536 dimensions)
3. **Score Fusion**: Weighted combination (configurable weights)

**References:**

- Design Document: `docs/design/database-abstraction-layer.md`
- Storage Architecture: `docs/design/storage-architecture-mongodb-documentdb.md`
- Repository Interfaces: `registry/repositories/interfaces.py`
- Factory Implementation: `registry/repositories/factory.py`

## 4. Authentication & Authorization

### 4.1.
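The score-fusion step of the hybrid search strategy described above is simple to state in code. A minimal sketch — the min-max normalization and function names are assumptions for illustration; the project's actual fusion may normalize differently:

```python
def normalize(scores):
    """Min-max normalize scores into [0, 1] so BM25 and vector scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(bm25, vector, bm25_weight=0.3, vector_weight=0.7):
    """Final Score = BM25_Weight * BM25_Score + Vector_Weight * Vector_Score.

    Default weights match the documented defaults (0.3 text, 0.7 semantic).
    """
    b, v = normalize(bm25), normalize(vector)
    return [bm25_weight * bs + vector_weight * vs for bs, vs in zip(b, v)]
```

A document that leads on both keyword match and semantic similarity dominates the fused ranking, while the weights let operators bias toward exact-name matches or semantic recall.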
Three-Layer Authentication System

**Layer 1: Ingress Authentication (2LO/M2M)**

- Purpose: Controls who can access the MCP Gateway
- Providers: Keycloak (M2M service accounts), Amazon Cognito (M2M/2LO), Microsoft Entra ID (Azure AD)
- Headers: `X-Authorization`, `X-Client-Id`, `X-Keycloak-Realm`, `X-User-Pool-Id`, `X-Tenant-Id` (Entra ID)
- Methods: Machine-to-Machine (JWT tokens), User sessions (OAuth PKCE), Enterprise SSO (SAML/OIDC)

**Layer 2: Fine-Grained Access Control (FGAC)**

- Purpose: Controls which tools/methods within MCP servers can be accessed
- Based on: User/agent scopes and group memberships
- Validation: Applied at gateway level after ingress auth
- Granularity: Server-level, method-level, individual tool-level

**Layer 3: Egress Authentication (3LO)**

- Purpose: Allows MCP servers to act on user's behalf with external services
- Providers: Atlassian, Google, GitHub, Microsoft, custom OAuth providers
- Headers: `Authorization`, provider-specific headers (e.g., `X-Atlassian-Cloud-Id`)
- Validation: MCP server validates with its IdP

### 4.2. Dual Token System

AI agents carry BOTH ingress and egress tokens:

```json
{
  "headers": {
    // Ingress Authentication (for Gateway) - Keycloak
    "X-Authorization": "Bearer {keycloak_jwt_token}",
    "X-Client-Id": "{agent_client_id}",
    "X-Keycloak-Realm": "mcp-gateway",
    "X-Keycloak-URL": "http://localhost:8080",

    // OR Cognito
    "X-Authorization": "Bearer {cognito_jwt_token}",
    "X-User-Pool-Id": "{cognito_user_pool_id}",
    "X-Client-Id": "{cognito_client_id}",
    "X-Region": "{aws_region}",

    // Egress Authentication (for MCP Server) - Example: Atlassian
    "Authorization": "Bearer {atlassian_oauth_token}",
    "X-Atlassian-Cloud-Id": "{atlassian_cloud_id}"
  }
}
```

### 4.3. Complete Authentication Flow

```
1. One-Time Setup:
   User → Keycloak/Cognito (2LO) → Ingress Token
   User → External IdP (3LO, consent) → Egress Token
   User → Agent Configuration (both tokens)

2. Runtime (Every Request):
   Agent → Gateway (dual tokens)
   Gateway → Keycloak/Cognito (validate ingress)
   Gateway → Apply FGAC (check permissions)
   Gateway → MCP Server (forward egress token)
   MCP Server → External IdP (validate egress)
   MCP Server → Response (via Gateway)
```

### 4.4. Fine-Grained Access Control (FGAC)

**Scope Types:**

- **UI Scopes**: Registry management permissions
  - `mcp-registry-admin`: Full administrative access
  - `mcp-registry-user`: Limited user access
  - `mcp-registry-developer`: Service registration and management
  - `mcp-registry-operator`: Operational access without registration
- **Server Scopes**: MCP server access
  - `mcp-servers-unrestricted/read`: Read all servers
  - `mcp-servers-unrestricted/execute`: Execute all tools
  - `mcp-servers-restricted/read`: Limited read access
  - `mcp-servers-restricted/execute`: Limited execute access

**Methods vs Tools:**

- **MCP Methods**: Protocol operations (`initialize`, `tools/list`, `tools/call`)
- **Individual Tools**: Specific functions within servers

**Example Access Control:**

```yaml
# User can list tools but only execute specific ones
mcp-servers-restricted/execute:
  - server: fininfo
    methods:
      - tools/list   # Can list all tools
      - tools/call   # Can call tools
    tools:
      - get_stock_aggregates   # But only these specific tools
      - print_stock_data
```

**Validation Logic:**

1. Input Validation: Validate server name, method, tool name, user scopes
2. Scope Iteration: Check each user scope for matching permissions
3. Server Matching: Find server configurations that match the requested server
4. Method Validation: Check if the requested method is allowed
5. Tool Validation: For `tools/call`, validate specific tool permissions
6.
Access Decision: Grant access if any scope allows the operation

**Group Mappings:**

```yaml
group_mappings:
  mcp-registry-admin:
    - mcp-registry-admin                 # UI permissions
    - mcp-servers-unrestricted/read      # Server read access
    - mcp-servers-unrestricted/execute   # Server execute access
  mcp-registry-user:
    - mcp-registry-user                  # Limited UI permissions
    - mcp-servers-restricted/read        # Limited server access
```

**Note**: All group names and scope names are completely customizable by administrators. Names must be configured consistently in both the Identity Provider (IdP) and the `scopes.yml` configuration file.

## 4.5. Agent-to-Agent (A2A) Protocol Integration

The MCP Gateway supports Agent-to-Agent (A2A) communication, enabling AI agents to securely register themselves and their capabilities with the central registry, creating a self-managed agent ecosystem.

### 4.5.1. A2A Agent Architecture

```
Agent Application (AI Code)
        ↓ M2M Token (Keycloak Service Account)
┌─────────────────────────────────────┐
│  Agent Registry API (/api/agents)   │
│  - POST   /api/agents/register      │
│  - GET    /api/agents               │
│  - GET    /api/agents/{path}        │
│  - PUT    /api/agents/{path}        │
│  - DELETE /api/agents/{path}        │
│  - POST   /api/agents/{path}/toggle │
└─────────────────────────────────────┘
        ↓
┌─────────────────────────────────────┐
│  Agent State Management             │
│  - registry/agents/agent_state.json │
│  - registry/agents/{name}.json      │
└─────────────────────────────────────┘
```

### 4.5.2.
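The FGAC validation logic above (steps 1-6) reduces to checking each of the user's scopes for an entry matching the requested server, method, and — for `tools/call` — tool. A minimal sketch, assuming a scopes dict shaped like the `scopes.yml` example; the `SCOPES` data and `is_allowed` function are illustrative, not the project's implementation:

```python
from typing import Optional

# Mirrors the "Example Access Control" YAML above (illustrative data).
SCOPES = {
    "mcp-servers-restricted/execute": [
        {
            "server": "fininfo",
            "methods": ["tools/list", "tools/call"],
            "tools": ["get_stock_aggregates", "print_stock_data"],
        }
    ]
}

def is_allowed(user_scopes, server: str, method: str,
               tool: Optional[str] = None) -> bool:
    """Grant access if ANY user scope permits the server/method/tool combination."""
    for scope in user_scopes:                    # 2. scope iteration
        for entry in SCOPES.get(scope, []):
            if entry["server"] != server:        # 3. server matching
                continue
            if method not in entry["methods"]:   # 4. method validation
                continue
            if method == "tools/call" and tool not in entry["tools"]:
                continue                         # 5. tool validation
            return True                          # 6. access decision
    return False
```

Note how `tools/list` succeeds without a tool name while `tools/call` is additionally gated by the per-tool allowlist — exactly the "can list all tools but only execute specific ones" behavior the YAML example describes.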
Agent Registration Flow

**Step 1: Agent Authentication**

- Agent obtains M2M token from Keycloak service account
- Tokens expire in 5 minutes and must be regenerated before use
- Token validation includes expiration checks via JWT payload decoding

**Step 2: Agent Registration**

- Agent calls POST `/api/agents/register` with:
  - Agent metadata (name, description, version)
  - Protocol version (e.g., "1.0")
  - Agent skills/capabilities (MCP tools provided by agent)
  - Security configuration (bearer tokens, oauth)
  - Visibility settings (public/private)
  - Trust level (verified/unverified)

**Step 3: Agent Access Control**

- Agent permissions defined in `auth_server/scopes.yml`
- Three-tier structure:
  1. **UI-Scopes**: Agent registry permissions (list_agents, get_agent, publish_agent, modify_agent, delete_agent)
  2. **Group Mappings**: Maps Keycloak groups to scope names
  3. **Individual group scopes**: Detailed agent and MCP server access

**Step 4: Agent CRUD Operations**

- CREATE: Register new agent with skills
- READ: Retrieve agent metadata and capabilities
- UPDATE: Modify agent description, tags, skills
- DELETE: Remove agent from registry
- TOGGLE: Enable/disable agent availability

### 4.5.3.
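The expiration check in Step 1 — decode the JWT payload (restoring the stripped base64 padding) and compare its `exp` claim against the clock — can be sketched as follows. This deliberately does NOT verify the signature (that is the IdP's job on the server side), and the 30-second skew is an illustrative assumption, not a documented default:

```python
import base64
import json
import time

def token_expired(jwt_token: str, skew_seconds: int = 30) -> bool:
    """Return True if the JWT's exp claim is past or missing.

    Inspects claims only; does NOT verify the signature.
    """
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return True
    # Treat tokens as expired slightly early to survive clock skew / in-flight time.
    return time.time() >= exp - skew_seconds
```

With 5-minute M2M tokens, a client-side check like this lets scripts regenerate a token before sending a request that would otherwise be rejected with 401.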
Agent Access Control Example

**Scopes Configuration (auth_server/scopes.yml):**

```yaml
UI-Scopes:
  mcp-registry-admin:
    list_agents:
      - all          # Admin sees all agents
    get_agent:
      - all
    publish_agent:
      - all
    modify_agent:
      - all
    delete_agent:
      - all
  registry-users-lob1:
    list_agents:
      - /code-reviewer     # LOB1 sees specific agents
      - /test-automation
    get_agent:
      - /code-reviewer
      - /test-automation

group_mappings:
  mcp-registry-admin:
    - mcp-registry-admin
  registry-users-lob1:
    - registry-users-lob1
```

**Agent Permissions Table:**

```
Agent | Group | Can List | Can Get | Can Publish | Can Modify | Can Delete
------|-------|----------|---------|-------------|------------|-----------
admin | admin | all      | all     | all         | all        | all
lob1  | lob1  | 2 agents | 2       | own agents  | own        | own agents
lob2  | lob2  | 2 agents | 2       | own agents  | own        | own agents
```

### 4.5.4. Agent State Management

**Agent State File (registry/agents/agent_state.json):**

```json
{
  "agents": {
    "/code-reviewer": {
      "path": "/code-reviewer",
      "name": "Code Reviewer Agent",
      "enabled": true,
      "registered_at": "2024-11-09T14:45:00Z",
      "last_modified": "2024-11-09T14:50:00Z"
    },
    "/data-analysis": {
      "path": "/data-analysis",
      "name": "Data Analysis Agent",
      "enabled": true,
      "registered_at": "2024-11-09T15:00:00Z"
    }
  }
}
```

**Individual Agent File (registry/agents/code-reviewer.json):**

```json
{
  "protocol_version": "1.0",
  "name": "Code Reviewer Agent",
  "description": "Reviews code for quality and best practices",
  "path": "/code-reviewer",
  "url": "https://agent.example.com",
  "skills": [
    {
      "id": "review-python",
      "name": "Python Code Review",
      "description": "Reviews Python code",
      "parameters": {
        "code_snippet": {"type": "string"}
      }
    }
  ],
  "security": ["bearer"],
  "tags": ["code-review", "qa"],
  "visibility": "public",
  "trust_level": "verified"
}
```

### 4.5.5.
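The per-group visibility in the permissions table above follows directly from the `UI-Scopes` block: `list_agents` is either `all` or an explicit list of agent paths. A minimal filtering sketch — the dict mirrors the YAML example, and the `visible_agents` helper is illustrative rather than the registry's actual code:

```python
# Mirrors the UI-Scopes YAML example above (illustrative data).
UI_SCOPES = {
    "mcp-registry-admin": {"list_agents": ["all"]},
    "registry-users-lob1": {"list_agents": ["/code-reviewer", "/test-automation"]},
}

def visible_agents(group: str, all_agent_paths):
    """Return the agent paths a group may list, per its list_agents scope."""
    allowed = UI_SCOPES.get(group, {}).get("list_agents", [])
    if "all" in allowed:
        return list(all_agent_paths)       # admin-style groups see everything
    return [p for p in all_agent_paths if p in allowed]
```

This is why the admin row reads "all" while LOB1 sees exactly two agents: the filter is driven entirely by configuration, with an unknown group seeing nothing.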
Agent CLI & Testing

**Agent CRUD Test Script (tests/agent_crud_test.sh):**

- Demonstrates all CRUD operations (create, read, update, delete, toggle)
- Includes token validation with JWT expiration checking
- Tests agent state persistence
- Verifies agent re-registration after deletion
- Supports custom token paths and environment variables

**Usage:**

```bash
# Generate fresh credentials
./credentials-provider/generate_creds.sh

# Run CRUD tests with default token
bash tests/agent_crud_test.sh

# Run with custom token path
bash tests/agent_crud_test.sh /path/to/token.json

# Run with environment variable
TOKEN_FILE=/path/to/token.json bash tests/agent_crud_test.sh
```

### 4.5.6. Access Control Testing

**LOB Bot Access Control Tests (tests/run-lob-bot-tests.sh):**

- Tests MCP service access permissions (Tests 1-6)
- Tests agent registry API permissions (Tests 7-14)
- Validates bot-specific agent visibility
- Ensures agents can only access permitted agents
- Confirms admin sees all agents

**Test Coverage:**

```
Part 1: MCP Service Access (6 tests)
- Tests 1-6: Verify bots can only call permitted MCP services

Part 2: Agent Registry API (8 tests)
- Tests 7-9: LOB1 agent access control
- Tests 10-12: LOB2 agent access control
- Tests 13-14: Admin agent access (see all)
```

**Running Access Control Tests:**

```bash
# Generate tokens for all bots
./keycloak/setup/generate-agent-token.sh admin-bot
./keycloak/setup/generate-agent-token.sh lob1-bot
./keycloak/setup/generate-agent-token.sh lob2-bot

# Run 14 comprehensive tests
bash tests/run-lob-bot-tests.sh
```

### 4.5.7.
Code Structure for A2A Agent Management

**CLI Module (cli/agent_mgmt.py):**

- Agent registration and lifecycle management
- CRUD operations on agent metadata
- Argument validation and error handling
- Structured logging and status reporting

**Key Functions:**

- `register_agent()`: Register new agent in registry
- `get_agent()`: Retrieve agent metadata
- `update_agent()`: Modify agent settings
- `delete_agent()`: Remove agent from registry
- `toggle_agent()`: Enable/disable agent
- `list_agents()`: Get agents filtered by permissions

**API Routes (registry/api/agent_routes.py):**

- Implements Agent Registry REST API endpoints
- Access control enforcement via scopes
- Token validation and authentication
- Agent state persistence and management

**Data Models (registry/models/):**

- Agent schema validation
- Skill/capability definitions
- Security configuration models
- State tracking models

**Implementation Notes:**

- JWT token validation with expiration checks (5-minute TTL)
- Base64 padding for JWT payload decoding
- Proper HTTP status codes (200, 201, 204, 400, 403, 404)
- Comprehensive error messages for debugging
- Agent state file updates on registration/deletion
- File-based persistence for agent metadata

## 5. Dynamic Tool Discovery

### 5.1. Overview

Traditional AI agents are limited to pre-configured tools. Dynamic Tool Discovery enables agents to:

1. Discover new tools through natural language queries
2. Automatically find relevant tools from hundreds of MCP servers
3. Dynamically invoke discovered tools without prior configuration
4. Expand capabilities on-demand based on user requests

### 5.2. How It Works

```
1. Natural Language Query → Agent receives user request
2. Semantic Search → intelligent_tool_finder uses sentence transformers
3. FAISS Index Search → Searches embeddings of all registered tools
4. Relevance Ranking → Returns tools ranked by semantic similarity
5. Tool Invocation → Agent uses invoke_mcp_tool with discovered info
```

### 5.3.
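The core of the flow above — embed the query, compare it against tool embeddings, return the most similar — can be sketched without FAISS using plain cosine similarity. The tiny 2-dimensional vectors here stand in for real sentence-transformer embeddings (384 dimensions in the default model); `rank_tools` is an illustrative helper, not the project's `intelligent_tool_finder`:

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_tools(query_vec, tool_vecs: dict, top_n: int = 1):
    """Rank tool embeddings by cosine similarity to the query embedding."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in tool_vecs.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_n]
```

FAISS's `IndexFlatIP` computes the same inner-product ranking over normalized vectors, just with an optimized index instead of a Python loop — which is why the legacy backend could swap it in transparently.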
Architecture Components

**IMPORTANT:** Vector search architecture depends on storage backend:

- **File Backend (LEGACY - DEPRECATED)**: Uses FAISS IndexFlatIP
- **MongoDB CE/DocumentDB (PRODUCTION)**: Uses hybrid search (BM25 + native vector k-NN)

**Discovery Layer (Modern - DocumentDB/MongoDB):**

- **Embedding Providers**: Flexible provider selection (sentence-transformers, OpenAI, LiteLLM with 100+ models)
- **BM25 Text Search**: Keyword matching on server/tool names and descriptions
- **Vector k-NN Search**: Semantic similarity using embeddings (384-1536 dimensions)
- **Hybrid Search**: Weighted combination of BM25 and vector search results
- **Native Vector Indexing**: DocumentDB HNSW or MongoDB application-level k-NN
- **Tool Metadata**: Server information, tool schemas, descriptions, embeddings

**Discovery Layer (Legacy - File Backend):**

- **FAISS Index**: In-memory vector similarity search (DEPRECATED)
- **Sentence Transformer**: all-MiniLM-L6-v2 model (384 dimensions) only
- **Cosine Similarity**: IndexFlatIP for vector search
- **Limitations**: File-based, no hybrid search, single embedding provider

**Embedding Provider Options:**

1. **Sentence Transformers (Local)**: Default all-MiniLM-L6-v2 (384 dimensions), runs locally
2. **OpenAI Embeddings**: text-embedding-ada-002 (1536 dimensions), requires API key
3. **LiteLLM**: 100+ embedding models via unified interface (OpenAI, Cohere, Bedrock, etc.)
4.
**Amazon Bedrock Titan**: titan-embed-text-v2:0 (1024 dimensions), native AWS integration

**Hybrid Search Strategy (DocumentDB/MongoDB):**

```
Final Score = (BM25_Weight × BM25_Score) + (Vector_Weight × Vector_Score)

Default weights: BM25_Weight=0.3, Vector_Weight=0.7
```

**Key Technologies:**

- DocumentDB/MongoDB (native vector search with HNSW indexing)
- BM25 algorithm for text matching
- Multiple embedding providers (sentence-transformers, OpenAI, LiteLLM, Bedrock)
- Hybrid scoring with configurable weights
- MCP Protocol
- FAISS (legacy file backend only - DEPRECATED)

### 5.4. Usage Patterns

**Pattern 1: Direct Developer Usage**

```python
# Discover tools
tools = await intelligent_tool_finder(
    natural_language_query="what time is it in Tokyo",
    session_cookie="your_session_cookie_here"
)

# Use discovered tool
result = await invoke_mcp_tool(
    mcp_registry_url="https://registry.com/mcpgw/sse",
    server_name=tools[0]["service_path"],
    tool_name=tools[0]["tool_name"],
    arguments={"tz_name": "Asia/Tokyo"},
    auth_token=auth_token,
    ...
)
```

**Pattern 2: Agent Integration (Autonomous)**

```python
# Agent has access to both tools as available capabilities
# 1. intelligent_tool_finder - discovers tools
# 2. invoke_mcp_tool - executes discovered tools

# Agent autonomously:
# - Identifies need for specialized tool
# - Calls intelligent_tool_finder with description
# - Receives tool information and usage instructions
# - Calls invoke_mcp_tool with discovered tool details
```

### 5.5.
API Reference

**intelligent_tool_finder**

Parameters:

- `natural_language_query` (str, required): Query describing the task
- `username` (str, optional): Username for authentication
- `password` (str, optional): Password for authentication
- `session_cookie` (str, optional): Session cookie for authentication
- `top_k_services` (int, optional): Number of top services to consider (default: 3)
- `top_n_tools` (int, optional): Number of best matching tools to return (default: 1)

Returns:

```python
[
    {
        "tool_name": "current_time_by_timezone",
        "tool_parsed_description": {
            "main": "Get current time for a specific timezone",
            "parameters": {...}
        },
        "tool_schema": {...},
        "service_path": "/currenttime",
        "service_name": "Current Time Server",
        "overall_similarity_score": 0.89
    }
]
```

### 5.6. Implementation Details

**FAISS Index Creation (legacy file backend only):**

1. Tool Metadata Collection: Gathers descriptions, schemas, server info
2. Text Embedding: Creates vector embeddings using sentence transformers
3. Index Building: Constructs FAISS index for fast similarity search
4. Automatic Updates: Refreshes index when servers are added/modified

**Semantic Search Process:**

1. Embed the natural language query
2. Search FAISS for top_k_services
3. Collect tools from top services
4. Embed all candidate tool descriptions
5. Calculate cosine similarity and rank

**Performance Optimizations:**

- Lazy Loading: FAISS index and models loaded on-demand
- Caching: Embeddings and metadata cached
- Async Processing: Embedding operations in separate threads
- Memory Efficiency: Float32 precision for embeddings

## 6. Registry API & Management

### 6.1.
Registry REST API

**Authentication Required**: Session cookie obtained via `/login` endpoint

**Core Endpoints:**

- `GET /login` - Display login form
- `POST /login` - Authenticate user, create session cookie (required first step)
- `POST /logout` - Invalidate session
- `POST /register` - Register new MCP service
- `POST /toggle/{service_path}` - Enable/disable service
- `POST /edit/{service_path}` - Update service details
- `GET /api/server_details/{service_path}` - Get service details
- `GET /api/tools/{service_path}` - Get service tools
- `POST /api/refresh/{service_path}` - Trigger health check/tool discovery
- `WebSocket /ws/health_status` - Real-time health status updates

**Registration Parameters:**

- `name`: Display name
- `description`: Service description
- `path`: URL path (e.g., `/weather`)
- `proxy_pass_url`: Backend URL
- `tags`: Comma-separated tags
- `num_tools`: Number of tools
- `num_stars`: Star rating
- `is_python`: Python-based flag
- `license`: License information

### 6.2. Anthropic MCP Registry API Compatibility

**Full compatibility** with Anthropic's MCP Registry REST API specification (v0.1):

**Endpoints:**

- `GET /v0.1/servers` - List all servers (with pagination)
- `GET /v0.1/servers/{server_name}/versions` - List server versions
- `GET /v0.1/servers/{server_name}/versions/{version}` - Get version details

**Authentication**: JWT Bearer token (short-lived, typically 5-15 minutes)

**Token Generation:**

1. Login to Registry Web Interface
2. Generate JWT Token from UI
3. Tokens stored in `.oauth-tokens/mcp-registry-api-tokens-YYYY-MM-DD.json`
4.
Use Bearer token in Authorization header

**Example Usage:**

```bash
ACCESS_TOKEN=$(cat token-file.json | jq -r '.tokens.access_token')

# List servers
curl -X GET "http://localhost/v0.1/servers?limit=10" \
  -H "Authorization: Bearer $ACCESS_TOKEN"

# Get server versions
curl -X GET "http://localhost/v0.1/servers/io.mcpgateway%2Fatlassian/versions" \
  -H "Authorization: Bearer $ACCESS_TOKEN"

# Get server details
curl -X GET "http://localhost/v0.1/servers/io.mcpgateway%2Fatlassian/versions/latest" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

**Import from Anthropic Registry:**

```bash
# Use federation API to import servers from Anthropic Registry
curl -X POST "http://localhost:7860/api/federation/sync?source=anthropic" \
  -H "Authorization: Bearer $TOKEN"
```

### 6.3. Service Management CLI

**Location**: `cli/registry_cli_wrapper.py`

**Key Commands:**

```bash
# List all servers (Anthropic API format)
uv run python cli/registry_cli_wrapper.py anthropic list

# Service management
uv run python cli/registry_cli_wrapper.py service list
uv run python cli/registry_cli_wrapper.py service add config.json
uv run python cli/registry_cli_wrapper.py service delete /server-path

# Create group
uv run python cli/registry_cli_wrapper.py group create \
  --name mcp-servers-finance \
  --description "Finance team servers"

# User management
uv run python cli/registry_cli_wrapper.py user create-human \
  --username john.doe \
  --email john.doe@example.com \
  --first-name John \
  --last-name Doe \
  --groups mcp-servers-finance \
  --password secure-password

# List groups
uv run python cli/registry_cli_wrapper.py group list
```

### 6.4. Rating System

**Community-driven quality assessment** for both MCP servers and AI agents using a 5-star rating system.
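The core rating mechanics (integer 1-5 validation, one rating per user with in-place updates, a 100-entry FIFO buffer, and an arithmetic-mean aggregate) can be sketched as follows. This is a minimal illustration; the class and method names are invented and do not match the actual `registry/services/rating_service.py` API:

```python
from collections import OrderedDict

class RatingBuffer:
    """Per-entity rating store: one rating per user, FIFO-capped at max_size."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self.ratings: "OrderedDict[str, int]" = OrderedDict()  # username -> stars

    def submit(self, username: str, rating: int) -> None:
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError("rating must be an integer between 1 and 5")
        if username in self.ratings:
            self.ratings[username] = rating  # update existing rating in place
        else:
            if len(self.ratings) >= self.max_size:
                self.ratings.popitem(last=False)  # evict the oldest entry (FIFO)
            self.ratings[username] = rating

    def aggregate(self) -> float:
        """Simple arithmetic mean of all stored ratings (0.0 when empty)."""
        if not self.ratings:
            return 0.0
        return round(sum(self.ratings.values()) / len(self.ratings), 2)
```

Because an update replaces the user's existing entry rather than appending, repeated submissions by the same user never consume extra buffer slots.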
**Key Features:**

- **5-Star Rating Scale**: Users rate servers/agents from 1 to 5 stars
- **Interactive UI Widget**: Visual star rating interface in the web dashboard
- **API Support**: Submit ratings via REST API endpoints
- **Aggregate Ratings**: Arithmetic mean with individual rating details
- **One Rating Per User**: Each user has at most one rating per entity and can update it
- **Rotating Buffer**: Maximum 100 ratings per entity (FIFO replacement)
- **Anonymous Tracking**: Ratings linked to username but not publicly displayed
- **Real-time Updates**: Aggregate rating updates immediately after submission

**Rating API Endpoints:**

```bash
# Submit a rating for a server
POST /api/v2/servers/{server_path}/rating
{
  "rating": 5,        # 1-5 stars
  "username": "john.doe"
}

# Submit a rating for an agent
POST /api/v2/agents/{agent_path}/rating
{
  "rating": 4,
  "username": "jane.smith"
}

# Get server with rating details
GET /api/v2/servers/{server_path}
# Returns:
{
  "server_path": "/example-server",
  "name": "Example Server",
  "aggregate_rating": 4.5,   # Arithmetic mean
  "rating_count": 42,
  "rating_details": [
    {"user": "john.doe", "rating": 5},
    {"user": "jane.smith", "rating": 4}
    # ... up to 100 ratings
  ]
}

# Get agent with rating details
GET /api/v2/agents/{agent_path}
# Similar structure with aggregate_rating, rating_count, rating_details
```

**API Rating Submission:**

```bash
# Rate a server via API
curl -X POST "http://localhost:7860/api/v2/servers/{server_path}/rating" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rating": 5, "username": "john.doe"}'

# Rate an agent via API
curl -X POST "http://localhost:7860/api/v2/agents/{agent_path}/rating" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rating": 4, "username": "jane.smith"}'
```

**Rating Logic:**

- **Validation**: Ratings must be integers between 1 and 5 (inclusive)
- **Update Behavior**: Submitting a new rating for the same user updates their existing rating
- **Rotating Buffer**: Once 100 ratings are reached, oldest ratings are removed (FIFO)
- **Aggregate Calculation**: Simple arithmetic mean of all ratings
- **Service Location**: `registry/services/rating_service.py`

**Web UI Integration:**

- Interactive 5-star widget on server/agent detail pages
- Visual feedback showing aggregate rating and count
- Click to submit rating (requires authentication)
- Real-time update on submission

**Use Cases:**

- **Quality Discovery**: Find highly-rated servers/agents for specific tasks
- **Community Feedback**: Share experiences with tools and agents
- **Filtering**: Sort search results by rating
- **Reputation**: Build trust in community-contributed servers/agents

## 7. Configuration & Setup

### 7.1.
Main Environment Configuration

**File**: `.env` (Project root)

**Core Variables:**

- `REGISTRY_URL`: Public URL of registry
- `ADMIN_USER`, `ADMIN_PASSWORD`: Registry admin credentials
- `AUTH_PROVIDER`: `keycloak` or `cognito`
- `AWS_REGION`: AWS region for services

**Keycloak Configuration (if AUTH_PROVIDER=keycloak):**

- `KEYCLOAK_URL`: Internal URL (`http://keycloak:8080`)
- `KEYCLOAK_EXTERNAL_URL`: External URL for browser access
- `KEYCLOAK_REALM`: Realm name (`mcp-gateway`)
- `KEYCLOAK_ADMIN`, `KEYCLOAK_ADMIN_PASSWORD`: Admin credentials
- `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_CLIENT_SECRET`: Web client credentials (auto-generated)
- `KEYCLOAK_M2M_CLIENT_ID`, `KEYCLOAK_M2M_CLIENT_SECRET`: M2M credentials (auto-generated)

**Cognito Configuration (if AUTH_PROVIDER=cognito):**

- `COGNITO_USER_POOL_ID`: User Pool ID
- `COGNITO_CLIENT_ID`: App Client ID
- `COGNITO_CLIENT_SECRET`: App Client Secret
- `COGNITO_DOMAIN`: Cognito domain (optional)

**Getting Keycloak Credentials:**

```bash
# Initialize Keycloak and generate credentials
cd keycloak/setup
./init-keycloak.sh

# Retrieve existing credentials
./get-all-client-credentials.sh
```

### 7.2. OAuth Environment Configuration

**File**: `credentials-provider/oauth/.env`

**Ingress Authentication:**

```bash
# Keycloak
KEYCLOAK_URL=https://mcpgateway.ddns.net
KEYCLOAK_REALM=mcp-gateway
KEYCLOAK_M2M_CLIENT_ID=mcp-gateway-m2m
KEYCLOAK_M2M_CLIENT_SECRET=ZJqbsamnQs79hbUbkJLB...

# OR Cognito
INGRESS_OAUTH_USER_POOL_ID=us-east-1_vm1115QSU
INGRESS_OAUTH_CLIENT_ID=5v2rav1v93...
INGRESS_OAUTH_CLIENT_SECRET=1i888fnolv6k5sa1b8s5k839pdm...
```

**Egress Authentication (Multiple Providers):**

```bash
# Pattern: EGRESS_OAUTH_CLIENT_ID_N, EGRESS_OAUTH_CLIENT_SECRET_N
EGRESS_OAUTH_CLIENT_ID_1=your_atlassian_client_id
EGRESS_OAUTH_CLIENT_SECRET_1=your_atlassian_client_secret
EGRESS_OAUTH_REDIRECT_URI_1=http://localhost:8080/callback
EGRESS_PROVIDER_NAME_1=atlassian
EGRESS_MCP_SERVER_NAME_1=atlassian
```

### 7.3.
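The OAuth provider YAML files in this chapter reference configuration values as `${VAR}` placeholders (e.g. `${KEYCLOAK_URL}`). A minimal sketch of how such placeholders can be expanded from the environment; this is illustrative only, not the auth server's actual config loader:

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_]+)\}")

def expand_env(text: str, env: dict = None) -> str:
    """Replace ${VAR} placeholders with environment values.

    Unresolved placeholders are left intact so missing variables
    are visible rather than silently replaced with empty strings.
    """
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), text)
```

Leaving unknown placeholders untouched makes misconfiguration easy to spot in logs; a stricter loader might raise instead.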
OAuth Providers Configuration

**File**: `auth_server/oauth2_providers.yml`

**Keycloak Provider:**

```yaml
keycloak:
  display_name: "Keycloak"
  client_id: "${KEYCLOAK_CLIENT_ID}"
  client_secret: "${KEYCLOAK_CLIENT_SECRET}"
  auth_url: "${KEYCLOAK_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/auth"
  token_url: "${KEYCLOAK_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/token"
  user_info_url: "${KEYCLOAK_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/userinfo"
  logout_url: "${KEYCLOAK_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/logout"
  scopes: ["openid", "email", "profile"]
  groups_claim: "groups"
  enabled: true
```

**Amazon Cognito Provider:**

```yaml
cognito:
  display_name: "Amazon Cognito"
  client_id: "${COGNITO_CLIENT_ID}"
  client_secret: "${COGNITO_CLIENT_SECRET}"
  auth_url: "https://${COGNITO_DOMAIN}.auth.${AWS_REGION}.amazoncognito.com/oauth2/authorize"
  token_url: "https://${COGNITO_DOMAIN}.auth.${AWS_REGION}.amazoncognito.com/oauth2/token"
  user_info_url: "https://${COGNITO_DOMAIN}.auth.${AWS_REGION}.amazoncognito.com/oauth2/userInfo"
  logout_url: "https://${COGNITO_DOMAIN}.auth.${AWS_REGION}.amazoncognito.com/logout"
  scopes: ["openid", "email", "profile"]
  groups_claim: "cognito:groups"
  enabled: true
```

**Microsoft Entra ID (Azure AD) Provider:**

```yaml
entra_id:
  display_name: "Microsoft Entra ID"
  client_id: "${ENTRA_CLIENT_ID}"
  client_secret: "${ENTRA_CLIENT_SECRET}"
  tenant_id: "${ENTRA_TENANT_ID}"
  auth_url: "https://login.microsoftonline.com/${ENTRA_TENANT_ID}/oauth2/v2.0/authorize"
  token_url: "https://login.microsoftonline.com/${ENTRA_TENANT_ID}/oauth2/v2.0/token"
  user_info_url: "https://graph.microsoft.com/v1.0/me"
  logout_url: "https://login.microsoftonline.com/${ENTRA_TENANT_ID}/oauth2/v2.0/logout"
  scopes: ["openid", "email", "profile", "User.Read"]
  groups_claim: "groups"
  enabled: true
  # Enterprise features
  conditional_access: true          # Support for conditional access policies
  mfa_enabled: true                 # Multi-factor authentication
  microsoft_365_integration: true   # Integration with M365 environments
```

**References:**

- Entra ID Setup Guide: `docs/entra-id-setup.md`
- Keycloak Integration Guide: `docs/keycloak-integration.md`
- Cognito Setup Guide: `docs/cognito.md`

### 7.4. Scopes Configuration

**File**: `auth_server/scopes.yml`

**Group Mappings:**

```yaml
group_mappings:
  mcp-registry-admin:
    - mcp-registry-admin
    - mcp-servers-unrestricted/read
    - mcp-servers-unrestricted/execute
  mcp-registry-user:
    - mcp-registry-user
    - mcp-servers-restricted/read
```

**UI Scopes:**

```yaml
UI-Scopes:
  mcp-registry-admin:
    list_service: [all]
    register_service: [all]
    health_check_service: [all]
    toggle_service: [all]
    modify_service: [all]
```

**Server Scopes:**

```yaml
mcp-servers-restricted/execute:
  - server: fininfo
    methods:
      - initialize
      - tools/list
      - tools/call
    tools:
      - get_stock_aggregates
      - print_stock_data
```

### 7.5. Credential Generation

**Quick Start:**

```bash
# Configure environment
cp .env.example .env
cp credentials-provider/oauth/.env.example credentials-provider/oauth/.env
# Edit both .env files with your credentials

# Generate all credentials
./credentials-provider/generate_creds.sh

# Available options:
#   --all               # Run all authentication flows (default)
#   --ingress-only      # Only MCP Gateway authentication
#   --egress-only       # Only external provider authentication
#   --agentcore-only    # Only AgentCore token generation
#   --keycloak-only     # Only Keycloak token generation
#   --provider google   # Specify provider for egress auth
#   --verbose           # Enable debug logging
```

**Generated Configuration Files:**

- `.oauth-tokens/vscode_mcp.json` - VS Code MCP configuration
- `.oauth-tokens/mcp.json` - Roocode/Claude Code configuration
- `.oauth-tokens/ingress.json` - Ingress tokens
- `.oauth-tokens/egress.json` - Egress tokens
- `.oauth-tokens/agent-{name}-m2m-token.json` - Agent-specific tokens

### 7.6.
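The `group_mappings` in the scopes configuration (section 7.4) resolve a user's identity-provider groups into gateway scopes. A minimal sketch of that resolution; the mapping dict mirrors the YAML above, but the function name is illustrative and not the auth server's actual API:

```python
def resolve_scopes(user_groups: list, group_mappings: dict) -> set:
    """Union of all scopes granted by any of the user's groups.

    Unknown groups grant nothing, so a user in no mapped group
    ends up with an empty scope set (deny by default).
    """
    scopes = set()
    for group in user_groups:
        scopes.update(group_mappings.get(group, []))
    return scopes

# Mirrors the group_mappings YAML in section 7.4
GROUP_MAPPINGS = {
    "mcp-registry-admin": [
        "mcp-registry-admin",
        "mcp-servers-unrestricted/read",
        "mcp-servers-unrestricted/execute",
    ],
    "mcp-registry-user": [
        "mcp-registry-user",
        "mcp-servers-restricted/read",
    ],
}
```

A user carrying both groups simply receives the union of both scope lists, which is why overlapping grants are harmless.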
Keycloak Setup

**Initial Setup:**

```bash
cd keycloak/setup
./init-keycloak.sh
```

**This creates:**

- `mcp-gateway` realm
- Web and M2M clients with configurations
- Required groups (`mcp-servers-unrestricted`, `mcp-servers-restricted`)
- Group mappers for JWT token claims
- Initial admin and test users

**Service Account Management:**

```bash
# Create individual agent service account
./setup-agent-service-account.sh --agent-id sre-agent --group mcp-servers-unrestricted

# Create shared M2M service account
./setup-m2m-service-account.sh
```

**Token Generation:**

```bash
# Generate M2M token for ingress
uv run python credentials-provider/token_refresher.py

# Generate agent-specific token
uv run python credentials-provider/token_refresher.py --agent-id sre-agent
```

## 8. Observability & Monitoring

### 8.1. Dual-Path Metrics System

**Architecture:**

```
Auth Server Middleware → Metrics Service API → Dual Path:
    ├─> SQLite Database (detailed storage)
    └─> OpenTelemetry (Prometheus/Grafana)
```

**Database Tables:**

- `auth_metrics`: Authentication requests and validation
- `tool_metrics`: Tool execution details (calls, methods, client info)
- `discovery_metrics`: Tool discovery/search queries
- `metrics`: Raw metrics data (all types)
- `api_keys`: API key management for metrics service

### 8.2.
Accessing SQLite Metrics

**Connect to Database:**

```bash
# Via container
docker compose exec metrics-db sh
sqlite3 /var/lib/sqlite/metrics.db

# Or copy locally
docker compose cp metrics-db:/var/lib/sqlite/metrics.db ./metrics.db
sqlite3 ./metrics.db
```

**Sample Queries:**

**Authentication Success Rate:**

```sql
SELECT server,
       COUNT(*) as total,
       SUM(success) as successful,
       ROUND(100.0 * SUM(success) / COUNT(*), 2) as success_pct,
       ROUND(AVG(duration_ms), 2) as avg_ms
FROM auth_metrics
GROUP BY server
ORDER BY total DESC;
```

**Tool Usage Summary:**

```sql
SELECT tool_name,
       COUNT(*) as calls,
       SUM(success) as successful,
       ROUND(AVG(duration_ms), 2) as avg_ms,
       COUNT(DISTINCT client_name) as unique_clients
FROM tool_metrics
GROUP BY tool_name
ORDER BY calls DESC;
```

**Slowest Tool Executions:**

```sql
SELECT tool_name,
       server_name,
       ROUND(duration_ms, 2) as duration_ms,
       datetime(timestamp) as time,
       success
FROM tool_metrics
ORDER BY duration_ms DESC
LIMIT 20;
```

### 8.3. OpenTelemetry Metrics

**Prometheus Endpoint**: `http://localhost:9465/metrics`

**Available Metrics:**

- `mcp_auth_requests_total` - Counter of authentication requests
- `mcp_auth_request_duration_seconds` - Histogram of auth request durations
- `mcp_tool_executions_total` - Counter of tool executions
- `mcp_tool_execution_duration_seconds` - Histogram of tool execution durations
- `mcp_tool_discovery_total` - Counter of discovery requests
- `mcp_tool_discovery_duration_seconds` - Histogram of discovery durations
- `mcp_protocol_latency_seconds` - Histogram of protocol flow latencies
- `mcp_health_checks_total` - Counter of health checks

**OTLP Export Configuration:**

```bash
# In .env
OTEL_OTLP_ENDPOINT=http://otel-collector:4318
```

### 8.4.
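The authentication success-rate query from section 8.2 can be exercised against an in-memory SQLite database. The sketch below assumes a minimal `auth_metrics` shape (server, success, duration_ms); the real table has additional columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_metrics (server TEXT, success INTEGER, duration_ms REAL)"
)
conn.executemany(
    "INSERT INTO auth_metrics VALUES (?, ?, ?)",
    [
        ("currenttime", 1, 12.0),
        ("currenttime", 1, 18.0),
        ("currenttime", 0, 40.0),  # one failed auth request
        ("fininfo", 1, 25.0),
    ],
)

# Same aggregation as the documented query, minus avg_ms for brevity
rows = conn.execute("""
    SELECT server,
           COUNT(*) AS total,
           SUM(success) AS successful,
           ROUND(100.0 * SUM(success) / COUNT(*), 2) AS success_pct
    FROM auth_metrics
    GROUP BY server
    ORDER BY total DESC
""").fetchall()
# currenttime: 3 requests, 2 successful -> ~66.67% success rate
```

Because `success` is stored as 0/1, `SUM(success)` counts successes directly, and the `100.0 *` factor forces floating-point division.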
OpenTelemetry Collector Setup

**Add to docker-compose.yml:**

```yaml
otel-collector:
  image: otel/opentelemetry-collector-contrib:latest
  command: ["--config=/etc/otel-collector-config.yaml"]
  volumes:
    - ./config/otel-collector-config.yaml:/etc/otel-collector-config.yaml
  ports:
    - "4318:4318"   # OTLP HTTP receiver
    - "4317:4317"   # OTLP gRPC receiver
    - "8889:8889"   # Prometheus exporter metrics
  restart: unless-stopped
```

**Basic Configuration (config/otel-collector-config.yaml):**

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: mcp_gateway
  logging:
    loglevel: info

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, logging]
```

**Cloud Backends:**

- **AWS CloudWatch**: `awscloudwatch` exporter
- **Datadog**: `datadog` exporter with API key
- **New Relic**: `otlphttp/newrelic` with license key
- **Grafana Cloud**: `otlphttp/grafanacloud` with auth
- **Honeycomb**: `otlphttp/honeycomb` with API key

### 8.5. Grafana Dashboards

**Access**: `http://localhost:3000` (admin/admin)

**Pre-configured Dashboards:**

1. **Authentication Metrics**: Success rates, request volume, error codes, response times
2. **Tool Execution Metrics**: Most used tools, client distribution, success rates, performance trends
3. **Discovery Metrics**: Search query volume, result counts, performance breakdown
4.
**System Health**: Overall request volume, error rates, performance percentiles (p50, p95, p99)

**Sample PromQL Queries:**

```promql
# Authentication success rate
rate(mcp_auth_requests_total{success="true"}[5m]) / rate(mcp_auth_requests_total[5m])

# Average tool execution duration by server
rate(mcp_tool_execution_duration_seconds_sum[5m]) / rate(mcp_tool_execution_duration_seconds_count[5m])

# Top 5 most used tools
topk(5, sum by (tool_name) (rate(mcp_tool_executions_total[5m])))

# 95th percentile request duration
histogram_quantile(0.95, rate(mcp_auth_request_duration_seconds_bucket[5m]))
```

### 8.6. Monitoring Best Practices

**Key Metrics to Monitor:**

- Authentication Success Rate: >95%
- Tool Execution Success Rate: >90%
- Average Response Time: <100ms (auth), <500ms (tools)
- Error Rate: <5%
- Discovery Query Performance: <50ms (embedding time)

**Alert Configuration:**

- Authentication failure rate >10%
- Tool execution errors >5%
- Response time p95 >1000ms
- Discovery query failures

**Data Retention:**

- SQLite database: 90 days (configurable via `METRICS_RETENTION_DAYS`)
- Prometheus: 200 hours (configurable in `prometheus.yml`)

## 9. Installation & Deployment

### 9.1. Quick Start (5 Minutes)

```bash
# 1. Clone and setup
git clone https://github.com/agentic-community/mcp-gateway-registry.git
cd mcp-gateway-registry

# 2. Configure environment
cp .env.example .env
# Edit .env with your credentials

# 3. Generate authentication credentials
./credentials-provider/generate_creds.sh

# 4. Install prerequisites
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get update && sudo apt-get install -y docker.io docker-compose

# 5. Deploy
./build_and_run.sh

# 6. Access registry
open http://localhost:7860
```

### 9.2.
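The alert thresholds from section 8.6 can be expressed as a small evaluation helper; a sketch only, with the threshold values taken from the list above and an invented function name:

```python
def check_alerts(auth_failure_rate: float, tool_error_rate: float,
                 p95_response_ms: float) -> list:
    """Return a description for each breached threshold (section 8.6)."""
    alerts = []
    if auth_failure_rate > 0.10:   # authentication failure rate >10%
        alerts.append("authentication failure rate >10%")
    if tool_error_rate > 0.05:     # tool execution errors >5%
        alerts.append("tool execution errors >5%")
    if p95_response_ms > 1000:     # response time p95 >1000ms
        alerts.append("response time p95 >1000ms")
    return alerts
```

In practice these conditions would be encoded as Prometheus alerting rules over the PromQL expressions shown above rather than evaluated in application code.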
Pre-built Images (Instant Setup)

**Benefits:** No build time, no Node.js required, no frontend compilation, consistent tested images

```bash
# Step 1: Clone and setup
git clone https://github.com/agentic-community/mcp-gateway-registry.git
cd mcp-gateway-registry
cp .env.example .env

# Step 2: (Optional) Download local embeddings model
# Skip if using cloud APIs (OpenAI, Bedrock) - configure EMBEDDINGS_PROVIDER=litellm in .env instead
hf download sentence-transformers/all-MiniLM-L6-v2 --local-dir ${HOME}/mcp-gateway/models/all-MiniLM-L6-v2

# Step 3: Configure environment
# Complete: Initial Environment Configuration guide
export IMAGE_REGISTRY=ghcr.io/jrmatherly

# Step 4: Deploy with pre-built images from GHCR
./build_and_run.sh --prebuilt

# Step 5: Initialize Keycloak
# Complete: Initialize Keycloak Configuration guide

# Step 6: Access registry
open http://localhost:7860

# Step 7: Create first agent account
# Complete: Create Your First AI Agent Account guide

# Step 8: Restart auth server
docker compose down auth-server && docker compose rm -f auth-server && docker compose up -d auth-server

# Step 9: Test the setup
# Complete: Testing with mcp_client.py and agent.py guide
```

### 9.3. Amazon EC2 Deployment

**System Requirements:**

- **Minimum (Development)**: t3.large (2 vCPU, 8GB RAM), 20GB SSD
- **Recommended (Production)**: t3.2xlarge (8 vCPU, 32GB RAM), 50GB+ SSD

**Detailed Setup:**

```bash
# 1. Create directories
mkdir -p ${HOME}/mcp-gateway/{servers,auth_server,secrets,logs}
cp -r registry/servers ${HOME}/mcp-gateway/
cp auth_server/scopes.yml ${HOME}/mcp-gateway/auth_server/

# 2. Configure environment
cp .env.example .env
nano .env  # Configure required values

# 3. Generate credentials
cp credentials-provider/oauth/.env.example credentials-provider/oauth/.env
nano credentials-provider/oauth/.env
./credentials-provider/generate_creds.sh

# 4.
# Install dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv venv --python 3.12 && source .venv/bin/activate
sudo apt-get update
sudo apt-get install --reinstall docker.io -y
sudo apt-get install -y docker-compose
sudo usermod -a -G docker $USER
newgrp docker

# 5. Deploy services
./build_and_run.sh
```

### 9.4. HTTPS Configuration

**Option A: Let's Encrypt**

```bash
# Install certbot
sudo apt-get install -y certbot

# Get certificate
sudo certbot certonly --standalone -d your-domain.com

# Copy certificates
mkdir -p ${HOME}/mcp-gateway/ssl/{certs,private}
cp /etc/letsencrypt/live/your-domain.com/fullchain.pem ${HOME}/mcp-gateway/ssl/certs/
cp /etc/letsencrypt/live/your-domain.com/privkey.pem ${HOME}/mcp-gateway/ssl/private/
chmod 644 ${HOME}/mcp-gateway/ssl/certs/fullchain.pem
chmod 600 ${HOME}/mcp-gateway/ssl/private/privkey.pem

# Deploy
./build_and_run.sh
```

**Certificate Renewal (Cron):**

```bash
sudo crontab -e
# Add:
0 0,12 * * * certbot renew --quiet && cp /etc/letsencrypt/live/your-domain.com/fullchain.pem ${HOME}/mcp-gateway/ssl/certs/fullchain.pem && cp /etc/letsencrypt/live/your-domain.com/privkey.pem ${HOME}/mcp-gateway/ssl/private/privkey.pem && docker compose restart registry
```

### 9.5. Amazon EKS Deployment

For production Kubernetes deployments, see the [EKS deployment guide](https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow/tree/master/examples/agentic/mcp-gateway-microservices).

**Key Benefits:**

- High Availability: Multi-AZ pod distribution
- Auto Scaling: Horizontal pod autoscaling based on metrics
- Service Mesh: Istio integration for advanced traffic management
- Observability: Native CloudWatch and Prometheus integration
- Security: Pod security policies and network policies

### 9.6. AWS ECS Deployment (RECOMMENDED FOR PRODUCTION)

**Production-grade infrastructure** using AWS ECS Fargate with complete Terraform automation.
This is the **most mature deployment option** for AWS production environments.

**Key Benefits:**

- **Fully Managed Compute**: ECS Fargate eliminates server management
- **Multi-AZ High Availability**: Services deployed across multiple availability zones
- **Auto-Scaling**: Task-level autoscaling based on CPU/memory metrics
- **Native AWS Integration**: CloudWatch, Secrets Manager, DocumentDB, Aurora
- **Infrastructure as Code**: Complete Terraform configuration in `terraform/aws-ecs/`
- **SSL/TLS**: Automatic certificate provisioning via AWS Certificate Manager
- **Cost Optimized**: Aurora Serverless v2 auto-scales from 0.5 to 2 ACUs

**Architecture Components:**

- **Compute**: ECS Fargate tasks (serverless containers)
- **Load Balancers**:
  - Main ALB (internet-facing) for Registry and Auth Server
  - Keycloak ALB for identity management
- **Data Layer**:
  - **Amazon DocumentDB**: Primary storage with native HNSW vector search
  - **Amazon Aurora PostgreSQL Serverless v2**: User data and sessions
- **Networking**: VPC with public/private subnets across 2 AZs
- **Security**: AWS Secrets Manager, SSL/TLS, security groups
- **Observability**: CloudWatch Logs, CloudWatch Alarms, SNS notifications

**Quick Start:**

```bash
# Prerequisites: Domain with Route53 hosted zone, AWS credentials
cd terraform/aws-ecs

# Step 1: Configure variables
cp terraform.tfvars.example terraform.tfvars
nano terraform.tfvars  # Set domain_name, aws_region, etc.
# Step 2: Initialize Terraform
terraform init

# Step 3: Review deployment plan
terraform plan

# Step 4: Deploy infrastructure (~60-90 minutes)
terraform apply

# Step 5: Get outputs
terraform output -json > terraform-outputs.json
```

**Post-Deployment:**

```bash
# View service URLs
terraform output registry_url
terraform output keycloak_url

# Monitor logs
./scripts/view-cloudwatch-logs.sh registry
./scripts/view-cloudwatch-logs.sh auth-server

# Check service health
aws ecs describe-services --cluster mcp-gateway-ecs-cluster \
  --services mcp-gateway-v2-registry

# Scale services
aws ecs update-service --cluster mcp-gateway-ecs-cluster \
  --service mcp-gateway-v2-registry --desired-count 3
```

**Configuration:**

- **Deployment Time**: 60-90 minutes for initial deployment
- **Prerequisites**:
  - Domain with Route53 hosted zone (any registrar supported)
  - AWS account with AdministratorAccess or specific IAM permissions
  - Terraform >= 1.5.0, AWS CLI >= 2.0, Docker >= 20.10
- **Region Support**: All commercial AWS regions
- **Regional Domains**: Automatic subdomain creation (e.g., `registry.us-east-1.your.domain`)

**Documentation:**

- Complete guide: `terraform/aws-ecs/README.md`
- Architecture diagrams: `terraform/aws-ecs/img/`
- Troubleshooting: `terraform/aws-ecs/README.md#troubleshooting`
- Cost optimization: `terraform/aws-ecs/README.md#cost-optimization`

**Comparison: ECS vs EKS vs EC2**

| Feature | ECS Fargate | EKS | EC2 |
|---------|-------------|-----|-----|
| **Maturity** | ✅ Production-ready | ⚠️ Preview/Experimental | ✅ Stable |
| **Management** | Fully managed | Managed control plane | Self-managed |
| **IaC** | Complete Terraform | Partial | Docker Compose |
| **Scaling** | Auto-scaling tasks | Horizontal pod autoscaling | Manual/script-based |
| **Cost** | Pay-per-task | Higher (node costs) | Fixed instance costs |
| **Setup Time** | 60-90 min | 2-4 hours | 30-60 min |
| **Best For** | Production deployments | K8s-native workloads | Development/testing |

###
9.7. Post-Installation Verification

```bash
# Check service status
docker compose ps
docker compose logs -f

# Test web interface
open http://localhost:7860

# Test gateway health
curl -f http://localhost:7860/health

# Configure AI assistants
./credentials-provider/generate_creds.sh
cp .oauth-tokens/vscode_mcp.json ~/.vscode/settings.json
```

## 10. Testing & Integration

### 10.1. MCP Testing Tools

**Python MCP Client**: `cli/mcp_client.py`

```bash
# Test gateway health
curl -f http://localhost:7860/health

# Test MCP connectivity with authentication
uv run python cli/mcp_client.py --url http://localhost:7860/mcp --token-file .oauth-tokens/ingress.json --operation ping

# List available tools (filtered by permissions)
uv run python cli/mcp_client.py --url http://localhost:7860/mcp --token-file .oauth-tokens/ingress.json --operation list

# Call specific tools
uv run python cli/mcp_client.py --url http://localhost:7860/mcp --token-file .oauth-tokens/ingress.json \
  --operation call --tool-name debug_auth_context --arguments '{}'
uv run python cli/mcp_client.py --url http://localhost:7860/mcp --token-file .oauth-tokens/ingress.json \
  --operation call --tool-name intelligent_tool_finder --arguments '{"natural_language_query": "quantum"}'
uv run python cli/mcp_client.py --url http://localhost:7860/currenttime/mcp --token-file .oauth-tokens/ingress.json \
  --operation call --tool-name current_time_by_timezone --arguments '{"tz_name": "America/New_York"}'

# Test against different gateway URLs
uv run python cli/mcp_client.py --url https://your-domain.com/mcp --token-file .oauth-tokens/ingress.json --operation ping
uv run python cli/mcp_client.py --url https://your-domain.com/mcp --token-file .oauth-tokens/ingress.json --operation list
```

**Additional Core Operations:**

```bash
# Core operations
uv run python cli/mcp_client.py --operation ping
uv run python cli/mcp_client.py --operation list
uv run python cli/mcp_client.py --operation call --tool-name get_stock_aggregates --arguments '{"ticker": "AAPL"}'
```

**Python Agent**: `agents/agent.py`

```bash
# Full-featured agent with AI capabilities
uv run python agents/agent.py --user-query "What time is it in Tokyo?"
```

### 10.2. Anthropic API Testing

**Test Script**: `cli/test_anthropic_api.py`

```bash
# Run all tests
uv run python cli/test_anthropic_api.py --token-file /path/to/token-file.json

# Test specific endpoint
uv run python cli/test_anthropic_api.py \
  --token-file /path/to/token-file.json \
  --test list-servers \
  --limit 10

# Get server details
uv run python cli/test_anthropic_api.py \
  --token-file /path/to/token-file.json \
  --test get-server \
  --server-name io.mcpgateway/atlassian
```

### 10.3. Credential Validation

```bash
# Validate all OAuth configurations
cd credentials-provider
./generate_creds.sh --verbose

# Test specific authentication flows
./generate_creds.sh --ingress-only --verbose     # MCP Gateway auth
./generate_creds.sh --egress-only --verbose      # External provider auth
./generate_creds.sh --agentcore-only --verbose   # AgentCore auth
```

### 10.4. Testing Architecture

**IMPORTANT:** The project uses **pytest as the primary testing framework**. For MCP connectivity testing, use `cli/mcp_client.py` instead of deprecated shell scripts.
**Test Categories:** - **Unit Tests** (`tests/unit/`): Test individual functions and classes in isolation - **Integration Tests** (`tests/integration/`): Test multiple components working together - **E2E Tests** (`tests/e2e/`): End-to-end workflow tests **Running Tests:** ```bash # Run all tests with parallel execution (8 workers) uv run pytest tests/ -n 8 # Expected results (as of 2026-01-22): # - ~867 tests collected # - Coverage: >=35% (enforced minimum) # - Execution time: ~30-60 seconds with -n 8 # Run tests serially (slower, less memory) uv run pytest tests/ # Run specific test categories uv run pytest tests/unit/ # Unit tests only uv run pytest tests/integration/ # Integration tests only uv run pytest tests/e2e/ # E2E tests only # Run with coverage report uv run pytest tests/ -n 8 --cov=registry --cov-report=term-missing # Run specific test file uv run pytest tests/unit/test_server_service.py -v # Stop at first failure uv run pytest tests/ -n 8 -x ``` **Test Configuration:** - **Location**: `pyproject.toml` lines 78-114 - **Minimum Coverage**: 35% (configured in pyproject.toml) - **Test Markers**: unit, integration, e2e, auth, servers, search, health, core, repositories, slow, requires_models - **Async Mode**: Auto-detected for async tests - **Reports**: HTML report at `tests/reports/report.html`, JSON at `tests/reports/report.json` **Test Prerequisites:** ```bash # MongoDB must be running for integration tests docker ps | grep mongo # Should show: mcp-mongodb running on 0.0.0.0:27017 # Environment is auto-configured: # - DOCUMENTDB_HOST=localhost # - STORAGE_BACKEND=mongodb-ce # - directConnection=true (single-node MongoDB) ``` **Test Best Practices:** 1. **Repository Reset Pattern** (for test isolation): ```python @pytest.fixture(autouse=True) async def reset_repository(): """Reset repository state before each test.""" repo = await get_server_repository() await repo.reset() # Clear all data yield # Cleanup handled by TestClient teardown ``` 2. 
**Memory Management** (avoid OOM on EC2): ```python # Use -n 8 for parallel tests only if you have enough memory # Otherwise run serially: uv run pytest tests/ # For CI/CD pipelines, use moderate parallelism: pytest tests/ -n 2 ``` 3. **Fixture Cleanup**: ```python # Always cleanup resources in fixtures @pytest.fixture async def test_client(): async with AsyncClient(app=app, base_url="http://test") as client: yield client # Automatic cleanup via async context manager ``` 4. **Mock External Dependencies**: ```python # Mock security scanner, embeddings, external APIs @pytest.fixture(autouse=True) def mock_security_scanner(): mock_service = MagicMock() mock_service.get_scan_config.return_value = SecurityScanConfig(enabled=False) with patch("registry.api.server_routes.security_scanner_service", mock_service): yield mock_service ``` **MCP Connectivity Testing:** ```bash # Use cli/mcp_client.py for MCP testing (replaces deprecated shell scripts) uv run python cli/mcp_client.py --url http://localhost:7860/mcp --token-file .oauth-tokens/ingress.json --operation ping uv run python cli/mcp_client.py --url http://localhost:7860/mcp --token-file .oauth-tokens/ingress.json --operation list ``` **Continuous Integration:** - Tests run automatically via GitHub Actions - Triggered on PR creation and pushes to main/develop - Configuration: `.github/workflows/registry-test.yml` - All unit tests must pass (no failures allowed) **Test Documentation:** - Comprehensive guide: `docs/testing/README.md` - Writing tests: `docs/testing/WRITING_TESTS.md` - Test maintenance: `docs/testing/MAINTENANCE.md` - Memory management: `docs/testing/memory-management.md` **Coverage Requirements:** - Minimum: 35% overall coverage (enforced) - Target: 80% coverage for new features - Coverage report: `htmlcov/index.html` (generated after test run) ## 11. Security Features ### 11.1. 
Security Scanning **IMPORTANT:** The MCP Gateway & Registry provides **TWO SEPARATE security scanning systems** - one for MCP servers and one for A2A agents. #### MCP Server Security Scanning **Integrated Vulnerability Detection** with [Cisco AI Defense MCP Scanner](https://github.com/cisco-ai-defense/mcp-scanner): - Automated security scanning during server registration - Periodic registry-wide scans - YARA pattern matching for malicious code detection - Detailed security reports with vulnerability details, severity assessments, and remediation recommendations - Automatic protection: Servers with security issues automatically disabled - Compliance ready: Security audit trails and vulnerability tracking **Configuration:** ```bash # Enable MCP server scanning SECURITY_SCAN_ENABLED=true SECURITY_SCAN_ON_REGISTRATION=true BLOCK_UNSAFE_SERVERS=true ``` **Service Location:** `registry/services/security_scanner.py` **Scanner Integration:** `registry/api/server_routes.py` (automatic during registration) #### A2A Agent Security Scanning **Integrated Agent Vulnerability Detection** with [Cisco AI Defense A2A Scanner](https://github.com/cisco-ai-defense/a2a-scanner): - Automated security scanning during agent registration - Multi-analyzer support: YARA pattern matching, LLM-based analysis, static analysis - Agent card validation and security assessment - Configurable blocking policies for unsafe agents - Detailed scan reports with security findings and recommendations - Optional "security-pending" tagging for agents awaiting scan results **Configuration:** ```bash # Enable A2A agent scanning AGENT_SECURITY_SCAN_ENABLED=true AGENT_SECURITY_SCAN_ON_REGISTRATION=true AGENT_SECURITY_BLOCK_UNSAFE_AGENTS=true AGENT_SECURITY_ANALYZERS=yara,llm # Comma-separated list AGENT_SECURITY_SCAN_TIMEOUT=300 A2A_SCANNER_LLM_API_KEY=your-api-key # For LLM-based analysis AGENT_SECURITY_ADD_PENDING_TAG=true # Add security-pending tag during scan ``` **Service Location:**
`registry/services/agent_scanner.py` **Scanner Integration:** `registry/api/agent_routes.py` (automatic during registration) **Scan Storage:** Results stored in security scan repository for audit trails **Analyzers:** - **YARA**: Pattern-based malicious code detection - **LLM**: AI-powered security analysis using Azure OpenAI - **Static**: Code structure and configuration analysis **Security Scan Results API:** ```bash # Get agent scan results GET /api/v2/agents/{agent_path}/security-scan # Get server scan results GET /api/v2/servers/{server_path}/security-scan ``` #### Global LLM Configuration with Fallback Hierarchy **NEW (January 2026):** The MCP Gateway now supports a global LLM configuration with fallback hierarchy for embeddings and security scanners. **Configuration Hierarchy:** 1. **Component-specific settings** (highest priority) - e.g., `MCP_SCANNER_LLM_API_KEY` 2. **Global LLM settings** (fallback) - e.g., `LLM_API_KEY` 3. **Default values** (lowest priority) **Global LLM Settings:** ```bash # Global LLM configuration (used as defaults for all LLM operations) LLM_PROVIDER=litellm # Default: litellm LLM_MODEL=openai/gpt-4o-mini # Default LLM model LLM_API_KEY=your-api-key # Global API key (fallback for scanners) LLM_API_BASE=https://your-proxy.com # Optional: Custom API base (e.g., LiteLLM proxy) ``` **Scanner-Specific Overrides:** ```bash # MCP Server Scanner (overrides global settings if set) MCP_SCANNER_LLM_API_KEY=specific-key # Falls back to LLM_API_KEY if empty MCP_SCANNER_LLM_MODEL=openai/gpt-4o # Falls back to LLM_MODEL if empty MCP_SCANNER_LLM_API_BASE= # Falls back to LLM_API_BASE if empty # A2A Agent Scanner (overrides global settings if set) A2A_SCANNER_LLM_API_KEY=specific-key # Falls back to LLM_API_KEY if empty A2A_SCANNER_LLM_MODEL=openai/gpt-4o # Falls back to LLM_MODEL if empty A2A_SCANNER_LLM_API_BASE= # Falls back to LLM_API_BASE if empty ``` **Embeddings Configuration:** ```bash # Embeddings provider settings 
EMBEDDINGS_PROVIDER=sentence-transformers # 'sentence-transformers' or 'litellm' EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2 # Model name EMBEDDINGS_MODEL_DIMENSIONS=384 # Embedding dimensions (384 for default, 1024 for Bedrock Titan v2) # LiteLLM-specific embeddings settings (only when provider='litellm') EMBEDDINGS_API_KEY= # Optional: API key for LiteLLM EMBEDDINGS_API_BASE= # Optional: Custom API base EMBEDDINGS_AWS_REGION=us-east-1 # AWS region for Bedrock ``` **Benefits:** - **Simplified Configuration**: Set API keys once globally, reuse across components - **Flexible Overrides**: Component-specific settings override global defaults - **LiteLLM Proxy Support**: Route all LLM calls through a central LiteLLM proxy - **Cost Control**: Use different models for different components based on requirements ### 11.2. Security Best Practices **Token Storage:** - Tokens stored with `600` permissions in `.oauth-tokens/` - Never commit `.env` files to version control - Use secure secret management for production **Network Security:** - HTTPS-only for production - PKCE where supported - SSL/TLS certificate management **Access Control:** - Follow principle of least privilege - Regular group membership reviews - Scope-based authorization at server, method, and tool levels **Token Lifecycle:** - Ingress tokens: 1-hour expiry, auto-refresh via client credentials - Egress tokens: Provider-specific, refresh tokens where available - Automated refresh service for continuous monitoring **Audit & Compliance:** - Complete audit trails (Nginx access logs + auth server logs + IdP logs) - Comprehensive metrics for compliance reporting - Security event tracking and monitoring ### 11.3. 
Token Refresh Service **Automated Token Refresh Service** provides: - Continuous monitoring of all OAuth tokens for expiration - Proactive refresh before tokens expire (configurable 1-hour buffer) - Automatic MCP config generation for coding assistants - Service discovery for both OAuth and no-auth services - Background operation with comprehensive logging **Start the service:** ```bash ./start_token_refresher.sh ``` **Generated configurations:** - `.oauth-tokens/vscode_mcp.json` - VS Code extensions - `.oauth-tokens/mcp.json` - Claude Code / Roo Code - Standard configuration format for custom MCP clients ## 12. Enterprise Features ### 12.1. AI Coding Assistants Integration **Supported Assistants:** - VS Code with MCP extension - Cursor - Claude Code - Roo Code - Cline **Setup:** ```bash # Generate configurations ./credentials-provider/generate_creds.sh # VS Code cp .oauth-tokens/vscode-mcp.json ~/.vscode/settings.json # Roo Code cp .oauth-tokens/mcp.json ~/.vscode/mcp-settings.json ``` ### 12.2. Federation with External Registries **Federation Architecture** allows you to import and manage servers/agents from multiple external registries through a unified interface with centralized authentication and access control.
**Supported Federation Sources:** | Source | Type | Description | Visual Tag | Auth Required | |--------|------|-------------|------------|---------------| | **Anthropic MCP Registry** | MCP Servers | Official Anthropic curated servers | `ANTHROPIC` (purple) | No | | **Workday ASOR** | AI Agents | Agent System of Record | `ASOR` (orange) | Yes (OAuth) | **Key Benefits:** - **Centralized Management**: Single interface for all servers/agents regardless of source - **Unified Authentication**: Consistent auth/authz across all federated entities - **Visual Tagging**: Color-coded tags show federation source (purple for Anthropic, orange for ASOR) - **Automatic Synchronization**: Scheduled sync to keep federation up-to-date - **Selective Import**: Import all or specific entities from each source - **Audit Trail**: Complete tracking of federated entity provenance #### Anthropic MCP Registry Integration **Features:** - Import servers from [Anthropic's official MCP Registry](https://registry.modelcontextprotocol.io) - Full REST API compatibility - No authentication required - Purple `ANTHROPIC` visual tag on federated servers - Unified access through your gateway with centralized auth **Configuration:** ```bash # Enable Anthropic federation in .env ANTHROPIC_REGISTRY_ENABLED=true # Federation config file: ~/mcp-gateway/federation.json { "anthropic": { "enabled": true, "endpoint": "https://registry.modelcontextprotocol.io", "servers": [] # Empty = import all, or specify: [{"name": "server-name"}] } } ``` **Import Servers via API:** ```bash # Add specific server to federation config curl -X POST "http://localhost:7860/api/federation/config/default/anthropic/servers/io.github.jgador/websharp" \ -H "Authorization: Bearer $TOKEN" # Trigger sync to import servers curl -X POST "http://localhost:7860/api/federation/sync?source=anthropic" \ -H "Authorization: Bearer $TOKEN" # Import specific servers via federation config update curl -X PUT 
"http://localhost:7860/api/federation/config/default" \ -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{ "anthropic": { "enabled": true, "endpoint": "https://registry.modelcontextprotocol.io", "servers": [ {"name": "io.github.jgador/websharp"}, {"name": "modelcontextprotocol/filesystem"}, {"name": "modelcontextprotocol/brave-search"} ] } }' ``` **Service Location:** `registry/services/federation/anthropic_client.py` #### Workday ASOR Integration **Features:** - Import AI agents from Workday Agent System of Record - OAuth 2.0 authentication with token refresh - Orange `ASOR` visual tag on federated agents - Enterprise agent lifecycle management - Scheduled synchronization with ASOR backend **Prerequisites:** 1. Valid Workday tenant with ASOR enabled 2. OAuth credentials (Client ID and Secret) 3. Access token with "Agent System of Record" scope **Configuration:** ```bash # Add to .env ASOR_CLIENT_ID=your_client_id ASOR_CLIENT_SECRET=your_client_secret ASOR_TENANT_NAME=your_tenant_name ASOR_HOSTNAME=your_host_name ASOR_ACCESS_TOKEN=your_oauth_token # Generated via get_asor_token.py # Federation config: ~/mcp-gateway/federation.json { "asor": { "enabled": true, "endpoint": "https://wcpdev-services1.wd103.myworkday.com/ccx/api/asor/v1/awsasor_wcpdev1", "auth_env_var": "ASOR_ACCESS_TOKEN", "agents": [] # Empty = import all } } ``` **Get OAuth Token:** ```bash # Run token generator (interactive OAuth flow) python3 get_asor_token.py # Follow prompts to: # 1. Authorize via browser # 2. Complete OAuth flow # 3. 
Receive access token for .env ``` **Service Location:** `registry/services/federation/asor_client.py` #### Federation Synchronization **Automatic Sync:** - Periodic synchronization keeps federated entities up-to-date - Configurable sync schedule - Handles entity updates, additions, and removals - Maintains federation metadata (source, sync timestamp) **Manual Sync (API):** ```bash # Trigger manual federation sync (all sources) curl -X POST "http://localhost:7860/api/federation/sync" \ -H "Authorization: Bearer $TOKEN" # Sync specific source (Anthropic or ASOR) curl -X POST "http://localhost:7860/api/federation/sync?source=anthropic" \ -H "Authorization: Bearer $TOKEN" curl -X POST "http://localhost:7860/api/federation/sync?source=asor" \ -H "Authorization: Bearer $TOKEN" ``` **Documentation:** Complete federation guide at `docs/federation.md` ### 12.3. Token Vending Service **Capabilities:** - JWT token generation for M2M authentication - Service account provisioning - Automated token lifecycle management - Integration with identity providers **Usage:** ```bash # Generate token for agent uv run python credentials-provider/token_refresher.py --agent-id sre-agent # Check generated token cat .oauth-tokens/agent-sre-agent-m2m-token.json ``` ## 13. Troubleshooting ### 13.1. 
Common Issues **Services won't start:** ```bash # Check Docker daemon sudo systemctl status docker # Check environment variables cat .env | grep -v SECRET # View detailed logs docker compose logs --tail=50 ``` **Authentication failures:** ```bash # Verify Cognito/Keycloak configuration aws cognito-idp describe-user-pool --user-pool-id YOUR_POOL_ID # Test credential generation cd credentials-provider && ./generate_creds.sh --verbose ``` **Network connectivity issues:** ```bash # Check port availability sudo netstat -tlnp | grep -E ':(80|443|7860|8080)' # Test internal services curl -v http://localhost:7860/health ``` **Permission denied errors:** - Check user's Cognito/Keycloak group memberships - Verify scope mappings in `scopes.yml` - Ensure tool names match exactly - Regenerate tokens after group changes **HTTPS not working:** ```bash # Check certificate files ls -la ${HOME}/mcp-gateway/ssl/certs/ ${HOME}/mcp-gateway/ssl/private/ # Check container logs docker compose logs registry | grep -i ssl # Verify port 443 sudo netstat -tlnp | grep 443 ``` ### 13.2. Debugging Tools **Enable Verbose Logging:** ```python # In auth_server/server.py or relevant module logging.basicConfig(level=logging.DEBUG) ``` **Authentication Event Logging:** ```python from datetime import datetime, timezone def log_auth_event(event_type: str, username: str | None = None, details: dict | None = None): logger.info(f"AUTH_EVENT: {event_type}", extra={ 'username': username, 'event_type': event_type, 'details': details, 'timestamp': datetime.now(timezone.utc).isoformat() }) ``` **Health Check:** ```bash curl http://localhost:7860/health ``` ## 14. Code Organization & Patterns **CRITICAL:** The MCP Gateway & Registry follows strict architectural patterns to ensure maintainability and consistency. Follow these patterns religiously - violations will break the application architecture. ### 14.1.
Layered Architecture (MANDATORY) **The application MUST follow this layered architecture:** ``` API Routes → Services → Repositories → Storage Backends ``` **Each layer has specific responsibilities:** 1. **API Routes** (`registry/api/`): - Handle HTTP requests and responses - Validate request parameters - Call service layer methods - Return HTTP status codes and responses - **NEVER access repositories directly** 2. **Service Layer** (`registry/services/`): - Implement business logic - Coordinate between multiple repositories - Handle complex operations and workflows - Validate business rules - **ALWAYS use factory pattern to get repositories** 3. **Repository Layer** (`registry/repositories/`): - Abstract data access via interfaces (`interfaces.py`) - Provide consistent API across all storage backends - Handle data persistence and retrieval - Implement search and querying logic 4. **Storage Backends** (`registry/repositories/{backend}/`): - Implement repository interfaces for specific storage - File backend: `registry/repositories/file/` (DEPRECATED) - MongoDB CE / DocumentDB: `registry/repositories/documentdb/` (unified implementation) ### 14.2. Factory Pattern (REQUIRED) **ALWAYS use the factory pattern** to obtain repository instances. NEVER instantiate repositories directly. **Correct Usage:** ```python from registry.repositories.factory import ( get_server_repository, get_agent_repository, get_scope_repository, get_search_repository, get_security_scan_repository ) # In service layer async def some_service_method(): server_repo = await get_server_repository() servers = await server_repo.get_all_servers() return servers ``` **Wrong Usage (ANTIPATTERN):** ```python # ❌ NEVER DO THIS - Direct instantiation from registry.repositories.documentdb.server_repository import DocumentDBServerRepository server_repo = DocumentDBServerRepository() # WRONG! 
# ❌ NEVER DO THIS - Direct repository access from routes from registry.api.server_routes import router @router.get("/servers") async def list_servers(): server_repo = await get_server_repository() # WRONG! Use service layer return await server_repo.get_all_servers() ``` ### 14.3. Repository Abstraction **All storage backends MUST provide identical behavior** through polymorphism. Code using repositories should work with ANY backend without modification. **Abstract Base Classes:** - `BaseServerRepository` (registry/repositories/interfaces.py) - `BaseAgentRepository` - `BaseScopeRepository` - `BaseSearchRepository` - `BaseSecurityScanRepository` **Implementation Contract:** ```python # All implementations must provide the same methods with the same signatures class DocumentDBServerRepository(BaseServerRepository): async def get_all_servers(self, namespace: Optional[str] = None) -> List[dict]: # DocumentDB-specific implementation pass class FileServerRepository(BaseServerRepository): async def get_all_servers(self, namespace: Optional[str] = None) -> List[dict]: # File-specific implementation (DEPRECATED) pass ``` ### 14.4. Critical Antipatterns (DO NOT DO THIS) **❌ 1. Direct Repository Access from Routes** ```python # WRONG - Route directly accessing repository @router.get("/servers/{server_path}") async def get_server(server_path: str): repo = await get_server_repository() # ANTIPATTERN! return await repo.get_server(server_path) # CORRECT - Route calls service layer @router.get("/servers/{server_path}") async def get_server(server_path: str): return await server_service.get_server(server_path) ``` **❌ 2. Direct Repository Instantiation** ```python # WRONG - Bypasses factory pattern from registry.repositories.documentdb.server_repository import DocumentDBServerRepository repo = DocumentDBServerRepository() # ANTIPATTERN! # CORRECT - Use factory from registry.repositories.factory import get_server_repository repo = await get_server_repository() ``` **❌ 3. 
Hardcoding Storage Backend** ```python # WRONG - Hardcoded backend selection if storage_type == "documentdb": repo = DocumentDBServerRepository() elif storage_type == "file": repo = FileServerRepository() # CORRECT - Factory handles backend selection repo = await get_server_repository() # Uses STORAGE_BACKEND env var ``` **❌ 4. Skipping Service Layer** ```python # WRONG - Route contains business logic @router.post("/servers") async def create_server(server: ServerRegistration): repo = await get_server_repository() # Business logic here - ANTIPATTERN! if server.status == "active": await repo.create_server(server) return {"status": "created"} # CORRECT - Business logic in service layer @router.post("/servers") async def create_server(server: ServerRegistration): return await server_service.create_server(server) ``` **❌ 5. Implementing Custom Vector Search** ```python # WRONG - Custom vector search implementation def custom_vector_search(query: str): # Don't implement your own vector search! pass # CORRECT - Use repository abstraction search_repo = await get_search_repository() results = await search_repo.hybrid_search(query) ``` ### 14.5. Code Organization Checklist Before submitting code, verify: - [ ] **No direct repository access from routes** - All routes call service layer - [ ] **Factory pattern used** - No direct repository instantiation - [ ] **Service layer exists** - Business logic in `registry/services/` - [ ] **Repository interfaces** - New repositories extend abstract base classes - [ ] **Backend agnostic** - Code works with any storage backend - [ ] **No hardcoded backends** - Use `STORAGE_BACKEND` environment variable - [ ] **Separation of concerns** - Each layer handles only its responsibility - [ ] **Polymorphism** - All repository implementations provide identical APIs ### 14.6. 
Design Documentation **Architecture References:** - Database Abstraction Layer: `docs/design/database-abstraction-layer.md` - Storage Architecture: `docs/design/storage-architecture-mongodb-documentdb.md` - Repository Pattern: `registry/repositories/interfaces.py` (docstrings) **When to Read Design Docs:** - Before implementing new repository backend - Before adding new data access patterns - When confused about layering - When reviewing code for architecture compliance ## 15. Additional Resources ### 15.1. Documentation Links - [Complete Setup Guide](docs/complete-setup-guide.md) - Step-by-step from scratch on AWS EC2 - [Installation Guide](docs/installation.md) - Complete setup instructions for EC2 and EKS - [Configuration Reference](docs/configuration.md) - Environment variables and settings - [Authentication Guide](docs/auth.md) - OAuth and identity provider integration - [Keycloak Integration](docs/keycloak-integration.md) - Enterprise identity with agent audit trails - [Amazon Cognito Setup](docs/cognito.md) - Step-by-step IdP configuration - [Fine-Grained Access Control](docs/scopes.md) - Permission management and security - [Dynamic Tool Discovery](docs/dynamic-tool-discovery.md) - Autonomous agent capabilities - [AI Coding Assistants Setup](docs/ai-coding-assistants-setup.md) - VS Code, Cursor, Claude Code integration - [API Reference](docs/registry_api.md) - Programmatic registry management - [Anthropic Registry API](docs/anthropic_registry_api.md) - REST API compatibility - [Service Management](docs/service-management.md) - Server lifecycle and operations - [Token Refresh Service](docs/token-refresh-service.md) - Automated token refresh and lifecycle management - [Observability Guide](docs/OBSERVABILITY.md) - Metrics, monitoring, and OpenTelemetry setup - [Troubleshooting Guide](docs/FAQ.md) - Common issues and solutions - [Architectural Decision](docs/design/architectural-decision-reverse-proxy-vs-application-layer-gateway.md) - Reverse proxy vs 
application layer gateway - [Registry Auth Architecture](docs/registry-auth-architecture.md) - Internal authentication mechanisms ### 15.2. Community & Support **Getting Help:** - [FAQ & Troubleshooting](docs/FAQ.md) - Common questions and solutions - [GitHub Issues](https://github.com/agentic-community/mcp-gateway-registry/issues) - Bug reports and feature requests - [GitHub Discussions](https://github.com/agentic-community/mcp-gateway-registry/discussions) - Community support and ideas **Contributing:** - [Contributing Guide](CONTRIBUTING.md) - How to contribute code and documentation - [Code of Conduct](CODE_OF_CONDUCT.md) - Community guidelines and expectations - [Security Policy](SECURITY.md) - Responsible disclosure process ### 15.3. License This project is licensed under the Apache-2.0 License - see the [LICENSE](LICENSE) file for details. --- *Part of the [Agentic Community](https://github.com/agentic-community) ecosystem - building the future of AI-driven development.*