Principle: The entire tech stack runs on-premises in the organization's own data center on VMware vSphere. The only exceptions are Microsoft 365 (Teams, Exchange, SharePoint via the Graph API) and the Anthropic Claude API as a cloud LLM service. All other services are covered by open-source or source-available components and by the in-house add-ons listed in section 1.1.
Norm (workflow engine): Norm is the mandatory standard workflow engine for all automation in the HAFS Self-Help Service Portal. All workflow, runbook, and automation processes (module M7, auto-resolve, onboarding/offboarding, approval workflows, scheduled tasks, event-driven automation) are implemented exclusively as Norm flows. Custom modules (TypeScript) extend the 280+ existing integration modules to cover HAFS-specific systems. No workflow builder is developed in-house.
1. Stack Overview
Complete Tech Stack (On-Premises)
| Area | Components |
| Frontend | React 19, Next.js 15, TypeScript 5.x, Tailwind CSS 4.x, Shadcn/UI, Zustand |
| Backend | Nest.js (backend), Next.js 15 (BFF), TypeScript 5.x, microservices, gRPC + REST, CQRS + NestJS modules, RabbitMQ / NATS |
| AI/ML | Anthropic Claude API, Claude Sonnet 4.5, Claude Haiku 4.5, Elasticsearch (RAG), LangChain.js, Anthropic SDK |
| Infra | RKE2 (Rancher), VMware vSphere, Terraform, Flux v2 (GitOps), Harbor, Helm / Kustomize |
| Data | PostgreSQL (Patroni), MongoDB (ReplicaSet), Redis (Sentinel), MinIO (S3), Elasticsearch, RabbitMQ / NATS |
| Security | HashiCorp Vault, OPA / Gatekeeper, Calico (NetworkPolicy), NGINX + ModSecurity, Trivy / SonarQube, IAM/PAM Add-on |
| Monitoring | Prometheus, Grafana, Loki, OpenTelemetry, Jaeger, Alertmanager |
| Integration | Microsoft Graph API, On-Prem AD (LDAP), Power Automate, IAM/PAM Add-on API, SIEM Add-on API, Kong / Traefik |
| Communication | Microsoft Teams Bot, SMTP / Exchange Online, WebSockets (real-time) |
| Workflow Engine | Norm (workflow engine), 280+ integrations (modules), no-code/low-code flow builder, webhooks, scheduled flows, AI modules |
1.1 Migration Overview: Azure → On-Premises
| Azure Service | On-Prem Equivalent | License / Type |
| Azure AKS | RKE2 (Rancher-managed) on VMware vSphere | Open Source (Apache 2.0) |
| Azure SQL | PostgreSQL with Patroni HA | Open Source (PostgreSQL License) |
| Cosmos DB | MongoDB ReplicaSet | Source-available (SSPL), Community Edition |
| Azure Redis Cache | Redis with Sentinel | Open Source (BSD) through 7.2; 7.4 and later are RSALv2 / SSPLv1 |
| Azure Blob Storage | MinIO (S3-compatible) | Open Source (AGPLv3) |
| Azure AI Search | Elasticsearch (vector + full-text) | Source-available (SSPL / Elastic License 2.0); newer releases also offer AGPLv3 |
| Azure OpenAI Service | Anthropic Claude API (cloud) | Cloud service (contract) |
| Azure APIM | Kong / Traefik (API gateway) | Open Source / Enterprise |
| Azure Key Vault | HashiCorp Vault | Source-available (BSL 1.1) |
| Azure Monitor + Log Analytics | Prometheus + Grafana + Loki | Open Source (Prometheus: Apache 2.0; Grafana, Loki: AGPLv3) |
| Application Insights | OpenTelemetry + Jaeger | Open Source (Apache 2.0) |
| Microsoft Sentinel | SIEM Add-on (in-house app, Elasticsearch-based) | In-house development |
| Azure PIM / CyberArk | IAM/PAM Add-on (in-house app) | In-house development |
| Azure Container Registry | Harbor | Open Source (Apache 2.0) |
| Azure Front Door + WAF | NGINX Ingress + ModSecurity | Open Source |
| Azure Service Bus | RabbitMQ / NATS | Open Source (MPL / Apache 2.0) |
| Azure Logic Apps / Power Automate | Norm (workflow engine) | n/a |
| ExpressRoute / VPN | Not needed (everything on-prem; internet access only for M365 and Claude) | n/a |
2. Frontend Stack
2.1 Technology Decisions
| Technology | Version | Rationale |
| React | 19.x | Industry standard, large ecosystem, Server Components |
| Next.js | 15.x | SSR/SSG, API Routes, BFF pattern, Middleware |
| TypeScript | 5.x | Type safety, better maintainability |
| Tailwind CSS | 4.x | Utility-first, fast development, consistent design |
| Shadcn/UI | Latest | High-quality, accessible, customizable components |
| Zustand | Latest | Lightweight state management |
| React Query (TanStack) | Latest | Server-state management, caching, optimistic updates |
| React Hook Form | Latest | Performant forms with few re-renders |
| Zod | Latest | Schema validation (shared between frontend and backend) |
2.2 Portal Structure
Next.js App Router Structure
portal-frontend/
├── app/ # Next.js App Router
│ ├── (auth)/ # Authenticated Routes
│ │ ├── dashboard/ # User Dashboard
│ │ ├── tickets/ # Ticket System
│ │ ├── services/ # Service Catalog
│ │ ├── knowledge/ # Knowledge Base
│ │ ├── security/ # Security Center (IAM/PAM)
│ │ ├── governance/ # Governance & Compliance
│ │ ├── analytics/ # Analytics & Reporting
│ │ └── admin/ # Administration
│ ├── api/ # BFF API Routes (Next.js)
│ └── layout.tsx # Root Layout
├── components/
│ ├── ui/ # Shadcn/UI base components
│ ├── tickets/ # Ticket-specific components
│ ├── chat/ # Chatbot components
│ ├── security/ # Security components
│ └── shared/ # Shared components
├── lib/ # Utilities, API clients, configuration
├── hooks/ # Custom React Hooks
└── styles/ # Global styles, theme configuration
2.3 Microsoft Teams Bot
| Technology | Purpose |
| Bot Framework SDK | Teams bot scaffolding |
| Adaptive Cards | Rich ticket rendering in Teams |
| Teams Toolkit | Development & deployment |
| Message Extensions | Ticket search directly in Teams |
3. Backend Stack
3.1 Nest.js Backend Services
| Technology | Version | Rationale |
| Nest.js | 11.x | Enterprise-grade Node.js framework, modular, DI, TypeScript-native |
| TypeScript | 5.x | Type safety, better maintainability, shared types with the frontend |
| Prisma | Latest | ORM for PostgreSQL, type-safe queries, migrations |
| CQRS + NestJS Modules | Latest | CQRS pattern (@nestjs/cqrs), modular architecture, pipeline behaviors (in NestJS typically realized as interceptors) |
| class-validator | Latest | Request validation |
| nestjs-resilience | Latest | Resilience (retry, circuit breaker, timeout) |
| @nestjs/microservices | Latest | Message bus abstraction (RabbitMQ / NATS) |
| BullMQ | Latest | Background jobs & scheduling (Redis-based) |
| class-transformer | Latest | Object mapping / serialization |
| Pino / Winston | Latest | Structured logging (sink: Loki / OpenTelemetry) |
3.2 Supplementary Services
| Technology | Use | Rationale |
| Next.js 15 (BFF) | Backend-for-Frontend, API Routes, SSR | Seamless frontend integration, Middleware, Server Components |
| Python | ML pipelines, data processing (optional) | ML ecosystem, embedding processing |
3.3 Microservice Architecture
Service Architecture
┌─── Nest.js Backend Services ─────────────────────────────┐
│ │
│ ticket-service (Nest.js, CQRS, PostgreSQL) │
│ identity-service (Nest.js, LDAP + OIDC) │
│ security-service (Nest.js, IAM/PAM Add-on) │
│ governance-service (Nest.js, Compliance-Engine) │
│ catalog-service (Nest.js, Service Catalog) │
│ automation-service (Norm + Nest.js Adapter) │
│ notification-service (Nest.js, WebSockets, SMTP) │
│ analytics-service (Nest.js, Reporting) │
│ ai-gateway (Nest.js, LangChain.js, Claude) │
│ chatbot-service (Nest.js, Bot Framework) │
│ knowledge-service (Nest.js, Elasticsearch Client) │
└───────────────────────────────────────────────────────────┘
Communication:
• Synchronous: REST (Public API) + gRPC (service-internal)
• Asynchronous: RabbitMQ / NATS (Events, Commands, Sagas)
• Real-Time: WebSockets (Notifications, Chat-Streaming)
• API GW: Kong / Traefik (Rate Limiting, Auth, TLS)
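The ticket-service in the diagram above is tagged CQRS. The following is a minimal, framework-free sketch of that split, assuming an in-memory store: commands change state through one handler, queries read through another. It is not the HAFS implementation and does not use the @nestjs/cqrs API; every name in it (`TicketStore`, `TicketCommand`, `TicketQuery`) is hypothetical.

```typescript
// Framework-free CQRS sketch. All type and class names are hypothetical.
type TicketStatus = "open" | "resolved";

interface Ticket {
  id: number;
  title: string;
  status: TicketStatus;
}

// Write side: commands describe an intent to change state.
type TicketCommand =
  | { kind: "create"; title: string }
  | { kind: "resolve"; id: number };

// Read side: queries never modify state.
type TicketQuery = { kind: "openTickets" };

class TicketStore {
  private tickets: Ticket[] = [];
  private nextId = 1;

  // Command handler: the only place where tickets are mutated.
  execute(command: TicketCommand): number {
    if (command.kind === "create") {
      const id = this.nextId++;
      this.tickets.push({ id, title: command.title, status: "open" });
      return id;
    }
    const ticket = this.tickets.filter((t) => t.id === command.id)[0];
    if (!ticket) throw new Error(`Unknown ticket ${command.id}`);
    ticket.status = "resolved";
    return ticket.id;
  }

  // Query handler: returns copies so callers cannot mutate the store.
  query(_query: TicketQuery): Ticket[] {
    return this.tickets
      .filter((t) => t.status === "open")
      .map((t) => ({ id: t.id, title: t.title, status: t.status }));
  }
}

const store = new TicketStore();
const firstId = store.execute({ kind: "create", title: "VPN not connecting" });
store.execute({ kind: "create", title: "Password reset" });
store.execute({ kind: "resolve", id: firstId });
const openTitles = store.query({ kind: "openTickets" }).map((t) => t.title);
console.log(openTitles);
```

In a real service the write side would persist to PostgreSQL and publish events to the message bus; the point here is only that reads and writes go through separate entry points.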
3.4 API Design
| Aspect | Decision |
| API Style | REST (external) + gRPC (internal, service to service) |
| API Versioning | URL-based (/api/v1/) |
| Auth | OAuth 2.0 + JWT (OIDC via On-Prem AD / Keycloak) |
| Documentation | OpenAPI 3.1 / Swagger UI |
| API Gateway | Kong / Traefik (Rate Limiting, Auth, TLS Termination) |
| Rate Limiting | Kong Plugin / Traefik Middleware |
| Caching | Redis (Sentinel) + ETags |
4. AI/ML Stack
4.1 AI Technologies
| Technology | Use | Rationale |
| Anthropic Claude API | LLM foundation (cloud service) | High quality, advanced reasoning capabilities, EU contract |
| Claude Sonnet 4.5 | Complex tasks: ticket analysis, knowledge generation, governance reports | Best price-performance ratio for complex tasks |
| Claude Haiku 4.5 | Classification, routing, simple answers, sentiment | Fast, inexpensive, well suited to high volume |
| Elasticsearch | Vector + semantic + full-text search (RAG) | On-prem, hybrid search, kNN vectors |
| LangChain.js | AI orchestration in Node.js services | Large community, Claude integration, chains/agents |
| Anthropic SDK | Direct Claude API access | Official SDK, streaming, tool use |
| Embedding model | Document embedding for RAG | Open source (e.g. all-MiniLM-L6-v2, hosted on-prem); Anthropic does not offer an embedding model of its own, so any hosted alternative would come from a third-party provider |
4.2 Model Usage Matrix
| Use Case | Model | Rationale |
| Ticket classification (category, priority) | Claude Haiku 4.5 | Fast, inexpensive, sufficient for classification |
| Ticket routing | Claude Haiku 4.5 | Rule-based assignment with LLM support |
| Sentiment analysis | Claude Haiku 4.5 | Simple analysis, high throughput |
| Chatbot answers (simple) | Claude Haiku 4.5 | Fast response times for standard questions |
| Chatbot answers (complex) | Claude Sonnet 4.5 | Deep analysis, multi-step reasoning |
| Knowledge article generation | Claude Sonnet 4.5 | High-quality text generation |
| Ticket summaries | Claude Sonnet 4.5 | Precise summaries of complex cases |
| Governance reports | Claude Sonnet 4.5 | Analytical depth, understanding of compliance |
| Auto-resolution (known issues) | Claude Haiku 4.5 | Pattern matching + predefined solutions |
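The matrix above is essentially a lookup from use case to model tier. A minimal sketch of how a service might encode it follows; the use-case keys and the `selectModel` helper are hypothetical names, and the model values are the display names from the table rather than API model identifiers, which would come from configuration.

```typescript
// Model tiers exactly as named in the matrix; API model IDs are intentionally not hard-coded.
type ModelTier = "Claude Haiku 4.5" | "Claude Sonnet 4.5";

// Hypothetical keys, one per row of the matrix.
type UseCase =
  | "ticket-classification"
  | "ticket-routing"
  | "sentiment-analysis"
  | "chatbot-simple"
  | "chatbot-complex"
  | "knowledge-article"
  | "ticket-summary"
  | "governance-report"
  | "auto-resolution";

// Record<UseCase, ...> makes the compiler reject a matrix with a missing row.
const MODEL_MATRIX: Record<UseCase, ModelTier> = {
  "ticket-classification": "Claude Haiku 4.5",
  "ticket-routing": "Claude Haiku 4.5",
  "sentiment-analysis": "Claude Haiku 4.5",
  "chatbot-simple": "Claude Haiku 4.5",
  "chatbot-complex": "Claude Sonnet 4.5",
  "knowledge-article": "Claude Sonnet 4.5",
  "ticket-summary": "Claude Sonnet 4.5",
  "governance-report": "Claude Sonnet 4.5",
  "auto-resolution": "Claude Haiku 4.5",
};

function selectModel(useCase: UseCase): ModelTier {
  return MODEL_MATRIX[useCase];
}

console.log(selectModel("sentiment-analysis"));
console.log(selectModel("governance-report"));
```

Keeping the mapping in one typed constant means a change to the matrix is a one-line diff and cannot silently leave a use case unassigned.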
4.3 RAG Stack (Retrieval Augmented Generation)
RAG Pipeline
Data sources: Knowledge Base, resolved tickets, Confluence, SharePoint
Processing: chunking (recursive, semantic), then embedding (all-MiniLM-L6-v2 or a hosted embedding model)
Storage: Elasticsearch (kNN vector index); MongoDB (session cache, chat histories)
Query Pipeline:
User Query
→ Query Reformulation (Claude Haiku 4.5)
→ Hybrid Search (Elasticsearch: BM25 + kNN)
→ Reranking (Score Fusion / Cross-Encoder)
→ Context Assembly (Top-K Chunks + Metadaten)
→ LLM Generation (Claude Sonnet 4.5)
→ Guardrails (Hallucination Check, PII Filter)
→ Output
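The query pipeline names "Score Fusion" as one reranking option without fixing a method. One common choice is Reciprocal Rank Fusion (RRF), sketched below under the assumption that the BM25 and kNN searches each return an ordered list of document IDs. The function name, the sample IDs, and the constant `k = 60` (a frequently used default) are illustrative, not taken from the source.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank(d)),
// where rank starts at 1 and a document missing from a list contributes nothing.
function reciprocalRankFusion(rankings: string[][], k: number = 60): FusedHit[] {
  const scores = new Map<string, number>();
  rankings.forEach((ranking) => {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  });
  const fused: FusedHit[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  // Highest fused score first.
  return fused.sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from the two Elasticsearch queries.
const bm25Hits = ["kb-12", "kb-7", "ticket-301"];
const knnHits = ["kb-7", "kb-99", "kb-12"];

const fusedHits = reciprocalRankFusion([bm25Hits, knnHits]);
const topK = fusedHits.slice(0, 2).map((hit) => hit.id);
console.log(topK);
```

RRF only needs ranks, so it sidesteps the problem that BM25 scores and vector similarities live on different scales. A cross-encoder, the other option named above, would instead rescore each candidate against the query.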
4.4 AI Gateway Architecture
AI Gateway (Node.js)
┌─── Inbound ───────────────────────────────────────────────┐
│ Kong / Traefik → Auth Check → Rate Limit → AI Gateway │
└───────────────────────────────────────────────────────────┘
┌─── Processing ────────────────────────────────────────────┐
│ │
│ 1. Request Validation + Prompt Sanitization │
│ 2. Model Routing (Haiku vs. Sonnet, depending on complexity) │
│ 3. RAG Context Retrieval (Elasticsearch) │
│ 4. Prompt Assembly (System + Context + User) │
│ 5. Anthropic API Call (Streaming) │
│ 6. Response Guardrails │
│ 7. Audit Logging │
└───────────────────────────────────────────────────────────┘
┌─── Outbound ──────────────────────────────────────────────┐
│ Response → Redis Cache → Client (SSE / WebSocket) │
└───────────────────────────────────────────────────────────┘
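Three of the processing steps listed above (1: sanitization, 4: prompt assembly, 6: response guardrails) are pure text transformations and can be sketched without any SDK. The following is a rough illustration under simple assumptions: the length limit, the system prompt wording, and the e-mail-only PII filter are placeholders, and all function names are hypothetical. A production guardrail would cover far more than e-mail addresses.

```typescript
interface AssembledPrompt {
  system: string;
  context: string;
  user: string;
}

const MAX_QUESTION_LENGTH = 2000; // hypothetical limit

// Step 1: strip control characters, collapse whitespace, cap the length.
function sanitizeQuestion(raw: string): string {
  const cleaned = raw
    .replace(/[\u0000-\u001f\u007f]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  if (cleaned.length === 0) throw new Error("Empty question");
  return cleaned.slice(0, MAX_QUESTION_LENGTH);
}

// Step 4: combine system instructions, retrieved chunks, and the user question.
function assemblePrompt(question: string, chunks: string[]): AssembledPrompt {
  return {
    system: "Answer only from the provided context. Say so if the context is insufficient.",
    context: chunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n"),
    user: question,
  };
}

// Step 6 (one small part of it): redact e-mail addresses from model output.
function redactEmails(text: string): string {
  return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[redacted-email]");
}

const question = sanitizeQuestion("  How do I\nreset my   password?\u0007 ");
const prompt = assemblePrompt(question, [
  "Passwords are reset in the self-service portal.",
  "Resets require MFA.",
]);
const safeAnswer = redactEmails("Contact jane.doe@example.com for access.");
console.log(prompt, safeAnswer);
```

The Anthropic API call (step 5) and audit logging (step 7) are left out on purpose, since they depend on SDK and infrastructure details the source does not specify.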
5. Database Stack
5.1 Database Overview
| Database | Type | Use | HA Strategy | Rationale |
| PostgreSQL 16 | Relational | Tickets, config, users, audit | Patroni + etcd (3-node cluster) | ACID, transactions, reporting, broad ecosystem |
| MongoDB 7 | Document / NoSQL | Chat sessions, flexible data, logs | ReplicaSet (3 nodes) | Schema flexibility, JSON-native |
| Redis 7 | In-memory | Session cache, API cache, rate limiting | Sentinel (3 nodes) | Sub-ms latency, Pub/Sub |
| MinIO | Object storage | Attachments, documents, exports, backups | Erasure coding (4+ nodes) | S3-compatible, cost-effective |
| Elasticsearch 8 | Search / vector | Knowledge base, RAG, full-text, SIEM | Cluster (3+ nodes) | Hybrid search (BM25 + kNN), aggregations |
5.2 Database Schema Highlights (PostgreSQL)
Core Entities (PostgreSQL)
Tickets
├── id UUID (PK)
├── ticket_number VARCHAR (HAFS-YYYY-NNNNN)
├── title, description TEXT
├── status VARCHAR (enum)
├── priority VARCHAR (enum)
├── type VARCHAR (enum)
├── category_l1/l2/l3 VARCHAR
├── channel VARCHAR (Web, Teams, Email, API)
├── created_by UUID (FK → users)
├── assigned_team UUID (FK → teams)
├── assigned_agent UUID (FK → users)
├── sla_response_deadline TIMESTAMPTZ
├── sla_resolution_deadline TIMESTAMPTZ
├── ai_confidence_score DECIMAL
├── ai_suggested_category VARCHAR
├── sentiment_score DECIMAL
├── created_at, updated_at TIMESTAMPTZ
├── resolved_at, closed_at TIMESTAMPTZ
└── Indexes: status, priority, assigned_agent, created_at
AuditLogs (append-only, partitioned by month)
├── id UUID (PK)
├── timestamp TIMESTAMPTZ
├── actor VARCHAR
├── action VARCHAR
├── target VARCHAR
├── details JSONB
├── source_ip INET
├── device_id VARCHAR
├── session_id UUID
└── compliance_tags JSONB (Array)
AccessRequests
├── id UUID (PK)
├── requested_by UUID (FK → users)
├── target_resource VARCHAR
├── permission_level VARCHAR
├── risk_score DECIMAL
├── approval_chain JSONB
├── status VARCHAR
├── expires_at TIMESTAMPTZ
├── provisioned_at TIMESTAMPTZ
└── revoked_at TIMESTAMPTZ
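Two conventions in the schema above lend themselves to small helpers: the `HAFS-YYYY-NNNNN` ticket number format and the monthly partitioning of the audit log. The sketch below assumes a per-year sequence starting at 1 and a partition naming scheme of `audit_logs_YYYY_MM`; neither detail is stated in the source, and all function names are hypothetical.

```typescript
const TICKET_NUMBER_PATTERN = /^HAFS-(\d{4})-(\d{5})$/;

// Builds a ticket number in the HAFS-YYYY-NNNNN format from the schema.
// Assumption: the sequence is a per-year counter in the range 1..99999.
function formatTicketNumber(year: number, sequence: number): string {
  if (!Number.isInteger(year) || year < 1000 || year > 9999) {
    throw new RangeError("year must have four digits");
  }
  if (!Number.isInteger(sequence) || sequence < 1 || sequence > 99999) {
    throw new RangeError("sequence must be between 1 and 99999");
  }
  return `HAFS-${year}-${("0000" + sequence).slice(-5)}`;
}

// Returns the parts of a valid ticket number, or null if the format does not match.
function parseTicketNumber(value: string): { year: number; sequence: number } | null {
  const match = TICKET_NUMBER_PATTERN.exec(value);
  if (!match) return null;
  return { year: Number(match[1]), sequence: Number(match[2]) };
}

// Name of the monthly audit-log partition for a timestamp (UTC).
// The naming scheme is an assumption made for this sketch.
function auditPartitionName(timestamp: Date): string {
  const year = timestamp.getUTCFullYear();
  const month = ("0" + (timestamp.getUTCMonth() + 1)).slice(-2);
  return `audit_logs_${year}_${month}`;
}

const ticketNumber = formatTicketNumber(2025, 42);
const parsed = parseTicketNumber(ticketNumber);
const partition = auditPartitionName(new Date(Date.UTC(2025, 0, 15)));
console.log(ticketNumber, parsed, partition);
```

Using UTC for the partition name avoids an event near midnight landing in a different partition depending on the server's local time zone.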
5.3 Backup Strategy
| Database | Backup Method | Frequency | Retention |
| PostgreSQL | pgBackRest (incremental + WAL archiving) | Daily full, hourly incremental | 30 days local, 90 days MinIO |
| MongoDB | mongodump + oplog replay | Daily full, hourly incremental | 30 days local, 90 days MinIO |
| Redis | RDB snapshots + AOF | Every 5 minutes | 7 days |
| MinIO | Cross-site replication (optional) | Continuous | Unlimited (lifecycle policies) |
| Elasticsearch | Snapshot API → MinIO | Daily | 30 days |
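The retention column above can be read as a per-location age limit. As a rough sketch, a cleanup job could decide where a backup of a given age should still exist. The assumptions here are that the Redis retention applies to local storage (the table does not say where), that Elasticsearch snapshots live only on MinIO, and that a backup is kept up to and including its limit; `retainedLocations` is a hypothetical helper.

```typescript
type BackupLocation = "local" | "minio";
type Database = "postgresql" | "mongodb" | "redis" | "elasticsearch";

// Retention in days per location, taken from the table above.
const RETENTION_DAYS: Record<Database, Partial<Record<BackupLocation, number>>> = {
  postgresql: { local: 30, minio: 90 },
  mongodb: { local: 30, minio: 90 },
  redis: { local: 7 }, // assumption: the table gives no location for Redis
  elasticsearch: { minio: 30 },
};

// Returns the locations where a backup of the given age should still be kept.
function retainedLocations(db: Database, ageDays: number): BackupLocation[] {
  const policy = RETENTION_DAYS[db];
  const locations: BackupLocation[] = ["local", "minio"];
  return locations.filter((location) => {
    const limit = policy[location];
    return limit !== undefined && ageDays <= limit;
  });
}

const freshPostgres = retainedLocations("postgresql", 10);
const agedPostgres = retainedLocations("postgresql", 45);
const expiredRedis = retainedLocations("redis", 8);
console.log(freshPostgres, agedPostgres, expiredRedis);
```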
6. Infrastructure Stack
6.1 Platform & Virtualization
| Technology | Use |
| VMware vSphere 8 | Hypervisor / virtualization platform |
| RKE2 (Rancher Kubernetes Engine 2) | FIPS-compliant Kubernetes distribution |
| Rancher | Kubernetes management UI, multi-cluster |
| Docker / containerd | Container images in Docker (OCI) format; RKE2 runs them with its bundled containerd runtime, not the Docker daemon |
6.2 Container & Orchestration
| Technology | Use |
| RKE2 | Kubernetes cluster (control plane + worker nodes) |
| Helm 3 | Kubernetes package manager |
| Kustomize | Environment-specific K8s configuration |
| Flux v2 | GitOps continuous deployment |
| Harbor | Private container registry + image scanning |
| KEDA | Event-driven auto-scaling (RabbitMQ, NATS, HTTP) |
| Calico | Container Network Interface (CNI), NetworkPolicies |
6.3 Infrastructure as Code
| Technology | Use |
| Terraform | IaC for VMware vSphere VMs, networks, storage |
| Terraform vSphere Provider | VM provisioning, templates, datastores |
| Helm Charts | Kubernetes deployments (in-house + community charts) |
| Kustomize | Environment overlays (Dev, Staging, Prod) |
| Ansible (optional) | OS configuration, baseline hardening |
6.4 Cluster Layout
RKE2 Kubernetes Cluster (VMware vSphere)
┌─── Control Plane (3 VMs, HA) ────────────────────────────┐
│ rke2-cp-01 rke2-cp-02 rke2-cp-03 │
│ (etcd + API Server + Controller + Scheduler) │
└───────────────────────────────────────────────────────────┘
┌─── Worker Nodes: Application (4+ VMs) ───────────────────┐
│ worker-app-01 .. worker-app-04 │
│ → Nest.js Services, Next.js BFF, Frontend │
└───────────────────────────────────────────────────────────┘
┌─── Worker Nodes: Data (3+ VMs) ──────────────────────────┐
│ worker-data-01 .. worker-data-03 │
│ → PostgreSQL, MongoDB, Redis, Elasticsearch │
└───────────────────────────────────────────────────────────┘
┌─── Worker Nodes: Infra (2+ VMs) ─────────────────────────┐
│ worker-infra-01 .. worker-infra-02 │
│ → Monitoring, Logging, Harbor, Vault, Kong/Traefik │
└───────────────────────────────────────────────────────────┘
Namespaces:
┌────────────┬────────────┬────────────┬────────────────────┐
│ hafs-app │ hafs-data │ hafs-infra │ hafs-monitoring │
│ hafs-ai │ hafs-security │ hafs-staging │ hafs-dev │
└────────────┴────────────┴────────────┴────────────────────┘
6.5 CI/CD Pipeline
CI/CD Pipeline Flow
┌── Source ────┐ ┌── Build + Test ───┐ ┌── Deploy ────────┐
│ │ │ │ │ │
│ GitLab / │──│ Build │──│ Dev (auto) │
│ GitHub │ │ Unit Tests │ │ Staging (auto) │
│ │ │ Lint + Format │ │ Prod (manual) │
│ PR → Review │ │ SAST (SonarQube) │ │ │
│ → Merge │ │ Container Build │ │ Flux v2 GitOps: │
│ │ │ Trivy Image Scan │ │ Git Commit → │
│ │ │ Push to Harbor │ │ Auto-Reconcile │
└──────────────┘ └──────────────────┘ └───────────────────┘
Quality Gates:
• Unit Tests > 80% Coverage
• No Critical / High Security Findings (Trivy + SonarQube)
• Performance Benchmarks passed
• Integration Tests passed (Staging)
• Manual approval for production deployment
• OPA Policy Check (Kubernetes Manifests)
Artifact Flow:
Source Code → GitLab CI / GitHub Actions → Harbor Registry
→ Flux v2 (GitOps) → RKE2 Cluster (Namespace per Environment)
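The quality gates listed above amount to a conjunction of checks over the pipeline results. A minimal sketch of such a gate evaluation follows; the `PipelineReport` shape and its field names are hypothetical, and in practice the values would be collected from the CI system, SonarQube, Trivy, and the OPA check.

```typescript
// Hypothetical summary of one pipeline run.
interface PipelineReport {
  unitTestCoveragePercent: number;
  criticalOrHighFindings: number; // Trivy + SonarQube combined
  performanceBenchmarksPassed: boolean;
  integrationTestsPassed: boolean; // run against Staging
  opaPolicyCheckPassed: boolean;
  manualApprovalGranted: boolean;
}

// Returns a human-readable reason for every gate that is not satisfied.
function failedGates(report: PipelineReport): string[] {
  const failures: string[] = [];
  // The gate reads "> 80%", so exactly 80% does not pass.
  if (!(report.unitTestCoveragePercent > 80)) failures.push("unit test coverage must exceed 80%");
  if (report.criticalOrHighFindings > 0) failures.push("critical or high security findings present");
  if (!report.performanceBenchmarksPassed) failures.push("performance benchmarks failed");
  if (!report.integrationTestsPassed) failures.push("integration tests failed");
  if (!report.opaPolicyCheckPassed) failures.push("OPA policy check failed");
  if (!report.manualApprovalGranted) failures.push("manual approval missing");
  return failures;
}

function mayDeployToProduction(report: PipelineReport): boolean {
  return failedGates(report).length === 0;
}

const borderlineRun: PipelineReport = {
  unitTestCoveragePercent: 80,
  criticalOrHighFindings: 0,
  performanceBenchmarksPassed: true,
  integrationTestsPassed: true,
  opaPolicyCheckPassed: true,
  manualApprovalGranted: true,
};
const borderlineFailures = failedGates(borderlineRun);
console.log(borderlineFailures, mayDeployToProduction(borderlineRun));
```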
7. Monitoring Stack
7.1 Overview
| Technology | Use | Rationale |
| Prometheus | Metrics collection (cluster, services, infra) | CNCF standard, pull-based, PromQL |
| Grafana | Dashboards, visualization, alerting UI | Multi-source, extensive plugins |
| Loki | Log aggregation (structured logging) | Grafana-native, cost-effective, LogQL |
| OpenTelemetry | Distributed tracing, metrics, logs (Collector) | Vendor-neutral, CNCF standard |
| Jaeger | Trace visualization, latency analysis | Optimized for microservices |
| Alertmanager | Alert routing, deduplication, silencing | Prometheus-native, Teams integration |
| kube-state-metrics | Kubernetes object metrics | Pod, deployment, node status |
| node-exporter | Host-level metrics | CPU, memory, disk, network |
7.2 Monitoring Architecture
Monitoring Stack
┌─── Data Collection ──────────────────────────────────────┐
│ │
│ Services ──► OpenTelemetry Collector ──┬─► Prometheus │
│ (Traces, (OTLP Receiver) │ (Metrics) │
│ Metrics, ├─► Loki │
│ Logs) │ (Logs) │
│ └─► Jaeger │
│ Kubernetes ──► kube-state-metrics ────────► Prometheus │
│ Nodes ───────► node-exporter ─────────────► Prometheus │
└───────────────────────────────────────────────────────────┘
┌─── Visualization & Alerting ─────────────────────────────┐
│ │
│ Grafana Dashboards: │
│ • Cluster overview (CPU, Mem, Pods) │
│ • Service health (latency, errors, throughput) │
│ • AI Gateway (tokens, latency, cost) │
│ • Databases (connections, queries, replication lag) │
│ • Business KPIs (Tickets, SLA, MTTR) │
│ │
│ Alertmanager → Teams (Webhook) + E-Mail + PagerDuty │
└───────────────────────────────────────────────────────────┘
7.3 Key Dashboards
| Dashboard | Content | Audience |
| Cluster Overview | Nodes, pods, CPU/memory, storage | Platform team |
| Service Health | Latency (p50/p95/p99), error rate, RPS | Developers |
| AI Gateway Metrics | Token usage, model latency, cost tracking | AI team |
| Database Health | Connections, query performance, replication lag | DBA |
| Business KPIs | Ticket volume, MTTR, SLA compliance, AI resolution rate | Management |
| Security Events | Failed logins, anomalies, policy violations | Security team |
8. Integration Stack
8.1 External Integrations (Cloud Services)
| System | Integration | Technology |
| Microsoft 365 | Exchange, SharePoint, OneDrive | Microsoft Graph API (REST) |
| Microsoft Teams | Bot, Notifications, Adaptive Cards | Bot Framework SDK + Graph API |
| Anthropic Claude | LLM inference (Sonnet 4.5, Haiku 4.5) | Anthropic SDK / REST API (HTTPS) |
| Norm | Central workflow engine (on-prem, 280+ modules, no-code/low-code) | REST API + webhooks + custom modules |
| Power Automate | Workflow extension, Microsoft-specific connectivity | Connectors + HTTP webhooks via Norm |
8.2 Internal Integrations (On-Premises)
| System | Integration | Technology |
| On-Prem Active Directory | Authentication, user sync, groups | LDAP / LDAPS (port 636) |
| IAM/PAM Add-on | Permission management, privileged access | In-house REST API (on-prem) |
| SIEM Add-on | Security events, incident correlation | In-house REST API + Elasticsearch |
| SMTP / Exchange | Inbound and outbound e-mail | SMTP (on-prem) + Graph API (cloud) |
| Qualys / Nessus | Vulnerability scan data | Scanner REST API |
| Kong / Traefik | API gateway, rate limiting, TLS | Proxy configuration |
8.3 Integration Architecture
Integration Layer
┌─── Internet / Cloud ─────────────────────────────────────┐
│ │
│ Microsoft 365 ◄──── Graph API (HTTPS) ───► Portal │
│ Anthropic API ◄──── REST/SDK (HTTPS) ────► AI Gateway │
│ │
│ Firewall: only outbound HTTPS connections allowed │
│ Destinations: graph.microsoft.com, api.anthropic.com │
└───────────────────────────────────────────────────────────┘
┌─── On-Premises Network ──────────────────────────────────┐
│ │
│ Active Directory ◄── LDAPS ──────────► Identity Service │
│ IAM/PAM Add-on ◄── REST API ───────► Security Service │
│ SIEM Add-on ◄── REST API ───────► Analytics Service │
│ Mail Server ◄── SMTP ──────────► Notification Svc │
└───────────────────────────────────────────────────────────┘
┌─── Messaging (Asynchronous) ─────────────────────────────┐
│ │
│ RabbitMQ / NATS │
│ ├── ticket.created → AI Triage, Notification │
│ ├── ticket.updated → SLA Check, Audit Log │
│ ├── security.alert → SIEM Add-on, Notification │
│ ├── ai.request → AI Gateway Processing │
│ ├── user.provisioned → IAM/PAM Sync │
│ └── audit.event → Compliance Engine │
└───────────────────────────────────────────────────────────┘
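The messaging box above maps each topic to its consumers. The sketch below encodes that routing table with types and replays it on a tiny in-memory bus. The bus is only a stand-in for RabbitMQ / NATS to show the fan-out; `InMemoryBus` and the sample payload are hypothetical, and no broker client API is used.

```typescript
type Topic =
  | "ticket.created"
  | "ticket.updated"
  | "security.alert"
  | "ai.request"
  | "user.provisioned"
  | "audit.event";

// Topic-to-consumer routing exactly as listed in the diagram.
const SUBSCRIBERS: Record<Topic, string[]> = {
  "ticket.created": ["AI Triage", "Notification"],
  "ticket.updated": ["SLA Check", "Audit Log"],
  "security.alert": ["SIEM Add-on", "Notification"],
  "ai.request": ["AI Gateway Processing"],
  "user.provisioned": ["IAM/PAM Sync"],
  "audit.event": ["Compliance Engine"],
};

type Handler = (payload: unknown) => void;

// Minimal in-memory stand-in for the message broker.
class InMemoryBus {
  private handlers: { [topic: string]: Handler[] } = {};

  subscribe(topic: Topic, handler: Handler): void {
    const list = this.handlers[topic] || [];
    list.push(handler);
    this.handlers[topic] = list;
  }

  // Delivers the payload to every subscriber and returns the delivery count.
  publish(topic: Topic, payload: unknown): number {
    const list = this.handlers[topic] || [];
    list.forEach((handler) => handler(payload));
    return list.length;
  }
}

const bus = new InMemoryBus();
const deliveryLog: string[] = [];
(Object.keys(SUBSCRIBERS) as Topic[]).forEach((topic) => {
  SUBSCRIBERS[topic].forEach((consumer) => {
    bus.subscribe(topic, () => {
      deliveryLog.push(`${consumer} <- ${topic}`);
    });
  });
});

const delivered = bus.publish("ticket.created", { ticketNumber: "HAFS-2025-00042" });
console.log(delivered, deliveryLog);
```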
9. Technology Decision Matrix
| Decision | Option A | Option B | Option C | Chosen | Rationale |
| Backend framework | Nest.js / TypeScript | Node.js / Express | n/a | Nest.js + Next.js | Nest.js for all backend services (enterprise-grade, DI, modular), Next.js as BFF |
| Frontend | React + Next.js | Angular | Vue + Nuxt | React | More flexible, largest community, Server Components, Shadcn/UI ecosystem |
| Database (relational) | PostgreSQL | MariaDB | n/a | PostgreSQL | JSONB support, Patroni HA, broad ecosystem, no licensing risk |
| Database (document) | MongoDB | CouchDB | n/a | MongoDB | Mature ecosystem, ReplicaSet HA, flexible queries |
| LLM provider | Anthropic Claude | OpenAI | Local LLM (Llama) | Anthropic Claude | Strong reasoning quality, tool use, EU contract possible |
| Search engine (RAG) | Elasticsearch | OpenSearch | Milvus | Elasticsearch | Hybrid search (BM25 + kNN), existing expertise, dual use for SIEM |
| AI orchestration | LangChain.js | Semantic Kernel | Custom build | LangChain.js | Broad community, Claude support, chains/agents, RAG tooling |
| Kubernetes | RKE2 (Rancher) | k3s | kubeadm | RKE2 | FIPS-compliant, Rancher UI, enterprise support, CIS-hardened |
| Container registry | Harbor | Nexus | Docker Registry | Harbor | Image scanning (Trivy), RBAC, replication, CNCF project |
| IaC | Terraform | Pulumi | n/a | Terraform | Largest ecosystem, vSphere provider, state management |
| GitOps | Flux v2 | ArgoCD | n/a | Flux v2 | Lightweight, CNCF project, good Helm integration |
| Messaging | RabbitMQ | NATS | Kafka | RabbitMQ / NATS | RabbitMQ for classic queues, NATS for lightweight pub/sub |
| API gateway | Kong | Traefik | HAProxy | Kong / Traefik | Kong for its plugin ecosystem, Traefik for Kubernetes-native ingress |
| Secrets | HashiCorp Vault | Sealed Secrets | SOPS | Vault | Dynamic secrets, PKI, transit encryption, audit |
| Monitoring | Prometheus + Grafana | Zabbix | n/a | Prometheus + Grafana | Cloud-native standard, PromQL, Loki integration |
| Tracing | Jaeger | Zipkin | Tempo | Jaeger | Mature, OpenTelemetry-native, good UI |
| Workflow engine | Norm | n8n | Temporal | Norm | 280+ modules, Docker/K8s deployment, no-code UI, AI-native, custom modules in TypeScript |
| WAF | ModSecurity (NGINX) | Coraza | n/a | ModSecurity | Proven, OWASP CRS rule set, NGINX integration |
10. Security Requirements for the Stack
10.1 Secrets Management
| Requirement | Implementation |
| Central secrets management | HashiCorp Vault (HA cluster, 3 nodes) |
| Dynamic database credentials | Vault Database Secrets Engine (PostgreSQL, MongoDB) |
| PKI / certificates | Vault PKI Engine + cert-manager (K8s) |
| Kubernetes integration | Vault Agent Injector / CSI Provider |
| Transit encryption | Vault Transit Engine for application-level encryption |
| Audit trail | Vault audit backend → Loki |
10.2 Container & Image Security
| Requirement | Implementation |
| Image scanning | Trivy (in the CI pipeline + Harbor registry) |
| Base image policy | Only vetted base images from Harbor (Distroless / Alpine) |
| Image signing | Cosign / Notary (Harbor-integrated) |
| Runtime security | Falco (optional runtime monitoring) |
| Registry RBAC | Harbor project-based access control |
10.3 Code & Dependency Security
| Requirement | Implementation |
| SAST | SonarQube (in the CI pipeline) |
| Dependency scanning | Trivy FS Scan / Dependabot / Renovate |
| DAST | OWASP ZAP (against the staging environment) |
| License compliance | Trivy license scanning |
| Secret detection | GitLeaks / TruffleHog (pre-commit + CI) |
10.4 Kubernetes & Network Security
| Requirement | Implementation |
| Pod security | OPA / Gatekeeper constraints (PodSecurityPolicy itself was removed in Kubernetes 1.25) |
| Network policies | Calico (zero trust: default deny, explicit allow rules) |
| Ingress / WAF | NGINX Ingress Controller + ModSecurity (OWASP CRS) |
| TLS everywhere | cert-manager + Vault PKI (mTLS between services) |
| RBAC | Kubernetes RBAC (namespace-isolated, least privilege) |
| Audit logging | Kubernetes audit logs → Loki |
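The "default deny" entry in the table above corresponds to a standard Kubernetes NetworkPolicy with an empty pod selector and both policy types listed, which Calico enforces. As a sketch, the manifest can be generated per namespace from TypeScript and serialized for the GitOps repository. The policy name `default-deny-all` and the `defaultDenyPolicy` helper are illustrative choices, and the namespaces are three of those listed in the cluster layout.

```typescript
// Shape of the subset of the NetworkPolicy manifest used here.
interface NetworkPolicyManifest {
  apiVersion: "networking.k8s.io/v1";
  kind: "NetworkPolicy";
  metadata: { name: string; namespace: string };
  spec: {
    podSelector: { matchLabels?: { [label: string]: string } };
    policyTypes: Array<"Ingress" | "Egress">;
  };
}

// An empty podSelector selects every pod in the namespace. Listing both policy
// types without any allow rules denies all ingress and egress for those pods.
function defaultDenyPolicy(namespace: string): NetworkPolicyManifest {
  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: { name: "default-deny-all", namespace },
    spec: { podSelector: {}, policyTypes: ["Ingress", "Egress"] },
  };
}

const policyNamespaces = ["hafs-app", "hafs-data", "hafs-ai"];
const denyManifests = policyNamespaces.map((ns) => defaultDenyPolicy(ns));
console.log(JSON.stringify(denyManifests[0], null, 2));
```

Explicit allow rules (for example DNS egress or service-to-database traffic) would then be added as separate policies per namespace.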
10.5 Data Encryption
| Layer | Measure |
| Data at rest | PostgreSQL: LUKS (disk encryption); MongoDB: Encrypted Storage Engine (a MongoDB Enterprise feature; with the Community Edition listed in section 1.1, disk-level encryption such as LUKS would be needed instead) |
| Data in transit | TLS 1.3 everywhere, mTLS between services (Vault PKI) |
| Application level | Vault Transit Engine for sensitive fields (PII, credentials) |
| Backups | Encrypted backups on MinIO (server-side encryption) |
| Elasticsearch | Encrypted communication (TLS), Search Guard / X-Pack Security |
10.6 Overall Security Architecture
Security Layers
┌─── Perimeter ────────────────────────────────────────────┐
│ NGINX Ingress + ModSecurity (OWASP CRS) │
│ Kong / Traefik (Rate Limiting, Bot Detection) │
│ Firewall (outbound HTTPS only, for M365 + Claude API) │
└───────────────────────────────────────────────────────────┘
┌─── Identity & Access ────────────────────────────────────┐
│ On-Prem AD (LDAPS) → OIDC / OAuth 2.0 + JWT │
│ IAM/PAM Add-on (permissions, privileged access) │
│ Kubernetes RBAC (namespace-isolated) │
└───────────────────────────────────────────────────────────┘
┌─── Application ──────────────────────────────────────────┐
│ HashiCorp Vault (Secrets, PKI, Transit Encryption) │
│ OPA / Gatekeeper (Policy Enforcement) │
│ SonarQube + Trivy (SAST, Container Scanning) │
└───────────────────────────────────────────────────────────┘
┌─── Network ──────────────────────────────────────────────┐
│ Calico (NetworkPolicies, Default Deny) │
│ mTLS (Service-to-Service via Vault PKI) │
│ TLS 1.3 (all external connections) │
└───────────────────────────────────────────────────────────┘
┌─── Data ─────────────────────────────────────────────────┐
│ Encryption at Rest (LUKS, MongoDB WiredTiger) │
│ Vault Transit (PII, sensitive fields) │
│ Encrypted Backups (MinIO SSE) │
└───────────────────────────────────────────────────────────┘
┌─── Monitoring & Response ────────────────────────────────┐
│ SIEM Add-on (Elasticsearch-basiert, Correlation Rules) │
│ Falco (Runtime Anomaly Detection, optional) │
│ Alertmanager → Teams + PagerDuty │
└───────────────────────────────────────────────────────────┘
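The perimeter layer restricts outbound traffic to HTTPS toward M365 and the Claude API. The same rule can be mirrored as an application-level guard in front of any outbound HTTP client, sketched below with the two hostnames named in section 8.3. The helper name is hypothetical, and a real deployment would likely need further hosts that the source does not list, such as the Microsoft identity endpoint used to obtain Graph API tokens.

```typescript
// Hostnames as listed in the integration architecture; treat the list as incomplete.
const ALLOWED_EGRESS_HOSTS = ["graph.microsoft.com", "api.anthropic.com"];

// Returns true only for HTTPS URLs whose hostname is on the allowlist.
function isEgressAllowed(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  return url.protocol === "https:" && ALLOWED_EGRESS_HOSTS.indexOf(url.hostname) !== -1;
}

const egressChecks = [
  isEgressAllowed("https://api.anthropic.com/v1/messages"),
  isEgressAllowed("http://api.anthropic.com/v1/messages"),
  isEgressAllowed("https://example.com/"),
  isEgressAllowed("not a url"),
];
console.log(egressChecks);
```

This guard complements the firewall rule rather than replacing it: the firewall enforces the policy, while the in-process check fails fast and produces a clearer error for developers.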