Principle: The entire tech stack runs on-premises in our own data center on VMware vSphere. The only exceptions are Microsoft 365 (Teams, Exchange, SharePoint via Graph API) and the Anthropic Claude API as the cloud LLM service. All other services run on self-hosted components, predominantly open source (see the license column in section 1.1).
Norm (workflow engine): Norm is the mandatory standard workflow engine for all automation in the HAFS Self-Help Service Portal. All workflow, runbook, and automation processes (module M7, auto-resolve, onboarding/offboarding, approval workflows, scheduled tasks, event-driven automation) are implemented exclusively as Norm flows. Custom modules (TypeScript) extend the 280+ existing integration modules for HAFS-specific systems. No in-house workflow builder will be developed.

1. Stack Overview

Complete Tech Stack (On-Premises)
  FRONTEND            BACKEND              AI/ML
  ────────            ───────              ─────
  React 19            Nest.js (Backend)    Anthropic Claude API
  Next.js 15          Next.js 15 (BFF)     Claude Sonnet 4.5
  TypeScript 5.x      TypeScript 5.x       Claude Haiku 4.5
  Tailwind CSS 4.x    Microservices        Elasticsearch (RAG)
  Shadcn/UI           gRPC + REST          LangChain.js
  Zustand             CQRS + NestJS Mod.   Anthropic SDK
                      RabbitMQ / NATS

  INFRA               DATA                 SECURITY
  ─────               ────                 ────────
  RKE2 (Rancher)      PostgreSQL (Patroni) HashiCorp Vault
  VMware vSphere      MongoDB (ReplicaSet) OPA / Gatekeeper
  Terraform           Redis (Sentinel)     Calico (NetworkPolicy)
  Flux v2 (GitOps)    MinIO (S3)           NGINX + ModSecurity
  Harbor              Elasticsearch        Trivy / SonarQube
  Helm / Kustomize    RabbitMQ / NATS      IAM/PAM Add-on

  MONITORING          INTEGRATION          COMMUNICATION
  ──────────          ───────────          ─────────────
  Prometheus          Microsoft Graph API  Microsoft Teams Bot
  Grafana             On-Prem AD (LDAP)    SMTP / Exchange Online
  Loki                Power Automate       WebSockets (Real-Time)
  OpenTelemetry       IAM/PAM Add-on API
  Jaeger              SIEM Add-on API
  Alertmanager        Kong / Traefik

  WORKFLOW ENGINE
  ───────────────
  Norm (Workflow Engine)
  280+ Integrations (Module)
  No-Code/Low-Code Flow Builder
  Webhooks, Scheduled Flows, AI Module

1.1 Migration Overview: Azure → On-Premises

Azure service | On-prem equivalent | License / type
Azure AKS | RKE2 (Rancher-managed) on VMware vSphere | Open source (Apache 2.0)
Azure SQL | PostgreSQL with Patroni HA | Open source (PostgreSQL License)
Cosmos DB | MongoDB ReplicaSet | Source-available (SSPL, Community Edition)
Azure Cache for Redis | Redis with Sentinel | BSD-3-Clause up to Redis 7.2; RSALv2 / SSPL from 7.4
Azure Blob Storage | MinIO (S3-compatible) | Open source (AGPLv3)
Azure AI Search | Elasticsearch (vector + full-text) | Source-available (SSPL / Elastic License 2.0; AGPLv3 additionally offered for recent releases)
Azure OpenAI Service | Anthropic Claude API (cloud) | Cloud service (contract)
Azure APIM | Kong / Traefik (API gateway) | Open source / Enterprise
Azure Key Vault | HashiCorp Vault | Source-available (BSL 1.1)
Azure Monitor + Log Analytics | Prometheus + Grafana + Loki | Open source (Prometheus: Apache 2.0; Grafana, Loki: AGPLv3)
Application Insights | OpenTelemetry + Jaeger | Open source (Apache 2.0)
Microsoft Sentinel | SIEM add-on (in-house app, Elasticsearch-based) | In-house development
Azure PIM / CyberArk | IAM/PAM add-on (in-house app) | In-house development
Azure Container Registry | Harbor | Open source (Apache 2.0)
Azure Front Door + WAF | NGINX Ingress + ModSecurity | Open source
Azure Service Bus | RabbitMQ / NATS | Open source (MPL 2.0 / Apache 2.0)
Azure Logic Apps / Power Automate | Norm (workflow engine) |
ExpressRoute / VPN | Not required (everything on-prem; internet access only for M365 and Claude) |

2. Frontend Stack

2.1 Technology Decisions

Technology | Version | Rationale
React | 19.x | Industry standard, large ecosystem, Server Components
Next.js | 15.x | SSR/SSG, Route Handlers (BFF pattern), Middleware
TypeScript | 5.x | Type safety, better maintainability
Tailwind CSS | 4.x | Utility-first, fast development, consistent design
Shadcn/UI | Latest | High-quality, accessible, customizable components
Zustand | Latest | Lightweight state management
React Query (TanStack Query) | Latest | Server-state management, caching, optimistic updates
React Hook Form | Latest | Performant forms with minimal re-renders
Zod | Latest | Schema validation (shared between frontend and backend)

2.2 Portal Structure

Next.js App Router Structure
  portal-frontend/
  ├── app/                    # Next.js App Router
  │   ├── (auth)/            # Authenticated Routes
  │   │   ├── dashboard/     # User Dashboard
  │   │   ├── tickets/       # Ticket System
  │   │   ├── services/      # Service catalog
  │   │   ├── knowledge/     # Knowledge Base
  │   │   ├── security/      # Security Center (IAM/PAM)
  │   │   ├── governance/    # Governance & Compliance
  │   │   ├── analytics/     # Analytics & Reporting
  │   │   └── admin/         # Administration
  │   ├── api/               # BFF Route Handlers (Next.js)
  │   └── layout.tsx         # Root Layout
  ├── components/
  │   ├── ui/                # Shadcn/UI base components
  │   ├── tickets/           # Ticket-specific components
  │   ├── chat/              # Chatbot components
  │   ├── security/          # Security components
  │   └── shared/            # Shared components
  ├── lib/                   # Utilities, API clients, configuration
  ├── hooks/                 # Custom React hooks
  └── styles/                # Global styles, theme configuration

2.3 Microsoft Teams Bot

Technology | Purpose
Bot Framework SDK | Teams bot foundation
Adaptive Cards | Rich ticket rendering in Teams
Teams Toolkit | Development & deployment
Message Extensions | Ticket search directly in Teams

3. Backend Stack

3.1 Nest.js Backend Services

Technology | Version | Rationale
Nest.js | 11.x | Enterprise-grade Node.js framework, modular, dependency injection, TypeScript-native
TypeScript | 5.x | Type safety, better maintainability, shared types with the frontend
Prisma | Latest | ORM for PostgreSQL, type-safe queries, migrations
@nestjs/cqrs + NestJS modules | Latest | CQRS pattern (command, query, and event buses), modular architecture, cross-cutting concerns via interceptors and pipes
class-validator | Latest | Request validation
nestjs-resilience | Latest | Resilience (retry, circuit breaker, timeout)
@nestjs/microservices | Latest | Message bus abstraction (RabbitMQ / NATS)
BullMQ | Latest | Background jobs & scheduling (Redis-based)
class-transformer | Latest | Object mapping / serialization
Pino / Winston | Latest | Structured logging (transports: Loki / OpenTelemetry)
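The resilience row delegates retry, circuit breaking, and timeouts to nestjs-resilience. To make the circuit-breaker behavior concrete, here is a minimal, dependency-free sketch of the underlying state machine (CLOSED, OPEN, HALF_OPEN). The class name, constructor parameters, and injectable clock are assumptions for illustration; they do not reflect the nestjs-resilience API.

```typescript
// Minimal circuit-breaker sketch. Hypothetical class; this is NOT the nestjs-resilience API.
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold; // consecutive failures before the circuit opens
    this.resetTimeoutMs = resetTimeoutMs; // time spent OPEN before a trial call is allowed
    this.now = now; // injectable millisecond clock, handy for tests
  }

  get currentState(): BreakerState {
    // OPEN turns into HALF_OPEN once the reset timeout has elapsed.
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.currentState === "OPEN") {
      // Fail fast instead of waiting on a dependency that is known to be failing.
      throw new Error("circuit open: call rejected");
    }
    try {
      const result = await fn();
      this.state = "CLOSED"; // any success, including the HALF_OPEN trial, closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or reaching the threshold while CLOSED, (re)opens the circuit.
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

A service would wrap each outbound dependency call, e.g. `breaker.call(() => fetchFromIamPam())`, where `fetchFromIamPam` is a hypothetical client function. This sketch lets several concurrent calls through while HALF_OPEN; many library implementations restrict that state to a single trial call.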

3.2 Supplementary Services

Technology | Use | Rationale
Next.js 15 (BFF) | Backend-for-frontend, Route Handlers, SSR | Seamless frontend integration, middleware, Server Components
Python | ML pipelines, data processing (optional) | ML ecosystem, embedding processing

3.3 Microservice Architecture

Service Architecture
  ┌─── Nest.js Backend Services ─────────────────────────────┐
  │                                                           │
  │  ticket-service          (Nest.js, CQRS, PostgreSQL)      │
  │  identity-service        (Nest.js, LDAP + OIDC)           │
  │  security-service        (Nest.js, IAM/PAM Add-on)        │
  │  governance-service      (Nest.js, compliance engine)     │
  │  catalog-service         (Nest.js, service catalog)       │
  │  automation-service      (Norm + Nest.js adapter)         │
  │  notification-service    (Nest.js, WebSockets, SMTP)      │
  │  analytics-service       (Nest.js, reporting)             │
  │  ai-gateway              (Nest.js, LangChain.js, Claude)  │
  │  chatbot-service         (Nest.js, Bot Framework)         │
  │  knowledge-service       (Nest.js, Elasticsearch client)  │
  └───────────────────────────────────────────────────────────┘

  Communication:
  • Synchronous:  REST (public API) + gRPC (service-internal)
  • Asynchronous: RabbitMQ / NATS (events, commands, sagas)
  • Real-time:    WebSockets (notifications, chat streaming)
  • API GW:       Kong / Traefik (rate limiting, auth, TLS)
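Rate limiting is enforced at the API gateway. Traefik's RateLimit middleware is based on a token bucket (an average refill rate plus a burst size); the sketch below shows that algorithm in plain TypeScript so the two parameters are easier to reason about. The class and function names, the default limits, and per-client keying are assumptions, not gateway configuration.

```typescript
// Token-bucket sketch illustrating the algorithm behind gateway rate limits (hypothetical names).
class TokenBucket {
  private readonly ratePerSec: number; // average refill rate in tokens per second
  private readonly burst: number; // bucket capacity, i.e. the maximum burst size
  private readonly now: () => number; // injectable millisecond clock
  private tokens: number;
  private lastRefillMs: number;

  constructor(ratePerSec: number, burst: number, now: () => number = Date.now) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.now = now;
    this.tokens = burst; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  /** Returns true if the request may pass, consuming one token. */
  tryConsume(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.lastRefillMs) / 1000;
    // Refill proportionally to elapsed time, never beyond the burst capacity.
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per client key (e.g. user ID or API key); the keying scheme is an assumption.
const buckets = new Map<string, TokenBucket>();

function allow(clientKey: string, ratePerSec = 10, burst = 20): boolean {
  let bucket = buckets.get(clientKey);
  if (!bucket) {
    bucket = new TokenBucket(ratePerSec, burst);
    buckets.set(clientKey, bucket);
  }
  return bucket.tryConsume();
}
```

Because the buckets live in process memory, each replica would enforce its own limit; sharing a limit across replicas needs a central store such as the Redis instance listed for rate limiting in section 5.1.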

3.4 API Design

Aspect | Decision
API style | REST (external) + gRPC (internal, between services)
API versioning | URL-based (/api/v1/)
Auth | OAuth 2.0 + JWT (OIDC via Keycloak, with the on-prem AD as user store via LDAP federation)
Documentation | OpenAPI 3.1 / Swagger UI
API gateway | Kong / Traefik (rate limiting, auth, TLS termination)
Rate limiting | Kong plugin / Traefik middleware
Caching | Redis (Sentinel) + ETags
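With OAuth 2.0 and JWT access tokens, every service has to check the token's registered claims after the signature has been verified. The following is a minimal sketch of those checks (`iss`, `aud`, `exp`, `nbf` as defined in RFC 7519) with a clock-skew allowance. Signature verification is intentionally left to a vetted JOSE library; the issuer URL, audience name, and skew value used in the checks below are hypothetical.

```typescript
// Registered-claim checks (RFC 7519) performed AFTER the token signature has been verified.
// Signature verification is out of scope here and belongs in a vetted JOSE library.
interface JwtPayload {
  iss?: string;
  aud?: string | string[];
  exp?: number; // NumericDate: seconds since the Unix epoch
  nbf?: number; // NumericDate: seconds since the Unix epoch
  sub?: string;
}

interface ClaimPolicy {
  issuer: string; // expected "iss", e.g. the OIDC provider's issuer URL
  audience: string; // value that must appear in "aud", e.g. the API's client ID
  clockSkewSec: number; // tolerated clock drift between identity provider and service
}

/** Returns a list of violations; an empty list means the claims are acceptable. */
function validateClaims(
  payload: JwtPayload,
  policy: ClaimPolicy,
  nowSec: number = Math.floor(Date.now() / 1000),
): string[] {
  const errors: string[] = [];
  if (payload.iss !== policy.issuer) errors.push("iss mismatch");

  // "aud" may be a single string or an array of strings.
  const audiences = Array.isArray(payload.aud) ? payload.aud : payload.aud !== undefined ? [payload.aud] : [];
  if (!audiences.includes(policy.audience)) errors.push("aud mismatch");

  // Policy choice in this sketch: tokens without "exp" are rejected.
  if (payload.exp === undefined || nowSec >= payload.exp + policy.clockSkewSec) errors.push("token expired");
  if (payload.nbf !== undefined && nowSec < payload.nbf - policy.clockSkewSec) errors.push("token not yet valid");
  return errors;
}
```

Returning all violations instead of throwing on the first one makes it easy to log every failed check in the audit trail while still rejecting the request whenever the list is non-empty.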

4. AI/ML Stack

4.1 AI Technologies

Technology | Use | Rationale
Anthropic Claude API | LLM foundation (cloud service) | High quality, advanced reasoning capabilities, EU contract
Claude Sonnet 4.5 | Complex tasks: ticket analysis, knowledge generation, governance reports | Best price-performance ratio for complex tasks
Claude Haiku 4.5 | Classification, routing, simple answers, sentiment | Fast, low-cost, ideal for high volume
Elasticsearch | Vector, semantic, and full-text search (RAG) | On-prem, hybrid search, kNN vectors
LangChain.js | AI orchestration in Node.js services | Large community, Claude integration, chains/agents
Anthropic SDK | Direct Claude API access | Official SDK, streaming, tool use
Embedding model | Document embeddings for RAG | Self-hosted open-source model (e.g. all-MiniLM-L6-v2); Anthropic does not offer its own embedding model

4.2 Model Usage Matrix

Use case | Model | Rationale
Ticket classification (category, priority) | Claude Haiku 4.5 | Fast, low-cost, sufficient for classification
Ticket routing | Claude Haiku 4.5 | Rule-based assignment with LLM support
Sentiment analysis | Claude Haiku 4.5 | Simple analysis, high throughput
Chatbot answers (simple) | Claude Haiku 4.5 | Fast response times for standard questions
Chatbot answers (complex) | Claude Sonnet 4.5 | In-depth analysis, multi-step reasoning
Knowledge article generation | Claude Sonnet 4.5 | High-quality text generation
Ticket summaries | Claude Sonnet 4.5 | Precise summaries of complex cases
Governance reports | Claude Sonnet 4.5 | Analytical depth, compliance understanding
Auto-resolution (known issues) | Claude Haiku 4.5 | Pattern matching + predefined solutions
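The matrix translates directly into a lookup table in the AI gateway. The sketch below encodes it and adds a simple escalation rule for chatbot requests. The use-case keys, the 60-word and 5-chunk thresholds, and the model identifiers (`claude-haiku-4-5`, `claude-sonnet-4-5`, assumed to be the current Anthropic aliases and worth checking against the published model list) are assumptions.

```typescript
// Use-case -> model mapping taken from the matrix above (use-case keys are hypothetical).
type UseCase =
  | "ticket-classification" | "ticket-routing" | "sentiment" | "chat-simple" | "chat-complex"
  | "kb-article" | "ticket-summary" | "governance-report" | "auto-resolution";

// Assumed model aliases; verify against Anthropic's current model list before use.
const HAIKU = "claude-haiku-4-5";
const SONNET = "claude-sonnet-4-5";

const MODEL_BY_USE_CASE: Record<UseCase, string> = {
  "ticket-classification": HAIKU,
  "ticket-routing": HAIKU,
  sentiment: HAIKU,
  "chat-simple": HAIKU,
  "chat-complex": SONNET,
  "kb-article": SONNET,
  "ticket-summary": SONNET,
  "governance-report": SONNET,
  "auto-resolution": HAIKU,
};

// Hypothetical complexity heuristic for chatbot requests: long questions or many
// relevant retrieved chunks are escalated to the "complex" path.
function chatUseCase(question: string, retrievedChunks: number): UseCase {
  const words = question.trim().split(/\s+/).filter(Boolean).length;
  return words > 60 || retrievedChunks > 5 ? "chat-complex" : "chat-simple";
}

function selectModel(useCase: UseCase): string {
  return MODEL_BY_USE_CASE[useCase];
}
```

Typing the table as `Record<UseCase, string>` makes the compiler flag any use case that is added to the union but not mapped to a model.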

4.3 RAG Stack (Retrieval-Augmented Generation)

RAG Pipeline
  Data Sources           Processing            Storage
  ────────────           ──────────            ───────

  Knowledge Base ──►  Chunking ──────────►  Elasticsearch
  Solved Tickets ──►  (Recursive,           (kNN Vector Index)
  Confluence ──────►   Semantic)
  SharePoint ──────►                        MongoDB
                     Embedding ──────────►  (Session Cache,
                     (all-MiniLM-L6-v2,      Chat Histories)
                      self-hosted)

  Query Pipeline:

  User Query
    → Query Reformulation (Claude Haiku 4.5)
    → Hybrid Search (Elasticsearch: BM25 + kNN)
    → Reranking (Score Fusion / Cross-Encoder)
    → Context Assembly (Top-K Chunks + Metadaten)
    → LLM Generation (Claude Sonnet 4.5)
    → Guardrails (Hallucination Check, PII Filter)
    → Output
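The reranking step has to merge two lists whose scores are not comparable: BM25 relevance and kNN vector similarity. Reciprocal Rank Fusion (RRF) is one common score-fusion technique for this because it uses only ranks. Each document receives the sum of 1 / (k + rank) over all lists that contain it, with k = 60 as a widely used default. The sketch below is an application-side illustration, independent of any built-in Elasticsearch feature; the document IDs in the checks are hypothetical.

```typescript
// Reciprocal Rank Fusion (RRF) sketch for merging BM25 and kNN result lists.
// Each input list is ordered best-first and is assumed to contain unique document IDs.
interface FusedHit {
  id: string;
  score: number;
}

function reciprocalRankFusion(resultLists: string[][], k = 60, topK = 5): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((docId, index) => {
      // Ranks are 1-based; each list contributes 1 / (k + rank) for every document it contains.
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id)) // tie-break by ID for stable output
    .slice(0, topK);
}
```

A document that appears near the top of both lists outranks one that is first in only a single list, which is the behavior hybrid search wants without having to normalize BM25 and cosine scores against each other.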

4.4 AI Gateway Architecture

AI Gateway (Node.js)
  ┌─── Ingress ───────────────────────────────────────────────┐
  │  Kong / Traefik → Auth Check → Rate Limit → AI Gateway    │
  └───────────────────────────────────────────────────────────┘

  ┌─── Processing ────────────────────────────────────────────┐
  │                                                           │
  │  1. Request Validation + Prompt Sanitization              │
  │  2. Model Routing (Haiku vs. Sonnet by complexity)        │
  │  3. RAG Context Retrieval (Elasticsearch)                 │
  │  4. Prompt Assembly (System + Context + User)             │
  │  5. Anthropic API Call (Streaming)                        │
  │  6. Response Guardrails                                   │
  │  7. Audit Logging                                         │
  └───────────────────────────────────────────────────────────┘

  ┌─── Output ────────────────────────────────────────────────┐
  │  Response → Redis Cache → Client (SSE / WebSocket)        │
  └───────────────────────────────────────────────────────────┘
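Step 6 (response guardrails) and the PII filter in the RAG query pipeline can start with pattern-based redaction before a response is returned or written to the audit log. The sketch below replaces e-mail addresses, IBAN-like strings, and international phone numbers with placeholders and counts the findings. The patterns are simplified heuristics chosen for illustration (an assumption); they will miss some PII and occasionally match non-PII, so they complement rather than replace a dedicated detector.

```typescript
// Minimal regex-based PII redaction sketch for the response-guardrail step.
// The patterns are simplified heuristics, not a complete PII detector.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // IBAN-like: country code, two check digits, then groups of 4 alphanumerics plus an optional short tail
  { label: "IBAN", pattern: /\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b/g },
  // Phone numbers in international format, e.g. +49 30 1234567
  { label: "PHONE", pattern: /\+\d{1,3}(?:[ -]?\d{2,4}){2,5}\b/g },
];

interface RedactionResult {
  text: string;
  findings: Record<string, number>; // count per PII type, e.g. for the audit log
}

function redactPii(input: string): RedactionResult {
  const findings: Record<string, number> = {};
  let text = input;
  for (const { label, pattern } of PII_PATTERNS) {
    text = text.replace(pattern, () => {
      findings[label] = (findings[label] ?? 0) + 1;
      return `[${label}]`;
    });
  }
  return { text, findings };
}
```

Logging only the `findings` counts, not the matched values, keeps the audit trail itself free of the PII it is meant to protect.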

5. Database Stack

5.1 Database Overview

Database | Type | Use | HA strategy | Rationale
PostgreSQL 16 | Relational | Tickets, config, users, audit | Patroni + etcd (3-node cluster) | ACID transactions, reporting, broad ecosystem
MongoDB 7 | Document / NoSQL | Chat sessions, flexible data, logs | ReplicaSet (3 nodes) | Schema flexibility, JSON-native
Redis 7 | In-memory | Session cache, API cache, rate limiting | Sentinel (3 nodes) | Sub-millisecond latency, Pub/Sub
MinIO | Object storage | Attachments, documents, exports, backups | Erasure coding (4+ nodes) | S3-compatible, cost-efficient
Elasticsearch 8 | Search / vector | Knowledge base, RAG, full-text, SIEM | Cluster (3+ nodes) | Hybrid search (BM25 + kNN), aggregations

5.2 Database Schema Highlights (PostgreSQL)

Core Entities (PostgreSQL)
  Tickets
  ├── id                   UUID (PK)
  ├── ticket_number        VARCHAR (HAFS-YYYY-NNNNN)
  ├── title, description   TEXT
  ├── status               VARCHAR (enum)
  ├── priority             VARCHAR (enum)
  ├── type                 VARCHAR (enum)
  ├── category_l1/l2/l3    VARCHAR
  ├── channel              VARCHAR (Web, Teams, Email, API)
  ├── created_by           UUID (FK → users)
  ├── assigned_team        UUID (FK → teams)
  ├── assigned_agent       UUID (FK → users)
  ├── sla_response_deadline    TIMESTAMPTZ
  ├── sla_resolution_deadline  TIMESTAMPTZ
  ├── ai_confidence_score      DECIMAL
  ├── ai_suggested_category    VARCHAR
  ├── sentiment_score          DECIMAL
  ├── created_at, updated_at   TIMESTAMPTZ
  ├── resolved_at, closed_at   TIMESTAMPTZ
  └── Indexes: status, priority, assigned_agent, created_at

  AuditLogs (append-only, partitioned by month)
  ├── id                   UUID (PK)
  ├── timestamp            TIMESTAMPTZ
  ├── actor                VARCHAR
  ├── action               VARCHAR
  ├── target               VARCHAR
  ├── details              JSONB
  ├── source_ip            INET
  ├── device_id            VARCHAR
  ├── session_id           UUID
  └── compliance_tags      JSONB (Array)

  AccessRequests
  ├── id                   UUID (PK)
  ├── requested_by         UUID (FK → users)
  ├── target_resource      VARCHAR
  ├── permission_level     VARCHAR
  ├── risk_score           DECIMAL
  ├── approval_chain       JSONB
  ├── status               VARCHAR
  ├── expires_at           TIMESTAMPTZ
  ├── provisioned_at       TIMESTAMPTZ
  └── revoked_at           TIMESTAMPTZ
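The `ticket_number` format HAFS-YYYY-NNNNN can be produced and validated with a small helper. This sketch assumes the per-year sequence value is allocated elsewhere (for example by a database counter that restarts each year); the function names are hypothetical.

```typescript
// Sketch for the ticket_number format HAFS-YYYY-NNNNN (four-digit year + five-digit sequence).
// Assumption: the per-year sequence value is allocated by the database; this only formats/validates.
const TICKET_NUMBER_RE = /^HAFS-(\d{4})-(\d{5})$/;

function formatTicketNumber(year: number, sequence: number): string {
  if (!Number.isInteger(year) || year < 1000 || year > 9999) {
    throw new RangeError(`year ${year} must have four digits`);
  }
  if (!Number.isInteger(sequence) || sequence < 1 || sequence > 99999) {
    throw new RangeError(`sequence ${sequence} does not fit the five-digit NNNNN field`);
  }
  return `HAFS-${year}-${String(sequence).padStart(5, "0")}`;
}

function parseTicketNumber(value: string): { year: number; sequence: number } | null {
  const match = TICKET_NUMBER_RE.exec(value);
  if (!match) return null;
  return { year: Number(match[1]), sequence: Number(match[2]) };
}
```

The five-digit NNNNN field caps the volume at 99,999 tickets per year; the formatter rejects larger values instead of silently producing a longer, non-conforming number.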

5.3 Backup Strategy

Database | Backup method | Frequency | Retention
PostgreSQL | pgBackRest (full/incremental + WAL archiving) | Daily full, hourly incremental | 30 days local, 90 days on MinIO
MongoDB | mongodump (full) + oplog backups for point-in-time replay | Daily full, hourly incremental (oplog) | 30 days local, 90 days on MinIO
Redis | RDB snapshots + AOF | Every 5 minutes | 7 days
MinIO | Cross-site replication (optional) | Continuous | Unlimited (lifecycle policies)
Elasticsearch | Snapshot API → MinIO | Daily | 30 days

6. Infrastructure Stack

6.1 Platform & Virtualization

Technology | Use
VMware vSphere 8 | Hypervisor / virtualization platform
RKE2 (Rancher Kubernetes Engine 2) | FIPS-compliant Kubernetes distribution
Rancher | Kubernetes management UI, multi-cluster
containerd | Container runtime (bundled with RKE2; Docker Engine is not required on the nodes)

6.2 Containers & Orchestration

Technology | Use
RKE2 | Kubernetes cluster (control plane + worker nodes)
Helm 3 | Kubernetes package manager
Kustomize | Environment-specific K8s configuration
Flux v2 | GitOps continuous deployment
Harbor | Private container registry + image scanning
KEDA | Event-driven autoscaling (RabbitMQ, NATS, HTTP)
Calico | Container Network Interface (CNI), NetworkPolicies
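KEDA feeds event-source metrics such as RabbitMQ queue length into the Kubernetes Horizontal Pod Autoscaler, which then computes the replica count. The sketch below reproduces the core HPA formula for an average-value target, desired = ceil(currentReplicas × currentAverage / target), including the default 10 % tolerance band and min/max clamping. The field names are assumptions; scale-down stabilization windows and KEDA's scaling between 0 and 1 replicas are not modeled.

```typescript
// Sketch of the HPA replica calculation applied to a KEDA-provided metric with an
// average-value target (e.g. "messages per replica"). Field names are hypothetical.
interface ScalingInput {
  metricTotal: number; // e.g. messages currently waiting in the queue
  targetPerReplica: number; // e.g. the trigger's target value: messages one replica should handle
  currentReplicas: number; // >= 1; scaling between 0 and 1 is handled by KEDA's activation logic
  minReplicas: number;
  maxReplicas: number;
  tolerance?: number; // HPA default tolerance is 0.1 (10 %)
}

function desiredReplicas(input: ScalingInput): number {
  const { metricTotal, targetPerReplica, currentReplicas, minReplicas, maxReplicas } = input;
  const tolerance = input.tolerance ?? 0.1;
  const currentAverage = metricTotal / currentReplicas;
  const ratio = currentAverage / targetPerReplica;
  // Inside the tolerance band the replica count stays unchanged to avoid flapping.
  const raw = Math.abs(ratio - 1) <= tolerance ? currentReplicas : Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

For example, 250 queued messages with a target of 50 per replica scale two replicas up to five, while 105 messages (an average of 52.5, within 10 % of the target) leave the deployment at two.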

6.3 Infrastructure as Code

Technology | Use
Terraform | IaC for VMware vSphere VMs, networks, storage
Terraform vSphere Provider | VM provisioning, templates, datastores
Helm Charts | Kubernetes deployments (in-house + community charts)
Kustomize | Environment overlays (dev, staging, prod)
Ansible (optional) | OS configuration, baseline hardening

6.4 Cluster Layout

RKE2 Kubernetes Cluster (VMware vSphere)
  ┌─── Control Plane (3 VMs, HA) ────────────────────────────┐
  │  rke2-cp-01    rke2-cp-02    rke2-cp-03                   │
  │  (etcd + API Server + Controller + Scheduler)             │
  └───────────────────────────────────────────────────────────┘

  ┌─── Worker Nodes: Application (4+ VMs) ───────────────────┐
  │  worker-app-01 .. worker-app-04                           │
  │  → Nest.js Services, Next.js BFF, Frontend                │
  └───────────────────────────────────────────────────────────┘

  ┌─── Worker Nodes: Data (3+ VMs) ──────────────────────────┐
  │  worker-data-01 .. worker-data-03                         │
  │  → PostgreSQL, MongoDB, Redis, Elasticsearch              │
  └───────────────────────────────────────────────────────────┘

  ┌─── Worker Nodes: Infra (2+ VMs) ─────────────────────────┐
  │  worker-infra-01 .. worker-infra-02                       │
  │  → Monitoring, Logging, Harbor, Vault, Kong/Traefik       │
  └───────────────────────────────────────────────────────────┘

  Namespaces:
  ┌────────────┬───────────────┬──────────────┬─────────────────┐
  │ hafs-app   │ hafs-data     │ hafs-infra   │ hafs-monitoring │
  │ hafs-ai    │ hafs-security │ hafs-staging │ hafs-dev        │
  └────────────┴───────────────┴──────────────┴─────────────────┘

6.5 CI/CD Pipeline

CI/CD Pipeline Flow
  ┌── Source ────┐  ┌── Build + Test ───┐  ┌── Deploy ─────────┐
  │              │  │                   │  │                   │
  │ GitLab /     │──│ Build             │──│ Dev (auto)        │
  │ GitHub       │  │ Unit Tests        │  │ Staging (auto)    │
  │              │  │ Lint + Format     │  │ Prod (manual)     │
  │ PR → Review  │  │ SAST (SonarQube)  │  │                   │
  │ → Merge      │  │ Container Build   │  │ Flux v2 GitOps:   │
  │              │  │ Trivy Image Scan  │  │ Git Commit →      │
  │              │  │ Push to Harbor    │  │ Auto-Reconcile    │
  └──────────────┘  └───────────────────┘  └───────────────────┘

  Quality Gates:
  • Unit Tests > 80% Coverage
  • No Critical / High Security Findings (Trivy + SonarQube)
  • Performance Benchmarks passed
  • Integration Tests passed (Staging)
  • Manual approval for production deployment
  • OPA Policy Check (Kubernetes Manifests)
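The quality gates can be evaluated as a single pass/fail step at the end of the pipeline. The sketch below assumes the earlier CI stages export their results into one object (the `PipelineResults` shape is hypothetical) and returns the list of violated gates; an empty list means the deployment may proceed. Coverage is checked as strictly greater than 80 % to match the wording of the gate.

```typescript
// Hypothetical gate evaluation mirroring the quality gates listed above.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface PipelineResults {
  coveragePercent: number;
  findings: { tool: "trivy" | "sonarqube"; severity: Severity }[];
  benchmarksPassed: boolean;
  integrationTestsPassed: boolean; // executed in staging
  opaPolicyCheckPassed: boolean; // OPA check on Kubernetes manifests
  manualApproval: boolean; // only required for production
}

function evaluateGates(r: PipelineResults, target: "staging" | "production"): string[] {
  const failures: string[] = [];
  if (!(r.coveragePercent > 80)) failures.push(`coverage ${r.coveragePercent}% is not above 80%`);
  const blocking = r.findings.filter((f) => f.severity === "CRITICAL" || f.severity === "HIGH");
  if (blocking.length > 0) failures.push(`${blocking.length} critical/high security finding(s)`);
  if (!r.benchmarksPassed) failures.push("performance benchmarks failed");
  if (!r.opaPolicyCheckPassed) failures.push("OPA policy check failed");
  if (target === "production") {
    // Integration tests run in staging, so they gate the promotion to production.
    if (!r.integrationTestsPassed) failures.push("integration tests in staging failed");
    if (!r.manualApproval) failures.push("manual approval missing");
  }
  return failures; // empty list = deployment may proceed
}
```

Returning every violated gate at once, rather than stopping at the first, gives developers the full list of fixes needed in a single pipeline run.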

  Artifact Flow:
  Source Code → GitLab CI / GitHub Actions → Harbor Registry
  → Flux v2 (GitOps) → RKE2 Cluster (Namespace per Environment)

7. Monitoring Stack

7.1 Overview

Technology | Use | Rationale
Prometheus | Metrics collection (cluster, services, infrastructure) | CNCF standard, pull-based, PromQL
Grafana | Dashboards, visualization, alerting UI | Multi-source, extensive plugins
Loki | Log aggregation (structured logging) | Grafana-native, cost-efficient, LogQL
OpenTelemetry | Distributed tracing, metrics, logs (Collector) | Vendor-neutral, CNCF standard
Jaeger | Trace visualization, latency analysis | Optimized for microservices
Alertmanager | Alert routing, deduplication, silencing | Prometheus-native, Teams integration
kube-state-metrics | Kubernetes object metrics | Pod, deployment, node status
node-exporter | Host-level metrics | CPU, memory, disk, network

7.2 Monitoring Architecture

Monitoring Stack
  ┌─── Data Collection ───────────────────────────────────────┐
  │                                                           │
  │  Services ──► OpenTelemetry Collector ──┬─► Prometheus    │
  │  (Traces,     (OTLP Receiver)           │   (Metrics)     │
  │   Metrics,                              ├─► Loki          │
  │   Logs)                                 │   (Logs)        │
  │                                         └─► Jaeger        │
  │  Kubernetes ──► kube-state-metrics ────────► Prometheus   │
  │  Nodes ───────► node-exporter ─────────────► Prometheus   │
  └───────────────────────────────────────────────────────────┘

  ┌─── Visualization & Alerting ──────────────────────────────┐
  │                                                           │
  │  Grafana Dashboards:                                      │
  │  • Cluster Overview (CPU, Mem, Pods)                      │
  │  • Service Health (Latency, Errors, Throughput)           │
  │  • AI Gateway (Tokens, Latency, Cost)                     │
  │  • Databases (Connections, Queries, Replication Lag)      │
  │  • Business KPIs (Tickets, SLA, MTTR)                     │
  │                                                           │
  │  Alertmanager → Teams (Webhook) + E-Mail + PagerDuty      │
  └───────────────────────────────────────────────────────────┘

7.3 Key Dashboards

Dashboard | Content | Audience
Cluster Overview | Nodes, pods, CPU/memory, storage | Platform team
Service Health | Latency (p50/p95/p99), error rate, RPS | Developers
AI Gateway Metrics | Token usage, model latency, cost tracking | AI team
Database Health | Connections, query performance, replication lag | DBAs
Business KPIs | Ticket volume, MTTR, SLA compliance, AI resolution rate | Management
Security Events | Failed logins, anomalies, policy violations | Security team
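The p50/p95/p99 latencies on the Service Health dashboard typically come from Prometheus histograms via `histogram_quantile()`, which finds the bucket containing the requested rank and interpolates linearly inside it. The sketch below re-implements that interpolation for classic cumulative `le` buckets to show why the result is an estimate whose precision depends on the bucket boundaries. It is simplified (for instance, it rejects q outside [0, 1] instead of returning ±Inf as PromQL does), and the bucket values in the checks are made up.

```typescript
// Simplified re-implementation of the linear interpolation used by PromQL's
// histogram_quantile() on classic cumulative buckets; for illustration only.
interface Bucket {
  le: number; // upper bound of the bucket (Infinity for the +Inf bucket)
  count: number; // cumulative count of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0 || q > 1) throw new RangeError("q must be within [0, 1]");
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  if (sorted.length < 2 || sorted[sorted.length - 1].le !== Infinity) {
    throw new Error("need at least two buckets, the last one being le=+Inf");
  }
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;
  let rank = q * total;
  // First finite bucket whose cumulative count reaches the rank.
  const b = sorted.findIndex((bucket, i) => i < sorted.length - 1 && bucket.count >= rank);
  if (b === -1) return sorted[sorted.length - 2].le; // rank lies in the +Inf bucket
  if (b === 0 && sorted[0].le <= 0) return sorted[0].le;
  const start = b === 0 ? 0 : sorted[b - 1].le;
  const end = sorted[b].le;
  let count = sorted[b].count;
  if (b > 0) {
    // Work with counts relative to the selected bucket.
    count -= sorted[b - 1].count;
    rank -= sorted[b - 1].count;
  }
  // Assume observations are spread uniformly inside the bucket.
  return start + (end - start) * (rank / count);
}
```

With buckets at 0.1 s, 0.5 s, 1 s, and +Inf, a reported p95 of 0.75 s only means that the 95th-percentile observation lies somewhere between 0.5 s and 1 s; tighter bucket boundaries around the SLA threshold give more meaningful percentiles.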

8. Integration Stack

8.1 External Integrations (Cloud Services)

System | Integration | Technology
Microsoft 365 | Exchange, SharePoint, OneDrive | Microsoft Graph API (REST)
Microsoft Teams | Bot, notifications, Adaptive Cards | Bot Framework SDK + Graph API
Anthropic Claude | LLM inference (Sonnet 4.5, Haiku 4.5) | Anthropic SDK / REST API (HTTPS)
Norm | Central workflow engine (on-prem, 280+ modules, no-code/low-code) | REST API + webhooks + custom modules
Power Automate | Workflow extension, Microsoft-specific integrations | Connectors + HTTP webhooks via Norm

8.2 Internal Integrations (On-Premises)

System | Integration | Technology
On-prem Active Directory | Authentication, user sync, groups | LDAP / LDAPS (port 636)
IAM/PAM add-on | Permission management, privileged access | In-house REST API (on-prem)
SIEM add-on | Security events, incident correlation | In-house REST API + Elasticsearch
SMTP / Exchange | Inbound and outbound e-mail | SMTP (on-prem) + Graph API (cloud)
Qualys / Nessus | Vulnerability scan data | Scanner REST API
Kong / Traefik | API gateway, rate limiting, TLS | Proxy configuration

8.3 Integration Architecture

Integration Layer
  ┌─── Internet / Cloud ─────────────────────────────────────┐
  │                                                           │
  │  Microsoft 365 ◄──── Graph API (HTTPS) ───► Portal        │
  │  Anthropic API ◄──── REST/SDK (HTTPS) ────► AI Gateway    │
  │                                                           │
  │  Firewall: only outbound HTTPS connections allowed        │
  │  Exception: Teams bot endpoint needs inbound HTTPS        │
  │  Targets: graph.microsoft.com, login.microsoftonline.com, │
  │           api.anthropic.com                               │
  └───────────────────────────────────────────────────────────┘

  ┌─── On-Premises Network ───────────────────────────────────┐
  │                                                           │
  │  Active Directory ◄── LDAPS ──────────► Identity Service  │
  │  IAM/PAM Add-on  ◄── REST API ───────► Security Service   │
  │  SIEM Add-on     ◄── REST API ───────► Analytics Service  │
  │  Mail Server      ◄── SMTP ──────────► Notification Svc   │
  └───────────────────────────────────────────────────────────┘

  ┌─── Messaging (Asynchronous) ──────────────────────────────┐
  │                                                           │
  │  RabbitMQ / NATS                                          │
  │  ├── ticket.created     → AI Triage, Notification         │
  │  ├── ticket.updated     → SLA Check, Audit Log            │
  │  ├── security.alert     → SIEM Add-on, Notification       │
  │  ├── ai.request         → AI Gateway Processing           │
  │  ├── user.provisioned   → IAM/PAM Sync                    │
  │  └── audit.event        → Compliance Engine               │
  └───────────────────────────────────────────────────────────┘
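Consumers usually subscribe with patterns rather than exact topic names. The sketch below implements NATS subject matching, where `*` matches exactly one token and `>` matches one or more trailing tokens (RabbitMQ topic exchanges use `*` and `#` instead, where `#` matches zero or more words), and maps the events from the diagram to their consumers. The consumer identifiers are illustrative.

```typescript
// NATS-style subject matching: "*" matches exactly one token,
// ">" matches one or more trailing tokens and must be the last token.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === ">") return i === p.length - 1 && s.length > i;
    if (i >= s.length) return false;
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

// Consumer table from the messaging diagram above (identifiers are illustrative).
const SUBSCRIPTIONS: { pattern: string; consumers: string[] }[] = [
  { pattern: "ticket.created", consumers: ["ai-triage", "notification"] },
  { pattern: "ticket.updated", consumers: ["sla-check", "audit-log"] },
  { pattern: "security.alert", consumers: ["siem-add-on", "notification"] },
  { pattern: "ai.request", consumers: ["ai-gateway"] },
  { pattern: "user.provisioned", consumers: ["iam-pam-sync"] },
  { pattern: "audit.event", consumers: ["compliance-engine"] },
];

function consumersFor(subject: string): string[] {
  const result = new Set<string>();
  for (const { pattern, consumers } of SUBSCRIPTIONS) {
    if (subjectMatches(pattern, subject)) consumers.forEach((c) => result.add(c));
  }
  return Array.from(result);
}
```

Keeping the topic names hierarchical (`ticket.created`, `ticket.updated`) is what allows a single `ticket.>` subscription, for example for the audit log, to pick up future ticket events without configuration changes.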

9. Technology Decision Matrix

Decision | Option A | Option B | Option C | Chosen | Rationale
Backend framework | Nest.js / TypeScript | Node.js / Express | (none) | Nest.js + Next.js | Nest.js for all backend services (enterprise-grade, DI, modular), Next.js as BFF
Frontend | React + Next.js | Angular | Vue + Nuxt | React | More flexible, largest community, Server Components, Shadcn/UI ecosystem
Database (relational) | PostgreSQL | MariaDB | (none) | PostgreSQL | JSONB support, Patroni HA, broad ecosystem, no licensing risk
Database (document) | MongoDB | CouchDB | (none) | MongoDB | Mature ecosystem, ReplicaSet HA, flexible queries
LLM provider | Anthropic Claude | OpenAI | Local LLM (Llama) | Anthropic Claude | Best reasoning quality, tool use, EU contract possible
Search engine (RAG) | Elasticsearch | OpenSearch | Milvus | Elasticsearch | Hybrid search (BM25 + kNN), existing expertise, dual use for the SIEM
AI orchestration | LangChain.js | Semantic Kernel | Custom build | LangChain.js | Broad community, Claude support, chains/agents, RAG tooling
Kubernetes | RKE2 (Rancher) | k3s | kubeadm | RKE2 | FIPS-compliant, Rancher UI, enterprise support, CIS-hardened
Container registry | Harbor | Nexus | Docker Registry | Harbor | Image scanning (Trivy), RBAC, replication, CNCF project
IaC | Terraform | Pulumi | (none) | Terraform | Largest ecosystem, vSphere provider, state management
GitOps | Flux v2 | ArgoCD | (none) | Flux v2 | Lightweight, CNCF project, good Helm integration
Messaging | RabbitMQ | NATS | Kafka | RabbitMQ / NATS | RabbitMQ for classic queues, NATS for lightweight pub/sub
API gateway | Kong | Traefik | HAProxy | Kong / Traefik | Kong for its plugin ecosystem, Traefik for K8s-native ingress
Secrets | HashiCorp Vault | Sealed Secrets | SOPS | Vault | Dynamic secrets, PKI, transit encryption, audit
Monitoring | Prometheus + Grafana | Zabbix | (none) | Prometheus + Grafana | Cloud-native standard, PromQL, Loki integration
Tracing | Jaeger | Zipkin | Tempo | Jaeger | Mature, OpenTelemetry-native, good UI
Workflow engine | Norm | n8n | Temporal | Norm | 280+ modules, Docker/K8s deployment, no-code UI, AI-native, custom modules in TypeScript
WAF | ModSecurity (NGINX) | Coraza | (none) | ModSecurity | Proven, OWASP CRS rule set, NGINX integration

10. Security Requirements for the Stack

10.1 Secrets Management

Requirement | Implementation
Central secrets management | HashiCorp Vault (HA cluster, 3 nodes)
Dynamic database credentials | Vault Database Secrets Engine (PostgreSQL, MongoDB)
PKI / certificates | Vault PKI Engine + cert-manager (K8s)
Kubernetes integration | Vault Agent Injector / CSI Provider
Transit encryption | Vault Transit Engine for application-level encryption
Audit trail | Vault audit device → Loki

10.2 Container & Image Security

Requirement | Implementation
Image scanning | Trivy (in the CI pipeline + Harbor registry)
Base image policy | Only approved base images from Harbor (distroless / Alpine)
Image signing | Cosign / Notary (Harbor-integrated)
Runtime security | Falco (optional runtime monitoring)
Registry RBAC | Harbor project-based access control

10.3 Code & Dependency Security

Requirement | Implementation
SAST | SonarQube (in the CI pipeline)
Dependency scanning | Trivy FS scan / Dependabot / Renovate
DAST | OWASP ZAP (against the staging environment)
License compliance | Trivy license scanning
Secret detection | GitLeaks / TruffleHog (pre-commit + CI)

10.4 Kubernetes & Network Security

Requirement | Implementation
Pod security | OPA / Gatekeeper (pod security constraints; the PodSecurityPolicy API was removed in Kubernetes 1.25)
Network policies | Calico (zero trust: default deny, explicit allow rules)
Ingress / WAF | NGINX Ingress Controller + ModSecurity (OWASP CRS)
TLS everywhere | cert-manager + Vault PKI (mTLS between services)
RBAC | Kubernetes RBAC (namespace-isolated, least privilege)
Audit logging | Kubernetes audit logs → Loki

10.5 Data Encryption

Layer | Measure
Data at rest | PostgreSQL: LUKS (disk encryption); MongoDB: Encrypted Storage Engine (MongoDB Enterprise only; with the Community Edition, disk encryption via LUKS)
Data in transit | TLS 1.3 everywhere, mTLS between services (Vault PKI)
Application level | Vault Transit Engine for sensitive fields (PII, credentials)
Backups | Encrypted backups on MinIO (server-side encryption)
Elasticsearch | Encrypted communication (TLS), Search Guard / X-Pack Security

10.6 Overall Security Architecture

Security Layers
  ┌─── Perimeter ────────────────────────────────────────────┐
  │  NGINX Ingress + ModSecurity (OWASP CRS)                  │
  │  Kong / Traefik (Rate Limiting, Bot Detection)            │
  │  Firewall (outbound HTTPS only, for M365 + Claude API)    │
  └───────────────────────────────────────────────────────────┘

  ┌─── Identity & Access ────────────────────────────────────┐
  │  On-Prem AD (LDAPS) → OIDC / OAuth 2.0 + JWT              │
  │  IAM/PAM Add-on (Permissions, Privileged Access)          │
  │  Kubernetes RBAC (namespace-isolated)                     │
  └───────────────────────────────────────────────────────────┘

  ┌─── Application ──────────────────────────────────────────┐
  │  HashiCorp Vault (Secrets, PKI, Transit Encryption)       │
  │  OPA / Gatekeeper (Policy Enforcement)                    │
  │  SonarQube + Trivy (SAST, Container Scanning)             │
  └───────────────────────────────────────────────────────────┘

  ┌─── Network ──────────────────────────────────────────────┐
  │  Calico (NetworkPolicies, Default Deny)                   │
  │  mTLS (Service-to-Service via Vault PKI)                  │
  │  TLS 1.3 (all external connections)                       │
  └───────────────────────────────────────────────────────────┘

  ┌─── Data ─────────────────────────────────────────────────┐
  │  Encryption at Rest (LUKS, MongoDB WiredTiger)            │
  │  Vault Transit (PII, sensitive fields)                    │
  │  Encrypted Backups (MinIO SSE)                            │
  └───────────────────────────────────────────────────────────┘

  ┌─── Monitoring & Response ────────────────────────────────┐
  │  SIEM Add-on (Elasticsearch-based, Correlation Rules)     │
  │  Falco (Runtime Anomaly Detection, optional)              │
  │  Alertmanager → Teams + PagerDuty                         │
  └───────────────────────────────────────────────────────────┘