Reference Architectures
Proven deployment patterns that combine the principles from every Well-Architected pillar into end-to-end, production-ready configurations you can adopt or adapt.
Each reference architecture below includes a complete topology, configuration, cost estimate, and known limitations so you can evaluate trade-offs before committing to a design. Pick the architecture closest to your constraints, then adjust RPO, RTO, and storage tiers to match your specific requirements.
Architecture Comparison
| Architecture | RPO | RTO | Complexity | Est. Monthly Cost | Best For |
|---|---|---|---|---|---|
| 1. Single-Region S3 | < 1 hr | < 4 hr | Low | ~$105 | Single-region workloads, dev/staging |
| 2. Cross-Region DR | < 15 min | < 1 hr | Medium | ~$175 | Multi-region availability, production DR |
| 3. Multi-Cloud Active-Passive | < 1 hr | < 2 hr | High | ~$265 | Cloud-provider failure protection |
| 4. Air-Gapped Compliance | < 24 hr | < 8 hr | High | ~$195 | Ransomware protection, regulatory compliance |
| 5. Kubernetes GitOps Pipeline | < 1 hr | < 2 hr | Medium | ~$105 | K8s-native teams, declarative operations |
Start with Architecture 1 to validate your backup strategy, then evolve toward cross-region or multi-cloud patterns as your availability requirements grow. Each architecture builds on the configuration patterns established in the simpler designs.
Architecture 1: Single-Region Backup to S3
Overview
The simplest production-ready pattern. A single kafka-backup deployment runs continuously inside the same region as your Kafka cluster, streaming data to an S3 bucket with versioning enabled. Prometheus scrapes the built-in metrics endpoint for alerting and dashboards.
When to Use
- Single-region Kafka deployment
- RPO < 1 hour is acceptable
- RTO < 4 hours is acceptable
- You want the lowest operational overhead and cost
- Cross-region protection is not yet a requirement
Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ AWS Region (us-east-1) │
│ │
│ ┌──────────────────────┐ ┌──────────────────────────┐ │
│ │ Kafka Cluster │ │ Kubernetes Cluster │ │
│ │ │ │ │ │
│ │ ┌─────┐ ┌─────┐ │ │ ┌──────────────────────┐ │ │
│ │ │ b-1 │ │ b-2 │ │ ───── │ │ kafka-backup │ │ │
│ │ └─────┘ └─────┘ │ │ │ (Deployment, 1 pod) │ │ │
│ │ ┌─────┐ │ │ └──────────┬───────────┘ │ │
│ │ │ b-3 │ │ │ │ │ │
│ │ └─────┘ │ │ ┌──────────┴───────────┐ │ │
│ └──────────────────────┘ │ │ Prometheus + Grafana │ │ │
│ │ └──────────────────────┘ │ │
│ └──────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ S3 Bucket │ │
│ │ (versioning on) │ │
│ │ kafka-backup/ │ │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Components
| Component | Purpose |
|---|---|
| Kafka cluster (3 brokers) | Source data |
| kafka-backup (K8s Deployment) | Continuous backup, 1 replica |
| S3 bucket (versioning enabled) | Backup storage, same region |
| Prometheus + Grafana | Metrics scraping, alerting, dashboards |
Configuration
backup.yaml
source:
  bootstrap_servers:
    - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
topic:
  include:
    - ".*" # back up all topics
storage:
  type: s3
  s3:
    bucket: my-org-kafka-backup
    region: us-east-1
    prefix: prod/
backup:
  compression: zstd
  segment_max_bytes: 134217728 # 128 MB
  continuous: true
  checkpoint_interval_secs: 60
metrics:
  enabled: true
  port: 9090
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-backup
  namespace: kafka-backup
  labels:
    app: kafka-backup
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-backup
  template:
    metadata:
      labels:
        app: kafka-backup
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      serviceAccountName: kafka-backup
      containers:
        - name: kafka-backup
          image: osodevops/kafka-backup:latest # pin a specific version tag in production
          args: ["backup", "--config", "/etc/kafka-backup/backup.yaml"]
          ports:
            - name: metrics
              containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "2"
              memory: 2Gi
          volumeMounts:
            - name: config
              mountPath: /etc/kafka-backup
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: kafka-backup-config
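The Deployment mounts its configuration from a ConfigMap named kafka-backup-config. A minimal rollout sketch, assuming the two manifests above are saved as backup.yaml and deployment.yaml in the working directory (filenames are illustrative):

```shell
#!/usr/bin/env bash
# Roll out the backup Deployment. The dry-run | apply pattern makes the
# ConfigMap update idempotent, so re-running after a config change is safe.
set -euo pipefail

NS="kafka-backup"

if command -v kubectl >/dev/null 2>&1; then
  kubectl get namespace "$NS" >/dev/null 2>&1 || kubectl create namespace "$NS"
  kubectl create configmap kafka-backup-config \
    --from-file=backup.yaml --namespace "$NS" \
    --dry-run=client -o yaml | kubectl apply -f -
  kubectl apply -f deployment.yaml --namespace "$NS"
  kubectl rollout status deployment/kafka-backup --namespace "$NS" --timeout=120s
else
  echo "kubectl not found; run from a machine with cluster access" >&2
fi

echo "ok" > /tmp/kafka-backup-deploy.status
```

Note that updating the ConfigMap does not restart the running pod; follow a config change with `kubectl rollout restart deployment/kafka-backup -n kafka-backup` so the new settings are picked up.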
IAM
Refer to SEC-01: Identity & Access Management for the least-privilege IAM policy. The backup role requires write-only access to S3 and read-only access to Kafka.
Cost Estimate
| Item | Monthly Cost |
|---|---|
| S3 storage (~1 TB/day, 30-day retention, zstd compression) | ~$70 |
| Compute (1 pod, 2 vCPU / 2 GB) | ~$30 |
| Monitoring (Prometheus + Grafana) | ~$5 |
| Total | ~$105 |
Limitations
- No cross-region protection — a regional outage affects both source and backup
- Single storage backend — no redundancy if S3 experiences an availability event
- Restores should run in the same region as the bucket — restoring into another region incurs cross-region data transfer charges
Architecture 2: Cross-Region Disaster Recovery
Overview
Extends Architecture 1 with S3 Cross-Region Replication (CRR) to maintain a replica of all backup data in a secondary region. A standby kafka-backup instance in the DR region can restore data to a pre-provisioned DR Kafka cluster, achieving a significantly lower RTO than a cold-start approach.
When to Use
- Multi-region availability is required
- RPO < 15 minutes is required
- RTO < 1 hour is required
- You need protection against a full regional outage
- Regulatory requirements mandate geographically separated copies
Architecture Diagram
┌──────────────────────────────┐ ┌──────────────────────────────┐
│ Primary Region │ │ DR Region │
│ (us-east-1) │ │ (us-west-2) │
│ │ │ │
│ ┌─────────┐ ┌────────────┐ │ │ ┌────────────┐ ┌─────────┐ │
│ │ Kafka │ │ kafka- │ │ │ │ kafka- │ │ DR │ │
│ │ Cluster │──│ backup │ │ │ │ backup │──│ Kafka │ │
│ │ (prod) │ │ (backup) │ │ │ │ (restore) │ │ Cluster │ │
│ └─────────┘ └─────┬──────┘ │ │ └─────┬──────┘ └─────────┘ │
│ │ │ │ │ │
│ ┌──────▼──────┐ │ S3 CRR │ ┌────▼───────┐ │
│ │ S3 Bucket │─┼────────►│ │ S3 Bucket │ │
│ │ (primary) │ │ │ │ (replica) │ │
│ └─────────────┘ │ │ └────────────┘ │
└──────────────────────────────┘ └──────────────────────────────┘
Components
| Component | Purpose |
|---|---|
| Primary Kafka cluster | Production source data |
| kafka-backup (primary region) | Continuous backup to S3 |
| S3 bucket (primary) | Primary backup storage with versioning |
| S3 Cross-Region Replication | Asynchronous replication to DR region |
| S3 bucket (DR region) | Replica backup storage |
| kafka-backup (DR region) | Standby restore instance |
| DR Kafka cluster | Pre-provisioned restore target |
Configuration
Primary Region — backup.yaml
source:
  bootstrap_servers:
    - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
topic:
  include:
    - ".*"
storage:
  type: s3
  s3:
    bucket: my-org-kafka-backup-primary
    region: us-east-1
    prefix: prod/
backup:
  compression: zstd
  segment_max_bytes: 134217728
  continuous: true
  checkpoint_interval_secs: 60
metrics:
  enabled: true
  port: 9090
S3 Cross-Region Replication
{
  "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
  "Rules": [
    {
      "ID": "kafka-backup-crr",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {
        "Prefix": "prod/"
      },
      "Destination": {
        "Bucket": "arn:aws:s3:::my-org-kafka-backup-dr",
        "StorageClass": "STANDARD_IA"
      },
      "DeleteMarkerReplication": {
        "Status": "Disabled"
      }
    }
  ]
}
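CRR requires versioning to be enabled on both buckets before S3 will accept the rule. A sketch of applying it with the AWS CLI, using the bucket and role names from the example (the rule document is written to a temp file for review even when the CLI is unavailable):

```shell
#!/usr/bin/env bash
# Enable versioning on both buckets (a CRR prerequisite), then attach the
# replication rule to the primary bucket.
set -euo pipefail

SRC="my-org-kafka-backup-primary"
DST="my-org-kafka-backup-dr"

cat > /tmp/crr.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
  "Rules": [
    {
      "ID": "kafka-backup-crr",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "prod/" },
      "Destination": {
        "Bucket": "arn:aws:s3:::my-org-kafka-backup-dr",
        "StorageClass": "STANDARD_IA"
      },
      "DeleteMarkerReplication": { "Status": "Disabled" }
    }
  ]
}
EOF

if command -v aws >/dev/null 2>&1; then
  for bucket in "$SRC" "$DST"; do
    aws s3api put-bucket-versioning --bucket "$bucket" \
      --versioning-configuration Status=Enabled
  done
  aws s3api put-bucket-replication --bucket "$SRC" \
    --replication-configuration file:///tmp/crr.json
else
  echo "aws CLI not found; review /tmp/crr.json and run the commands manually" >&2
fi
```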
DR Region — restore.yaml
source:
  type: s3
  s3:
    bucket: my-org-kafka-backup-dr
    region: us-west-2
    prefix: prod/
target:
  bootstrap_servers:
    - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
topic:
  include:
    - ".*"
restore:
  from_latest: true
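During a failover, the standby instance runs a restore against the DR Kafka cluster. A sketch of the invocation, assuming the CLI mirrors the backup invocation shown in Architecture 1 (the `restore` subcommand name is inferred from that invocation, so verify it against your installed version):

```shell
#!/usr/bin/env bash
# DR failover sketch: run the standby restore against the DR Kafka cluster.
# Assumes restore.yaml (above) is mounted at the path used by the Deployment.
set -euo pipefail

if command -v kafka-backup >/dev/null 2>&1; then
  kafka-backup restore --config /etc/kafka-backup/restore.yaml
else
  echo "kafka-backup binary not found; run inside the standby pod or image" >&2
fi

echo "ok" > /tmp/kafka-backup-dr-restore.status
```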
Cost Estimate
| Item | Monthly Cost |
|---|---|
| Primary region (Architecture 1) | ~$105 |
| S3 Cross-Region Replication (transfer + storage) | ~$50 |
| DR standby compute | ~$20 |
| Total | ~$175 |
Limitations
- S3 CRR replication lag (typically seconds to minutes) adds to effective RPO
- DR Kafka cluster incurs cost even when idle
- Manual or scripted failover — not automatic unless combined with health-check automation
- Cross-region data transfer costs increase with data volume
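Because CRR lag contributes directly to your effective RPO, it is worth probing it periodically. One hedged approach: upload a marker object to the primary bucket and poll its `ReplicationStatus` (reported by `head-object` as `PENDING`, then `COMPLETED` or `FAILED` for source objects under replication):

```shell
#!/usr/bin/env bash
# Estimate CRR lag: upload a timestamped marker, then poll until S3 no
# longer reports it as PENDING. Bucket name follows the example config.
set -euo pipefail

BUCKET="my-org-kafka-backup-primary"
KEY="prod/_replication-probe"

if command -v aws >/dev/null 2>&1; then
  start=$(date +%s)
  date -u > /tmp/probe.txt
  aws s3api put-object --bucket "$BUCKET" --key "$KEY" --body /tmp/probe.txt >/dev/null
  while true; do
    status=$(aws s3api head-object --bucket "$BUCKET" --key "$KEY" \
      --query ReplicationStatus --output text)
    [ "$status" != "PENDING" ] && break
    sleep 10
  done
  echo "replication status: $status after $(( $(date +%s) - start ))s"
else
  echo "aws CLI not found; run from a machine with access to the primary bucket" >&2
fi

echo "ok" > /tmp/crr-probe.status
```

Exporting the measured lag to Prometheus alongside the kafka-backup metrics gives you a single alerting surface for the end-to-end RPO.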
Architecture 3: Multi-Cloud Active-Passive DR
Overview
Protects against an entire cloud provider outage by maintaining backup data on a secondary cloud platform. The primary backup runs on AWS with S3 storage, while a cross-cloud sync process keeps an Azure Blob Storage copy up to date. A standby kafka-backup instance on Azure can restore to an Azure-hosted Kafka cluster.
When to Use
- Cloud provider failure protection is a business requirement
- RPO < 1 hour is acceptable
- RTO < 2 hours is acceptable
- Regulatory or contractual requirements mandate multi-cloud data residency
- Your organisation already operates infrastructure on multiple cloud providers
Architecture Diagram
┌─────────────────────────────┐ ┌──────────────────────────────┐
│ AWS (us-east-1) │ │ Azure (East US) │
│ │ │ │
│ ┌─────────┐ ┌────────────┐ │ │ ┌────────────┐ ┌───────────┐│
│ │ Kafka │ │ kafka- │ │ │ │ kafka- │ │ Azure ││
│ │ Cluster │─│ backup │ │ │ │ backup │─│ Kafka ││
│ │ (prod) │ │ (backup) │ │ │ │ (restore) │ │ Cluster ││
│ └─────────┘ └─────┬──────┘ │ │ └─────┬──────┘ └───────────┘│
│ │ │ │ │ │
│ ┌──────▼──────┐ │ rclone │ ┌────▼────────────┐ │
│ │ S3 Bucket │─┼─────────►│ │ Blob Storage │ │
│ │ │ │ sync │ │ Container │ │
│ └─────────────┘ │ │ └─────────────────┘ │
└─────────────────────────────┘ └──────────────────────────────┘
Components
| Component | Purpose |
|---|---|
| AWS Kafka cluster | Production source data |
| kafka-backup (AWS) | Continuous backup to S3 |
| S3 bucket | Primary backup storage |
| Cross-cloud sync (rclone) | Scheduled sync from S3 to Azure Blob |
| Azure Blob Storage | Secondary backup storage |
| kafka-backup (Azure) | Standby restore instance |
| Azure Kafka cluster | DR restore target |
Configuration
AWS — backup.yaml
source:
  bootstrap_servers:
    - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
topic:
  include:
    - ".*"
storage:
  type: s3
  s3:
    bucket: my-org-kafka-backup
    region: us-east-1
    prefix: prod/
backup:
  compression: zstd
  segment_max_bytes: 134217728
  continuous: true
  checkpoint_interval_secs: 60
metrics:
  enabled: true
  port: 9090
Azure — restore.yaml
source:
  type: azure
  azure:
    storage_account: myorgkafkabackupdr
    container: kafka-backup
    prefix: prod/
target:
  bootstrap_servers:
    - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
topic:
  include:
    - ".*"
restore:
  from_latest: true
Cross-Cloud Sync Script
#!/usr/bin/env bash
# sync-to-azure.sh — runs on a schedule (e.g., every 15 minutes via cron or K8s CronJob)
set -euo pipefail
RCLONE_CONFIG="/etc/rclone/rclone.conf"
SOURCE="aws-s3:my-org-kafka-backup/prod/"
DEST="azure-blob:kafka-backup/prod/"
echo "[$(date -u)] Starting cross-cloud sync..."
rclone sync "$SOURCE" "$DEST" \
  --config "$RCLONE_CONFIG" \
  --transfers 16 \
  --checkers 32 \
  --fast-list \
  --log-level INFO
echo "[$(date -u)] Sync complete."
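Since `rclone sync` can succeed while new segments are still arriving, it helps to follow each sync with an integrity pass. `rclone check` compares files between the two remotes, and `--one-way` only flags objects missing or differing on the destination:

```shell
#!/usr/bin/env bash
# Post-sync verification: confirm the Azure copy matches the S3 source.
# Remote names match the sync script above.
set -euo pipefail

RCLONE_CONFIG="/etc/rclone/rclone.conf"
SOURCE="aws-s3:my-org-kafka-backup/prod/"
DEST="azure-blob:kafka-backup/prod/"

if command -v rclone >/dev/null 2>&1; then
  rclone check "$SOURCE" "$DEST" \
    --config "$RCLONE_CONFIG" \
    --one-way \
    --fast-list \
    --log-level INFO
else
  echo "rclone not found; run alongside the sync job" >&2
fi

echo "ok" > /tmp/rclone-check.status
```

When the two remotes share no common hash type, rclone falls back to size-only comparison, so check its log output to confirm which comparison was actually used.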
Cost Estimate
| Item | Monthly Cost |
|---|---|
| Primary AWS (Architecture 1) | ~$105 |
| Azure Blob Storage | ~$80 |
| Cross-cloud sync (rclone compute + egress) | ~$50 |
| DR standby compute (Azure) | ~$30 |
| Total | ~$265 |
Limitations
- Cross-cloud sync introduces complexity and a potential failure point
- Network egress costs (AWS to Azure) scale linearly with data volume
- Separate credential management for each cloud provider
- Sync lag adds to effective RPO — monitor rclone metrics closely
- Requires expertise in both AWS and Azure infrastructure
Architecture 4: Air-Gapped Compliance Backup
Overview
Provides ransomware-proof, tamper-proof backup storage for regulated industries. Backup data is written to a primary S3 bucket, then transferred to a completely isolated AWS account with S3 Object Lock (WORM — Write Once, Read Many). The air-gapped account has no VPC peering or network connectivity to the production environment, ensuring that a compromised production account cannot modify or delete backup data.
When to Use
- Ransomware protection is a top priority
- RPO < 24 hours is acceptable
- RTO < 8 hours is acceptable
- Regulatory requirements mandate immutable, tamper-proof backups (financial services, healthcare, government)
- Compliance frameworks require geographically or logically separated backup copies
- You need to demonstrate chain-of-custody for audit purposes
Architecture Diagram
┌─────────────────────────────────┐ ┌──────────────────────────────────┐
│ Production Account │ │ Air-Gapped Account │
│ │ │ (no VPC peering, no network) │
│ ┌─────────┐ ┌───────────────┐ │ │ │
│ │ Kafka │ │ kafka-backup │ │ │ ┌──────────────────────────┐ │
│ │ Cluster │──│ (continuous) │ │ │ │ S3 Bucket │ │
│ └─────────┘ └──────┬────────┘ │ │ │ (Object Lock / WORM) │ │
│ │ │ │ │ (Glacier for archive) │ │
│ ┌───────▼───────┐ │ S3 │ └──────────────────────────┘ │
│ │ S3 Bucket │─┼─Batch──│ │
│ │ (primary) │ │ or │ ┌──────────────────────────┐ │
│ └───────────────┘ │ DataSync │ │ IAM: deny all deletes │ │
│ │ │ │ MFA-protected root only │ │
│ ┌──────────────────┐ │ │ └──────────────────────────┘ │
│ │ Prometheus + │ │ │ │
│ │ Grafana │ │ │ ┌─ ─────────────────────────┐ │
│ └──────────────────┘ │ │ │ CloudTrail audit logging │ │
│ │ │ └──────────────────────────┘ │
└─────────────────────────────────┘ └──────────────────────────────────┘
Components
| Component | Purpose |
|---|---|
| Kafka cluster | Production source data |
| kafka-backup (production account) | Continuous backup to primary S3 |
| S3 bucket (primary) | Initial backup storage |
| AWS S3 Batch / DataSync | Scheduled transfer to air-gapped account |
| S3 bucket (air-gapped, Object Lock) | Immutable WORM storage |
| Glacier transition | Long-term archive for cost optimisation |
| CloudTrail (air-gapped account) | Audit logging for compliance |
Configuration
Production Account — backup.yaml
source:
  bootstrap_servers:
    - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
    - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
topic:
  include:
    - ".*"
storage:
  type: s3
  s3:
    bucket: my-org-kafka-backup-prod
    region: us-east-1
    prefix: prod/
backup:
  compression: zstd
  segment_max_bytes: 134217728
  continuous: true
  checkpoint_interval_secs: 60
metrics:
  enabled: true
  port: 9090
S3 Object Lock Configuration (Air-Gapped Account)
{
  "ObjectLockEnabled": "Enabled",
  "Rule": {
    "DefaultRetention": {
      "Mode": "COMPLIANCE",
      "Days": 365
    }
  }
}
COMPLIANCE mode prevents anyone — including the root user — from deleting or overwriting objects before the retention period expires. Use GOVERNANCE mode if you need the ability to override with special permissions during testing.
Air-Gapped Account IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllDeleteOperations",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:PutBucketPolicy",
        "s3:DeleteBucketPolicy"
      ],
      "Resource": [
        "arn:aws:s3:::my-org-kafka-backup-airgap",
        "arn:aws:s3:::my-org-kafka-backup-airgap/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::111111111111:root"
        }
      }
    },
    {
      "Sid": "AllowWriteFromProductionAccount",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:role/kafka-backup-transfer-role"
      },
      "Action": [
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-org-kafka-backup-airgap",
        "arn:aws:s3:::my-org-kafka-backup-airgap/*"
      ]
    }
  ]
}
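Object Lock is simplest to enable at bucket creation time. A creation sketch using the AWS CLI, run with air-gapped account credentials (bucket name follows the policy above; the retention document is written to a temp file for review even when the CLI is unavailable):

```shell
#!/usr/bin/env bash
# Create the air-gapped bucket with Object Lock enabled, then attach the
# COMPLIANCE-mode default retention. Enabling Object Lock at creation also
# turns on versioning, which it requires.
set -euo pipefail

BUCKET="my-org-kafka-backup-airgap"

cat > /tmp/object-lock.json <<'EOF'
{
  "ObjectLockEnabled": "Enabled",
  "Rule": {
    "DefaultRetention": {
      "Mode": "COMPLIANCE",
      "Days": 365
    }
  }
}
EOF

if command -v aws >/dev/null 2>&1; then
  # us-east-1 needs no LocationConstraint; other regions require one
  aws s3api create-bucket --bucket "$BUCKET" \
    --object-lock-enabled-for-bucket
  aws s3api put-object-lock-configuration --bucket "$BUCKET" \
    --object-lock-configuration file:///tmp/object-lock.json
else
  echo "aws CLI not found; review /tmp/object-lock.json and run manually" >&2
fi
```

Test the retention in GOVERNANCE mode first: COMPLIANCE-mode objects cannot be deleted by anyone until the 365-day retention expires.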
Cost Estimate
| Item | Monthly Cost |
|---|---|
| Primary backup (Architecture 1) | ~$105 |
| Air-gapped S3 storage (Glacier + Object Lock) | ~$90 |
| Total | ~$195 |
Use S3 Intelligent-Tiering or lifecycle policies to transition older backups to Glacier Deep Archive after 90 days. This can reduce air-gapped storage costs by up to 70% for long-retention requirements.
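A lifecycle rule expressing that 90-day transition, sketched with the AWS CLI (rule ID is illustrative; the bucket name follows the examples above):

```shell
#!/usr/bin/env bash
# Transition air-gapped backups to Glacier Deep Archive after 90 days.
set -euo pipefail

BUCKET="my-org-kafka-backup-airgap"

cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-after-90-days",
      "Status": "Enabled",
      "Filter": { "Prefix": "prod/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
EOF

if command -v aws >/dev/null 2>&1; then
  aws s3api put-bucket-lifecycle-configuration --bucket "$BUCKET" \
    --lifecycle-configuration file:///tmp/lifecycle.json
else
  echo "aws CLI not found; review /tmp/lifecycle.json and run manually" >&2
fi
```

Keep in mind that standard retrieval from Deep Archive can take up to 12 hours, so keep backups you might need within your RTO window in a warmer tier.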
Limitations
- Higher RTO due to the air gap — restoring requires transferring data back from the isolated account
- Transfer scheduling adds complexity (S3 Batch operations, DataSync jobs)
- MFA-protected root account access is required for emergency operations in the air-gapped account
- Object Lock retention cannot be shortened once set in COMPLIANCE mode
- Testing restores from the air-gapped account requires careful planning to avoid violating the air gap
Architecture 5: Kubernetes GitOps Backup Pipeline
Overview
A fully declarative, Kubernetes-native approach where backup and restore operations are managed through Custom Resource Definitions (CRDs) and reconciled by a GitOps controller such as ArgoCD or Flux. All configuration lives in a Git repository, providing version history, peer review, and automated rollout for every change.
When to Use
- Your team already operates a Kubernetes platform with GitOps tooling
- RPO < 1 hour is acceptable
- RTO < 2 hours is acceptable
- You want all backup configuration versioned, reviewed, and auditable in Git
- You need to manage backup across multiple environments (dev, staging, prod) consistently
Architecture Diagram
┌──────────────┐ ┌──────────────────────────────────────────────────────┐
│ Git Repo │ │ Kubernetes Cluster │
│ │ │ │
│ envs/ │ │ ┌──────────┐ ┌───────────────────────────────┐ │
│ └─ prod/ │────►│ │ ArgoCD │───►│ kafka-backup Operator │ │
│ ├─ app.yaml │ └──────────┘ │ │ │
│ ├─ backup.yaml │ │ ┌─────────────────────────┐ │ │
│ ├─ monitor.yaml │ │ │ KafkaBackup CR │ │ │
│ └─ restore.yaml │ │ │ (reconciles backup │ │ │
│ │ │ │ │ pods automatically) │ │ │
└──────────────┘ │ │ └───────────┬─────────────┘ │ │
│ └──────────────┼────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ kafka-backup│ │
│ │ pods │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────────────────┐ ┌──────▼──────┐ │
│ │ Prometheus + │ │ S3 Bucket │ │
│ │ Grafana │ │ │ │
│ │ (ServiceMonitor) │ └─────────────┘ │
│ └──────────────────┘ │
└──────────────────────────────────────────────────────┘
Components
| Component | Purpose |
|---|---|
| Git repository | Single source of truth for all backup configuration |
| ArgoCD / Flux | GitOps controller, reconciles desired state |
| kafka-backup Operator | Watches KafkaBackup/KafkaRestore CRDs, manages pods |
| KafkaBackup CRD | Declarative backup specification |
| KafkaRestore CRD | Declarative restore specification |
| Prometheus ServiceMonitor | Auto-discovered metrics scraping |
| S3 bucket | Backup storage |
Configuration
Git Repository Structure
environments/
└── prod/
    ├── kustomization.yaml
    ├── argocd-application.yaml
    ├── kafka-backup-crd.yaml
    ├── kafka-restore-crd.yaml
    └── service-monitor.yaml
ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kafka-backup-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/kafka-backup-config.git
    targetRevision: main
    path: environments/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: kafka-backup
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
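Bootstrapping is a one-time `kubectl apply` of the Application; ArgoCD reconciles everything else from Git afterwards (the manifest filename is illustrative):

```shell
#!/usr/bin/env bash
# One-time bootstrap: register the Application, then inspect the first sync.
set -euo pipefail

if command -v kubectl >/dev/null 2>&1; then
  kubectl apply -f argocd-application.yaml
  # Inspect sync/health status; requires a logged-in argocd CLI
  if command -v argocd >/dev/null 2>&1; then
    argocd app get kafka-backup-prod
  fi
else
  echo "kubectl not found; apply argocd-application.yaml from a cluster-connected machine" >&2
fi

echo "ok" > /tmp/argocd-bootstrap.status
```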
KafkaBackup Custom Resource
apiVersion: kafka-backup.osodevops.io/v1alpha1
kind: KafkaBackup
metadata:
  name: prod-backup
  namespace: kafka-backup
spec:
  source:
    bootstrapServers:
      - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
      - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
      - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
  topicSelector:
    include:
      - ".*"
  storage:
    type: s3
    s3:
      bucket: my-org-kafka-backup
      region: us-east-1
      prefix: prod/
  backup:
    compression: zstd
    segmentMaxBytes: 134217728
    continuous: true
    checkpointIntervalSecs: 60
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 2Gi
KafkaRestore Custom Resource
apiVersion: kafka-backup.osodevops.io/v1alpha1
kind: KafkaRestore
metadata:
  name: prod-restore
  namespace: kafka-backup
spec:
  source:
    type: s3
    s3:
      bucket: my-org-kafka-backup
      region: us-east-1
      prefix: prod/
  target:
    bootstrapServers:
      - kafka-0.kafka-headless.kafka.svc.cluster.local:9092
      - kafka-1.kafka-headless.kafka.svc.cluster.local:9092
      - kafka-2.kafka-headless.kafka.svc.cluster.local:9092
  topicSelector:
    include:
      - ".*"
  restore:
    fromLatest: true
  # Keep the restore paused until it is needed; set to false to trigger it
  paused: true
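With the restore committed in a paused state, triggering it is a one-line change. In a strict GitOps flow you flip the flag via pull request; in an emergency you can patch the live object directly, assuming the operator registers the `kafkarestore` resource name (note that ArgoCD's selfHeal will revert the patch unless auto-sync is disabled first):

```shell
#!/usr/bin/env bash
# Emergency trigger: unpause the KafkaRestore directly on the cluster.
set -euo pipefail

cat > /tmp/unpause.json <<'EOF'
{ "spec": { "paused": false } }
EOF

if command -v kubectl >/dev/null 2>&1; then
  # Disable ArgoCD auto-sync first, or selfHeal will re-pause the restore
  kubectl patch kafkarestore prod-restore -n kafka-backup \
    --type merge -p "$(cat /tmp/unpause.json)"
else
  echo "kubectl not found; run from a cluster-connected machine" >&2
fi
```

Follow the emergency patch with a commit to Git so the repository and the cluster converge again.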
Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-backup
  namespace: kafka-backup
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: kafka-backup
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
With GitOps, every configuration change goes through a pull request. This gives you a full audit trail, peer review, and the ability to roll back any change by reverting a commit.
Cost Estimate
| Item | Monthly Cost |
|---|---|
| Base backup infrastructure (Architecture 1) | ~$105 |
| GitOps tooling (ArgoCD/Flux — typically already deployed) | ~$0 |
| Total | ~$105 |
Limitations
- Requires Kubernetes and GitOps expertise on the team
- Operator learning curve — custom resources add an abstraction layer
- CRD schema changes require careful upgrade planning
- ArgoCD/Flux must be operational for configuration changes to propagate (backup continues running if GitOps is temporarily down)
Choosing an Architecture
Use the comparison table at the top of this page as a starting point. Then consider these questions:
- What is your RPO/RTO budget? If < 15 min RPO is required, start with Architecture 2 (Cross-Region DR).
- Do you need multi-cloud protection? Architecture 3 is the only option that survives a full cloud provider outage.
- Are you in a regulated industry? Architecture 4 (Air-Gapped) provides the immutability guarantees auditors look for.
- Is your team already running GitOps? Architecture 5 adds minimal overhead and maximum auditability.
- Just getting started? Architecture 1 is the fastest path to a working, production-grade backup.
All architectures can be combined. For example, you can run Architecture 5 (GitOps) as your deployment model while using Architecture 2 (Cross-Region) as your storage topology and Architecture 4 (Air-Gapped) as an additional compliance layer.