Cross-Cutting Concerns
Architecture considerations that span multiple pillars of the Well-Architected Framework — topics that affect security, reliability, cost, operational excellence, and performance simultaneously.
Some architectural decisions do not fit neatly into a single pillar. They cut across every dimension of the Well-Architected Framework and must be addressed holistically. This page covers the most important cross-cutting concerns for organisations running OSO Kafka Backup in production.
Multi-Cloud & Hybrid Deployments
Many organisations operate Kafka across multiple clouds or hybrid on-prem/cloud environments. Your backup strategy must account for heterogeneous infrastructure, differing credential models, and the realities of cross-cloud networking.
Key Considerations
- Storage portability — kafka-backup supports S3, Azure Blob, GCS, and local filesystem. Backups created in one cloud can be restored in another.
- Credential management across clouds — Each cloud has its own identity model (IAM roles, managed identities, workload identity). Backup configs must handle these differences.
- Network connectivity and latency — Cross-cloud restores introduce network hops and potential bandwidth constraints.
- Data sovereignty and residency requirements — Regulations may restrict where backup data can be stored or transferred.
- Cost of cross-cloud data transfer — Egress charges can be significant when replicating backups between providers.
Recommended Patterns
- Back up locally, replicate cross-cloud for DR — Perform primary backups to the same cloud as the source cluster, then replicate to a secondary cloud for disaster recovery.
- Use S3-compatible storage (MinIO) as a universal intermediate format — MinIO provides an S3-compatible API that runs on any cloud or on-prem, giving you a consistent storage interface.
- Consistent config across clouds using GitOps — Store backup configurations in Git and deploy them identically across environments to reduce drift.
Start with local backups to minimise latency and cost, then add cross-cloud replication as a second stage. This avoids paying egress fees on every backup cycle.
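The egress arithmetic behind this two-stage pattern is easy to sketch. The price and volume figures below are illustrative assumptions, not provider quotes:

```python
# Back-of-the-envelope egress estimate for the two-stage pattern.
# EGRESS_PER_GB_USD is an assumed inter-cloud transfer price, not a quote.
EGRESS_PER_GB_USD = 0.09

def monthly_egress_cost(gb_per_day: float, days: int = 30) -> float:
    """Cost of shipping gb_per_day across clouds every day for a month."""
    return gb_per_day * days * EGRESS_PER_GB_USD

# Backing up 500 GB/day directly cross-cloud vs. replicating a 50 GB/day delta:
direct = monthly_egress_cost(500)   # every backup crosses clouds
staged = monthly_egress_cost(50)    # local primary, only deltas replicated
```

Replace the assumed numbers with your own measured backup volumes and your providers' published egress rates before drawing conclusions.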
Configuration Example
Multi-cloud backup with primary S3 and secondary Azure Blob for DR:
```yaml
# Primary backup — S3 in AWS (same region as source Kafka)
backup:
  name: production-primary
  source:
    bootstrap-servers: "${KAFKA_BOOTSTRAP_SERVERS}"
    security-protocol: SASL_SSL
    sasl-mechanism: SCRAM-SHA-512
    sasl-username: "${KAFKA_USERNAME}"
    sasl-password: "${KAFKA_PASSWORD}"
  storage:
    type: s3
    bucket: "${AWS_BACKUP_BUCKET}"
    region: "${AWS_REGION}"
    prefix: kafka-backup/production
  topics:
    include:
      - ".*"
---
# Secondary backup — Azure Blob for cross-cloud DR
backup:
  name: production-dr
  source:
    bootstrap-servers: "${KAFKA_BOOTSTRAP_SERVERS}"
    security-protocol: SASL_SSL
    sasl-mechanism: SCRAM-SHA-512
    sasl-username: "${KAFKA_USERNAME}"
    sasl-password: "${KAFKA_PASSWORD}"
  storage:
    type: azure-blob
    container: "${AZURE_BACKUP_CONTAINER}"
    account-name: "${AZURE_STORAGE_ACCOUNT}"
    account-key: "${AZURE_STORAGE_KEY}"
    prefix: kafka-backup/production
  topics:
    include:
      - ".*"
```
Cross-cloud credential environment variables must be managed carefully. Use a secrets manager (Vault, AWS Secrets Manager, Azure Key Vault) rather than storing credentials directly in config files or CI/CD pipelines.
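One way to wire a secrets manager into the configs above is to sync credentials into a Kubernetes Secret that the backup job consumes as environment variables. A minimal sketch, assuming the External Secrets Operator with a Vault backend (the store name and Vault path are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: kafka-backup-credentials
  namespace: kafka-backup
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend          # illustrative ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: kafka-backup-credentials
  data:
    - secretKey: KAFKA_USERNAME
      remoteRef:
        key: kafka/backup        # illustrative Vault path
        property: username
    - secretKey: KAFKA_PASSWORD
      remoteRef:
        key: kafka/backup
        property: password
```

The same pattern works per cloud: point each environment's store at that cloud's native secrets service while keeping the ExternalSecret manifests identical in Git.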
Kubernetes-Native Operations
kafka-backup provides a Kubernetes operator with Custom Resource Definitions (CRDs) for GitOps-native backup management. This allows you to declare backup and restore jobs as Kubernetes resources, managed alongside your application manifests.
Key Concepts
| CRD | Purpose |
|---|---|
| KafkaBackup | Defines a backup job — source cluster, storage target, schedule, and topic filters |
| KafkaRestore | Defines a restore job — source backup, target cluster, and restore parameters |
| KafkaOffsetReset | Manages consumer offset recovery after a restore operation |
| KafkaOffsetRollback | Rolls back offset changes if a reset produces unexpected results |
Best Practices
- Store CRDs in Git alongside application manifests — Backup definitions should live in the same repository as the services that produce and consume the data.
- Use ArgoCD or Flux for GitOps deployment — Automate CRD deployment through your existing GitOps pipeline.
- Define resource requests and limits — Prevent backup pods from starving other workloads or being OOM-killed during large backups.
- Use Kubernetes RBAC to control who can create restore CRDs — Restores are destructive operations; limit access to authorised personnel.
- Monitor CRD status with kubectl and Prometheus — The operator exposes metrics and CRD status conditions for observability.
The Kubernetes operator watches for CRD changes and reconciles the desired state automatically. This means you can trigger a backup or restore simply by applying a manifest — no imperative commands required.
Configuration Examples
KafkaBackup CRD:
```yaml
apiVersion: kafka.oso.dev/v1alpha1
kind: KafkaBackup
metadata:
  name: production-daily
  namespace: kafka-backup
spec:
  schedule: "0 2 * * *"
  source:
    bootstrapServers: kafka-cluster-kafka-bootstrap:9093
    securityProtocol: SASL_SSL
    saslMechanism: SCRAM-SHA-512
    credentialsSecret:
      name: kafka-backup-credentials
  storage:
    type: s3
    bucket: my-backup-bucket
    region: eu-west-1
    prefix: production/daily
  topics:
    include:
      - ".*"
    exclude:
      - "__.*"
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
```
KafkaRestore CRD:
```yaml
apiVersion: kafka.oso.dev/v1alpha1
kind: KafkaRestore
metadata:
  name: restore-production-20260324
  namespace: kafka-backup
spec:
  backup:
    name: production-daily
    snapshot: "2026-03-24T02:00:00Z"
  target:
    bootstrapServers: kafka-cluster-kafka-bootstrap:9093
    securityProtocol: SASL_SSL
    saslMechanism: SCRAM-SHA-512
    credentialsSecret:
      name: kafka-restore-credentials
  topics:
    include:
      - "orders.*"
      - "payments.*"
  restoreOffsets: true
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "4"
      memory: 8Gi
```
Use `kubectl get kafkabackup` and `kubectl get kafkarestore` to check the status of backup and restore operations. The operator sets status conditions such as `Ready`, `Running`, `Completed`, and `Failed`.
See also: Operator Overview, KafkaBackup CRD, KafkaRestore CRD, GitOps Guide
Schema Registry Integration
For clusters using a Schema Registry (Confluent, Apicurio), backup must include schemas alongside topic data. Without schemas, consumers cannot deserialise restored messages, and producers cannot validate new messages against the expected format.
Key Considerations
- Schema IDs may not be preserved across clusters — Schema IDs are auto-incremented integers assigned by the registry. A restore to a different cluster will likely produce different IDs.
- Schema evolution history should be backed up — Consumers may depend on older schema versions for backward compatibility.
- Restore must handle schema ID remapping — Messages reference schema IDs in their serialised payloads (or in headers, depending on the serialiser). After restore, these IDs must map to the correct schemas in the target registry.
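For serialisers that use the Confluent wire format (a zero magic byte followed by a four-byte big-endian schema ID), remapping is a fixed-offset byte rewrite. A minimal sketch, assuming you have already built an old-ID-to-new-ID map against the target registry:

```python
import struct

def remap_schema_id(payload: bytes, id_map: dict) -> bytes:
    """Rewrite the 4-byte schema ID in a Confluent wire-format payload."""
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not a Confluent wire-format payload")
    old_id = struct.unpack(">I", payload[1:5])[0]
    new_id = id_map[old_id]  # KeyError means the schema is missing in the target
    return payload[:1] + struct.pack(">I", new_id) + payload[5:]
```

The Enterprise edition performs this remapping (plus compatibility validation) automatically; the sketch only shows why a restore to a different cluster cannot simply copy bytes verbatim.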
Restoring topic data without its associated schemas will result in deserialisation failures for all Avro, Protobuf, or JSON Schema consumers. Always include schema backup in your DR plan.
Enterprise Feature
The Enterprise edition provides integrated Schema Registry backup and restore with:
- Automatic ID remapping — Schema IDs in restored messages are updated to match the target registry.
- Compatibility validation — Schemas are validated against the target registry's compatibility settings before restore.
- Full evolution history — All schema versions and their metadata are preserved.
Recommended Patterns
- Back up Schema Registry independently — Export schemas via the Schema Registry REST API as a supplementary backup.
- Use Enterprise for integrated schema backup/restore — The Enterprise edition handles the complexity of ID remapping and compatibility checks automatically.
- Test schema compatibility after restore — Verify that consumers can deserialise messages and producers can register new schemas.
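The supplementary export in the first pattern needs nothing more than the registry's documented REST endpoints (`/subjects` and `/subjects/{subject}/versions`). A minimal sketch, assuming an unauthenticated registry reachable at `base_url`:

```python
import json
import urllib.request

def _get(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def version_url(base_url: str, subject: str, version) -> str:
    return f"{base_url}/subjects/{subject}/versions/{version}"

def export_schemas(base_url: str) -> dict:
    """Dump every version of every subject registered in the registry."""
    dump = {}
    for subject in _get(f"{base_url}/subjects"):
        versions = _get(f"{base_url}/subjects/{subject}/versions")
        dump[subject] = [_get(version_url(base_url, subject, v)) for v in versions]
    return dump
```

Write the resulting dump to the same storage target as your topic backups so schemas and data share a retention policy.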
See also: Schema Registry (Enterprise)
Kafka Streams & Stateful Applications
Kafka Streams applications maintain local state stores backed by changelog topics. Backup and restore of a Streams application requires special consideration to ensure the application can recover its state correctly.
Key Considerations
- Changelog topics must be included in backup — These topics are the source of truth for Streams state stores. Without them, the application must reprocess all input data from scratch.
- State store rebuild time after restore — Even with changelog topics restored, state stores must be rebuilt locally. Factor this time into your RTO calculations.
- Repartition topics may need to be excluded — Repartition topics are intermediate topics generated by Streams. They can be regenerated from input data and do not need to be backed up.
- Consumer offset recovery is critical for Streams apps — Streams applications use consumer offsets to track processing progress. Incorrect offsets can cause duplicate processing or data loss.
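The include/exclude reasoning above reduces to a simple classifier: keep everything except repartition topics. A sketch assuming a hypothetical `streams-app-` application ID prefix for internal topics:

```python
import re

# Hypothetical application ID prefix; adjust to your Streams application's naming.
REPARTITION = re.compile(r"^streams-app-.+-repartition$")

def backup_scope(topics):
    """Split topics into (include, exclude) for a Streams backup.

    Repartition topics regenerate from input data, so they are excluded;
    inputs, outputs, and -changelog topics (the source of truth for state
    stores) are kept.
    """
    exclude = [t for t in topics if REPARTITION.match(t)]
    include = [t for t in topics if not REPARTITION.match(t)]
    return include, exclude
```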
Recommended Patterns
- Include all changelog topics in backup scope — Use topic name patterns to capture changelog topics (typically suffixed with `-changelog`).
- Exclude repartition topics — Repartition topics (typically suffixed with `-repartition`) regenerate automatically and waste storage if backed up.
- Test Streams app recovery as part of DR drills — Streams recovery is more complex than simple consumer recovery. Validate it regularly.
- Use PITR to restore to a consistent state across all related topics — Point-in-time recovery ensures that input topics, changelog topics, and output topics are restored to the same logical point.
Kafka Streams state store rebuild time depends on the volume of data in the changelog topics. For large state stores, this can take minutes to hours. Plan accordingly and consider standby replicas to reduce recovery time.
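A rough way to fold rebuild time into RTO planning (the throughput figure below is an assumption to be replaced with a value measured in your own DR drills):

```python
def state_rebuild_minutes(changelog_gb: float, restore_mb_per_sec: float) -> float:
    """Estimate time to replay changelog topics into local state stores."""
    return changelog_gb * 1024 / restore_mb_per_sec / 60

# e.g. a 100 GB state store replayed at an assumed 100 MB/s takes ~17 minutes:
estimate = state_rebuild_minutes(100, 100)
```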
Configuration Example
Topic filtering for Kafka Streams applications — include changelogs, exclude repartition topics:
```yaml
backup:
  name: streams-app-backup
  source:
    bootstrap-servers: kafka-cluster:9092
  storage:
    type: s3
    bucket: my-backup-bucket
    prefix: streams-app
  topics:
    include:
      # Input topics
      - "orders\\..*"
      - "payments\\..*"
      # Output topics
      - "enriched-orders"
      - "order-summaries"
      # Changelog topics (state stores)
      - "streams-app-.*-changelog"
    exclude:
      # Repartition topics (will regenerate)
      - "streams-app-.*-repartition"
      # Internal Streams topics
      - "__consumer_offsets"
      - "__transaction_state"
```
Use a naming convention for your Streams application ID (e.g., `streams-app-*`) so that changelog and repartition topics can be easily identified with wildcard patterns.
See also: Kafka Streams Example
Regulatory & Compliance Scenarios
Industries such as finance, healthcare, and retail have specific regulatory requirements for data backup, retention, and protection. kafka-backup can be configured to meet these requirements, with Enterprise features providing additional compliance capabilities.
Compliance Mapping
| Regulation | Requirement | kafka-backup Feature |
|---|---|---|
| GDPR | Right to be forgotten, data minimisation | Data masking, field-level redaction (Enterprise) |
| SOX | Financial data retention (7 years) | Long-term retention with lifecycle policies |
| HIPAA | PHI protection, access logging | Encryption at rest, audit logging (Enterprise) |
| PCI DSS | Cardholder data protection | Field-level encryption, RBAC (Enterprise) |
| DORA | IT system resilience testing | DR testing framework, RTO/RPO tracking |
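For the SOX row, long-term retention is typically enforced at the storage layer rather than in the backup tool. A sketch of an S3 lifecycle configuration (the prefix and day counts are illustrative) that tiers backups to Glacier and expires them after roughly seven years:

```json
{
  "Rules": [
    {
      "ID": "sox-7-year-retention",
      "Filter": { "Prefix": "kafka-backup/production/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2557 }
    }
  ]
}
```

Azure Blob and GCS offer equivalent lifecycle management; apply the same tiering and expiry policy to whichever storage backend holds the regulated backups.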
Recommended Patterns
- Define compliance requirements per topic — Not all topics carry regulated data. Tag topics with their compliance classification and apply appropriate backup policies.
- Use Enterprise features for regulated industries — Field-level encryption, data masking, RBAC, and audit logging are essential for GDPR, HIPAA, and PCI DSS compliance.
- Implement audit logging for all operations — Every backup, restore, and configuration change should be logged with the operator identity, timestamp, and outcome.
- Conduct regular compliance audits — Periodically review backup configurations, retention policies, and access controls against regulatory requirements.
- Maintain evidence of DR testing for auditors — Regulators such as those enforcing DORA require documented evidence that disaster recovery procedures have been tested.
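To make the audit-logging pattern concrete, here is what "logged with the operator identity, timestamp, and outcome" might look like as one structured record. This is an illustrative shape, not the Enterprise audit log format:

```json
{
  "timestamp": "2026-03-24T02:00:11Z",
  "actor": "alice@example.com",
  "action": "restore.create",
  "resource": "kafkarestore/restore-production-20260324",
  "outcome": "allowed",
  "details": { "topics": ["orders.*", "payments.*"], "restoreOffsets": true }
}
```

Whatever the concrete format, each record should be immutable, timestamped, and attributable so it can stand as audit evidence.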
Regulatory non-compliance can result in significant fines and reputational damage. Treat compliance requirements as hard constraints, not aspirational goals. If in doubt, consult your compliance or legal team before finalising backup configurations.
Use the Enterprise audit logging feature to generate compliance reports automatically. These reports can be exported in formats suitable for external auditors and regulators.
See also: Audit Logging (Enterprise), Encryption (Enterprise), RBAC (Enterprise), Compliance Audit Use Case