Cross-Cutting Concerns
Architecture considerations that span multiple pillars of the Well-Architected Framework — topics that affect security, reliability, cost, operational excellence, and performance simultaneously.
Some architectural decisions do not fit neatly into a single pillar. They cut across every dimension of the Well-Architected Framework and must be addressed holistically. This page covers the most important cross-cutting concerns for organisations running OSO Kafka Backup in production.
Multi-Cloud & Hybrid Deployments
Many organisations operate Kafka across multiple clouds or hybrid on-prem/cloud environments. Your backup strategy must account for heterogeneous infrastructure, differing credential models, and the realities of cross-cloud networking.
Key Considerations
- Storage portability — kafka-backup supports S3, Azure Blob, GCS, and local filesystem. Backups created in one cloud can be restored in another.
- Credential management across clouds — Each cloud has its own identity model (IAM roles, managed identities, workload identity). Backup configs must handle these differences.
- Network connectivity and latency — Cross-cloud restores introduce network hops and potential bandwidth constraints.
- Data sovereignty and residency requirements — Regulations may restrict where backup data can be stored or transferred.
- Cost of cross-cloud data transfer — Egress charges can be significant when replicating backups between providers.
Recommended Patterns
- Back up locally, replicate cross-cloud for DR — Perform primary backups to the same cloud as the source cluster, then replicate to a secondary cloud for disaster recovery.
- Use S3-compatible storage (MinIO) as a universal intermediate format — MinIO provides an S3-compatible API that runs on any cloud or on-prem, giving you a consistent storage interface.
- Consistent config across clouds using GitOps — Store backup configurations in Git and deploy them identically across environments to reduce drift.
Start with local backups to minimise latency and cost, then add cross-cloud replication as a second stage. This avoids paying egress fees on every backup cycle.
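The egress arithmetic behind this two-stage pattern is easy to sketch. The price and volume figures below are illustrative assumptions, not provider quotes:

```python
# Back-of-the-envelope egress estimate for the two-stage pattern.
# EGRESS_PER_GB_USD is an assumed inter-cloud transfer price, not a quote.
EGRESS_PER_GB_USD = 0.09

def monthly_egress_cost(gb_per_day: float, days: int = 30) -> float:
    """Cost of shipping gb_per_day across clouds every day for a month."""
    return gb_per_day * days * EGRESS_PER_GB_USD

# Backing up 500 GB/day directly cross-cloud vs. replicating a 50 GB/day delta:
direct = monthly_egress_cost(500)   # every backup crosses clouds
staged = monthly_egress_cost(50)    # local primary, only deltas replicated
```

Replace the assumed numbers with your own measured backup volumes and your providers' published egress rates before drawing conclusions.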
Configuration Example
Multi-cloud backup with primary S3 and secondary Azure Blob for DR:
```yaml
# Primary backup — S3 in AWS (same region as source Kafka)
backup:
  name: production-primary
  source:
    bootstrap-servers: "${KAFKA_BOOTSTRAP_SERVERS}"
    security-protocol: SASL_SSL
    sasl-mechanism: SCRAM-SHA-512
    sasl-username: "${KAFKA_USERNAME}"
    sasl-password: "${KAFKA_PASSWORD}"
  storage:
    type: s3
    bucket: "${AWS_BACKUP_BUCKET}"
    region: "${AWS_REGION}"
    prefix: kafka-backup/production
  topics:
    include:
      - ".*"
---
# Secondary backup — Azure Blob for cross-cloud DR
backup:
  name: production-dr
  source:
    bootstrap-servers: "${KAFKA_BOOTSTRAP_SERVERS}"
    security-protocol: SASL_SSL
    sasl-mechanism: SCRAM-SHA-512
    sasl-username: "${KAFKA_USERNAME}"
    sasl-password: "${KAFKA_PASSWORD}"
  storage:
    type: azure-blob
    container: "${AZURE_BACKUP_CONTAINER}"
    account-name: "${AZURE_STORAGE_ACCOUNT}"
    account-key: "${AZURE_STORAGE_KEY}"
    prefix: kafka-backup/production
  topics:
    include:
      - ".*"
```
Cross-cloud credential environment variables must be managed carefully. Use a secrets manager (Vault, AWS Secrets Manager, Azure Key Vault) rather than storing credentials directly in config files or CI/CD pipelines.
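One way to wire a secrets manager into the configs above is to sync credentials into a Kubernetes Secret that the backup job consumes as environment variables. A minimal sketch, assuming the External Secrets Operator with a Vault backend (the store name and Vault path are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: kafka-backup-credentials
  namespace: kafka-backup
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend          # illustrative ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: kafka-backup-credentials
  data:
    - secretKey: KAFKA_USERNAME
      remoteRef:
        key: kafka/backup        # illustrative Vault path
        property: username
    - secretKey: KAFKA_PASSWORD
      remoteRef:
        key: kafka/backup
        property: password
```

The same pattern works per cloud: point each environment's store at that cloud's native secrets service while keeping the ExternalSecret manifests identical in Git.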
Kubernetes-Native Operations
kafka-backup provides a Kubernetes operator with Custom Resource Definitions (CRDs) for GitOps-native backup management. This allows you to declare backup and restore jobs as Kubernetes resources, managed alongside your application manifests.
Key Concepts
| CRD | Purpose |
|---|---|
| KafkaBackup | Defines a backup job — source cluster, storage target, schedule, and topic filters |
| KafkaRestore | Defines a restore job — source backup, target cluster, and restore parameters |
| KafkaOffsetReset | Manages consumer offset recovery after a restore operation |
| KafkaOffsetRollback | Rolls back offset changes if a reset produces unexpected results |
Best Practices
- Store CRDs in Git alongside application manifests — Backup definitions should live in the same repository as the services that produce and consume the data.
- Use ArgoCD or Flux for GitOps deployment — Automate CRD deployment through your existing GitOps pipeline.
- Define resource requests and limits — Prevent backup pods from starving other workloads or being OOM-killed during large backups.
- Use Kubernetes RBAC to control who can create restore CRDs — Restores are destructive operations; limit access to authorised personnel.
- Monitor CRD status with kubectl and Prometheus — The operator exposes metrics and CRD status conditions for observability.
The Kubernetes operator watches for CRD changes and reconciles the desired state automatically. This means you can trigger a backup or restore simply by applying a manifest — no imperative commands required.
Configuration Examples
KafkaBackup CRD:
```yaml
apiVersion: kafka.oso.dev/v1alpha1
kind: KafkaBackup
metadata:
  name: production-daily
  namespace: kafka-backup
spec:
  schedule: "0 2 * * *"
  source:
    bootstrapServers: kafka-cluster-kafka-bootstrap:9093
    securityProtocol: SASL_SSL
    saslMechanism: SCRAM-SHA-512
    credentialsSecret:
      name: kafka-backup-credentials
  storage:
    type: s3
    bucket: my-backup-bucket
    region: eu-west-1
    prefix: production/daily
  topics:
    include:
      - ".*"
    exclude:
      - "__.*"
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
```
KafkaRestore CRD:
```yaml
apiVersion: kafka.oso.dev/v1alpha1
kind: KafkaRestore
metadata:
  name: restore-production-20260324
  namespace: kafka-backup
spec:
  backup:
    name: production-daily
    snapshot: "2026-03-24T02:00:00Z"
  target:
    bootstrapServers: kafka-cluster-kafka-bootstrap:9093
    securityProtocol: SASL_SSL
    saslMechanism: SCRAM-SHA-512
    credentialsSecret:
      name: kafka-restore-credentials
  topics:
    include:
      - "orders.*"
      - "payments.*"
  restoreOffsets: true
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "4"
      memory: 8Gi
```
Use `kubectl get kafkabackup` and `kubectl get kafkarestore` to check the status of backup and restore operations. The operator sets status conditions such as `Ready`, `Running`, `Completed`, and `Failed`.
See also: Operator Overview, KafkaBackup CRD, KafkaRestore CRD, GitOps Guide
Schema Registry Integration
For clusters using a Schema Registry (Confluent, Apicurio), backup must include schemas alongside topic data. Without schemas, consumers cannot deserialise restored messages, and producers cannot validate new messages against the expected format.
Key Considerations
- Schema IDs may not be preserved across clusters — Schema IDs are auto-incremented integers assigned by the registry. A restore to a different cluster will likely produce different IDs.
- Schema evolution history should be backed up — Consumers may depend on older schema versions for backward compatibility.
- Restore must handle schema ID remapping — Messages reference schema IDs in their serialised payloads (or in headers, depending on the serialiser). After restore, these IDs must map to the correct schemas in the target registry.
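For serialisers that use the Confluent wire format (a zero magic byte followed by a four-byte big-endian schema ID), remapping is a fixed-offset byte rewrite. A minimal sketch, assuming you have already built an old-ID-to-new-ID map against the target registry:

```python
import struct

def remap_schema_id(payload: bytes, id_map: dict) -> bytes:
    """Rewrite the 4-byte schema ID in a Confluent wire-format payload."""
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not a Confluent wire-format payload")
    old_id = struct.unpack(">I", payload[1:5])[0]
    new_id = id_map[old_id]  # KeyError means the schema is missing in the target
    return payload[:1] + struct.pack(">I", new_id) + payload[5:]
```

The Enterprise edition performs this remapping (plus compatibility validation) automatically; the sketch only shows why a restore to a different cluster cannot simply copy bytes verbatim.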
Restoring topic data without its associated schemas will result in deserialisation failures for all Avro, Protobuf, or JSON Schema consumers. Always include schema backup in your DR plan.
Enterprise Feature
The Enterprise edition provides integrated Schema Registry backup and restore with:
- Automatic ID remapping — Schema IDs in restored messages are updated to match the target registry.
- Compatibility validation — Schemas are validated against the target registry's compatibility settings before restore.
- Full evolution history — All schema versions and their metadata are preserved.
Recommended Patterns
- Back up Schema Registry independently — Export schemas via the Schema Registry REST API as a supplementary backup.
- Use Enterprise for integrated schema backup/restore — The Enterprise edition handles the complexity of ID remapping and compatibility checks automatically.
- Test schema compatibility after restore — Verify that consumers can deserialise messages and producers can register new schemas.
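The supplementary export in the first pattern needs nothing more than the registry's documented REST endpoints (`/subjects` and `/subjects/{subject}/versions`). A minimal sketch, assuming an unauthenticated registry reachable at `base_url`:

```python
import json
import urllib.request

def _get(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def version_url(base_url: str, subject: str, version) -> str:
    return f"{base_url}/subjects/{subject}/versions/{version}"

def export_schemas(base_url: str) -> dict:
    """Dump every version of every subject registered in the registry."""
    dump = {}
    for subject in _get(f"{base_url}/subjects"):
        versions = _get(f"{base_url}/subjects/{subject}/versions")
        dump[subject] = [_get(version_url(base_url, subject, v)) for v in versions]
    return dump
```

Write the resulting dump to the same storage target as your topic backups so schemas and data share a retention policy.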
See also: Schema Registry (Enterprise)
Kafka Streams & Stateful Applications
Kafka Streams applications maintain local state stores backed by changelog topics. Backup and restore of a Streams application requires special consideration to ensure the application can recover its state correctly.
Key Considerations
- Changelog topics must be included in backup — These topics are the source of truth for Streams state stores. Without them, the application must reprocess all input data from scratch.
- State store rebuild time after restore — Even with changelog topics restored, state stores must be rebuilt locally. Factor this time into your RTO calculations.
- Repartition topics may need to be excluded — Repartition topics are intermediate topics generated by Streams. They can be regenerated from input data and do not need to be backed up.
- Consumer offset recovery is critical for Streams apps — Streams applications use consumer offsets to track processing progress. Incorrect offsets can cause duplicate processing or data loss.
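The include/exclude reasoning above reduces to a simple classifier: keep everything except repartition topics. A sketch assuming a hypothetical `streams-app-` application ID prefix for internal topics:

```python
import re

# Hypothetical application ID prefix; adjust to your Streams application's naming.
REPARTITION = re.compile(r"^streams-app-.+-repartition$")

def backup_scope(topics):
    """Split topics into (include, exclude) for a Streams backup.

    Repartition topics regenerate from input data, so they are excluded;
    inputs, outputs, and -changelog topics (the source of truth for state
    stores) are kept.
    """
    exclude = [t for t in topics if REPARTITION.match(t)]
    include = [t for t in topics if not REPARTITION.match(t)]
    return include, exclude
```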
Recommended Patterns
- Include all changelog topics in backup scope — Use topic name patterns to capture changelog topics (typically suffixed with `-changelog`).
- Exclude repartition topics — Repartition topics (typically suffixed with `-repartition`) regenerate automatically and waste storage if backed up.
- Test Streams app recovery as part of DR drills — Streams recovery is more complex than simple consumer recovery. Validate it regularly.
- Use PITR to restore to a consistent state across all related topics — Point-in-time recovery ensures that input topics, changelog topics, and output topics are restored to the same logical point.
Kafka Streams state store rebuild time depends on the volume of data in the changelog topics. For large state stores, this can take minutes to hours. Plan accordingly and consider standby replicas to reduce recovery time.
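A rough way to fold rebuild time into RTO planning (the throughput figure below is an assumption to be replaced with a value measured in your own DR drills):

```python
def state_rebuild_minutes(changelog_gb: float, restore_mb_per_sec: float) -> float:
    """Estimate time to replay changelog topics into local state stores."""
    return changelog_gb * 1024 / restore_mb_per_sec / 60

# e.g. a 100 GB state store replayed at an assumed 100 MB/s takes ~17 minutes:
estimate = state_rebuild_minutes(100, 100)
```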
Configuration Example
Topic filtering for Kafka Streams applications — include changelogs, exclude repartition topics:
```yaml
backup:
  name: streams-app-backup
  source:
    bootstrap-servers: kafka-cluster:9092
  storage:
    type: s3
    bucket: my-backup-bucket
    prefix: streams-app
  topics:
    include:
      # Input topics
      - "orders\\..*"
      - "payments\\..*"
      # Output topics
      - "enriched-orders"
      - "order-summaries"
      # Changelog topics (state stores)
      - "streams-app-.*-changelog"
    exclude:
      # Repartition topics (will regenerate)
      - "streams-app-.*-repartition"
      # Internal Streams topics
      - "__consumer_offsets"
      - "__transaction_state"
```
Use a naming convention for your Streams application ID (e.g., `streams-app-*`) so that changelog and repartition topics can be easily identified with wildcard patterns.
See also: Kafka Streams Example
Regulatory & Compliance Scenarios
Industries such as finance, healthcare, and retail have specific regulatory requirements for data backup, retention, and protection. kafka-backup can be configured to meet these requirements, with Enterprise features providing additional compliance capabilities.
Compliance Mapping
| Regulation | Requirement | kafka-backup Feature |
|---|---|---|
| GDPR | Right to be forgotten, data minimisation | Data masking, field-level redaction (Enterprise) |
| SOX | Financial data retention (7 years) | Long-term retention with lifecycle policies |
| HIPAA | PHI protection, access logging | Encryption at rest, audit logging (Enterprise) |
| PCI DSS | Cardholder data protection | Field-level encryption, RBAC (Enterprise) |
| DORA | IT system resilience testing | DR testing framework, RTO/RPO tracking |
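For the SOX row, long-term retention is typically enforced at the storage layer rather than in the backup tool. A sketch of an S3 lifecycle configuration (the prefix and day counts are illustrative) that tiers backups to Glacier and expires them after roughly seven years:

```json
{
  "Rules": [
    {
      "ID": "sox-7-year-retention",
      "Filter": { "Prefix": "kafka-backup/production/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2557 }
    }
  ]
}
```

Azure Blob and GCS offer equivalent lifecycle management; apply the same tiering and expiry policy to whichever storage backend holds the regulated backups.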
Recommended Patterns
- Define compliance requirements per topic — Not all topics carry regulated data. Tag topics with their compliance classification and apply appropriate backup policies.
- Use Enterprise features for regulated industries — Field-level encryption, data masking, RBAC, and audit logging are essential for GDPR, HIPAA, and PCI DSS compliance.
- Implement audit logging for all operations — Every backup, restore, and configuration change should be logged with the operator identity, timestamp, and outcome.
- Conduct regular compliance audits — Periodically review backup configurations, retention policies, and access controls against regulatory requirements.
- Maintain evidence of DR testing for auditors — Regulators such as those enforcing DORA require documented evidence that disaster recovery procedures have been tested.
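To make the audit-logging pattern concrete, here is what "logged with the operator identity, timestamp, and outcome" might look like as one structured record. This is an illustrative shape, not the Enterprise audit log format:

```json
{
  "timestamp": "2026-03-24T02:00:11Z",
  "actor": "alice@example.com",
  "action": "restore.create",
  "resource": "kafkarestore/restore-production-20260324",
  "outcome": "allowed",
  "details": { "topics": ["orders.*", "payments.*"], "restoreOffsets": true }
}
```

Whatever the concrete format, each record should be immutable, timestamped, and attributable so it can stand as audit evidence.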
Regulatory non-compliance can result in significant fines and reputational damage. Treat compliance requirements as hard constraints, not aspirational goals. If in doubt, consult your compliance or legal team before finalising backup configurations.
Use the Enterprise audit logging feature to generate compliance reports automatically. These reports can be exported in formats suitable for external auditors and regulators.
See also: Audit Logging (Enterprise), Encryption (Enterprise), RBAC (Enterprise), Compliance Audit Use Case