KafkaBackup CRD
The KafkaBackup custom resource defines a backup run or recurring backup schedule for Kafka topics.
Full Specification
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: production-backup
  namespace: kafka-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka-0.kafka.svc:9092
      - kafka-1.kafka.svc:9092
    securityProtocol: SASL_SSL
    caSecret: # optional: separate CA secret (e.g. Strimzi)
      name: cluster-ca-cert
      caKey: ca.crt
    tlsSecret:
      name: kafka-tls
      caKey: ca.crt
      certKey: tls.crt
      keyKey: tls.key
    saslSecret:
      name: kafka-credentials
      mechanism: SCRAM-SHA-512
      usernameKey: username
      passwordKey: password
    connection:
      tcpKeepalive: true
      keepaliveTimeSecs: 60
      keepaliveIntervalSecs: 20
      tcpNodelay: true
      connectionsPerBroker: 4
  topics:
    - orders
    - payments
    - "events-*"
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-west-2
      prefix: production/hourly
      endpoint: https://s3.us-west-2.amazonaws.com
      pathStyle: false
      allowHttp: false
      credentialsSecret:
        name: s3-credentials
        accessKeyIdKey: AWS_ACCESS_KEY_ID
        secretAccessKeyKey: AWS_SECRET_ACCESS_KEY
  schedule: "0 0 * * * * *" # Every hour, cron format includes seconds
  suspend: false
  compression: zstd
  compressionLevel: 3
  segmentMaxBytes: 134217728
  segmentMaxIntervalMs: 60000
  # Backup mode
  continuous: false
  stopAtCurrentOffsets: true
  pollIntervalMs: 100
  includeOffsetHeaders: true
  sourceClusterId: production-us-west-2
  consumerGroupSnapshot: true
  checkpoint:
    enabled: true
    intervalSecs: 30
  retention:
    enabled: false
    maxAgeDays: 30
    keepLast: 3
    dryRun: true
  rateLimiting:
    recordsPerSec: 0
    bytesPerSec: 0
    maxConcurrentPartitions: 4
  circuitBreaker:
    enabled: true
    failureThreshold: 5
    resetTimeoutSecs: 60
    successThreshold: 3
    operationTimeoutMs: 30000
  metrics:
    enabled: true
    port: 9090
    bindAddress: "0.0.0.0"
    path: /metrics
    updateIntervalMs: 500
    maxPartitionLabels: 100
  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal
Backup Mode Changes in v1.0.0
checkpoint.enabled no longer makes a backup run continuously. Use these fields explicitly:
| Field | Default | Description |
|---|---|---|
| continuous | false | Keep polling and writing new records after the initial pass |
| stopAtCurrentOffsets | false | Snapshot mode: capture starting high watermarks and exit after all partitions catch up |
| consumerGroupSnapshot | false | Write consumer-groups-snapshot.json after each backup cycle |
For scheduled point-in-time backups, set stopAtCurrentOffsets: true. For streaming backups, set continuous: true. Do not set continuous and stopAtCurrentOffsets together.
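The mode rule above can be expressed as a small pre-flight check. This is a minimal Python sketch, not the operator's own validation: `backup_mode` is a hypothetical helper that takes the `spec` as a plain dict and applies the documented defaults. The `"unspecified"` label for the case where neither field is set is an assumption, since that case is not described here.

```python
def backup_mode(spec: dict) -> str:
    """Classify a KafkaBackup spec dict by its mode fields (hypothetical helper)."""
    continuous = spec.get("continuous", False)          # documented default: false
    snapshot = spec.get("stopAtCurrentOffsets", False)  # documented default: false

    # The two modes are documented as mutually exclusive.
    if continuous and snapshot:
        raise ValueError("do not set continuous and stopAtCurrentOffsets together")
    if continuous:
        return "streaming"
    if snapshot:
        return "snapshot"
    # Assumption: the behavior with neither field set is not described in this page.
    return "unspecified"
```

Running a check like this in CI before applying a manifest catches a conflicting mode combination early.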
Backup Data Retention
Operator-managed KafkaBackup retention is disabled by default. When spec.retention is absent, or when spec.retention.enabled: false, the operator does not delete backup data.
Enable retention per KafkaBackup resource when you want the operator to prune complete backup sets after a successful backup run:
spec:
  retention:
    enabled: true
    maxAgeDays: 30
    keepLast: 3
    dryRun: false
Retention deletes whole backup IDs, not individual segments, manifests, or offset files from a still-retained backup set. This avoids creating partially pruned manifests that would break point-in-time restore unexpectedly.
When retention is enabled:
- Set at least one of maxAgeDays or keepLast.
- maxAgeDays and keepLast must be greater than 0 when set.
- The current backup ID is always retained.
- If only maxAgeDays is set, the operator still keeps at least the newest backup set.
- dryRun: true reports what would be deleted without deleting data.
- Retention failures are reported in status but do not turn a successful backup into a failed backup.
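To make the interaction of these rules concrete, here is a minimal Python sketch of how eligible backup sets could be selected. It is an illustration under stated assumptions, not the operator's implementation: `select_for_deletion` and its `(backup_id, completed_at)` input shape are hypothetical. It also assumes that when both fields are set, a backup set is pruned only if it is older than `maxAgeDays` and falls outside the newest `keepLast` sets.

```python
from datetime import datetime, timedelta, timezone

def select_for_deletion(backups, current_id, now, max_age_days=None, keep_last=None):
    """Return backup IDs eligible for deletion, newest first (hypothetical helper).

    backups: list of (backup_id, completed_at) tuples, one per whole backup set.
    """
    if max_age_days is None and keep_last is None:
        raise ValueError("set at least one of maxAgeDays or keepLast")
    for name, value in (("maxAgeDays", max_age_days), ("keepLast", keep_last)):
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be greater than 0 when set")

    newest_first = sorted(backups, key=lambda b: b[1], reverse=True)
    # Protect the newest sets: keepLast when given, otherwise at least one.
    protected = {backup_id for backup_id, _ in newest_first[: keep_last or 1]}
    # The current backup ID is always retained.
    protected.add(current_id)

    eligible = []
    for backup_id, completed_at in newest_first:
        if backup_id in protected:
            continue
        # Without maxAgeDays, everything outside the protected set is eligible.
        too_old = max_age_days is None or now - completed_at > timedelta(days=max_age_days)
        if too_old:
            eligible.append(backup_id)
    return eligible
```

With `dryRun: true`, a list like this would only be reported. With `dryRun: false`, each listed backup ID would be deleted as a whole.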
Operator-managed retention is supported for PVC/local storage, S3/S3-compatible storage, and Azure Blob Storage. GCS retention is not currently wired through the operator; use a GCS bucket lifecycle policy for that backend.
Storage lifecycle policies are still a good option when retention should be enforced outside Kubernetes, when you need backend-native legal hold or object-lock controls, or when you use GCS. Keep all retention windows aligned with restore requirements because deleting old backup sets makes older point-in-time restore windows unavailable.
KafkaBackupValidation.spec.evidence.retentionDays controls validation evidence retention only. It does not control KafkaBackup data retention.
Spec Fields
kafkaCluster
| Field | Type | Required | Description |
|---|---|---|---|
| bootstrapServers | []string | Yes | Kafka broker addresses |
| securityProtocol | string | No | PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL |
| tlsSecret | object | No | TLS certificate secret reference |
| caSecret | object | No | Separate CA certificate secret (e.g. Strimzi cluster CA). Overrides caKey in tlsSecret when both are set |
| saslSecret | object | No | SASL credentials secret reference |
| connection | object | No | Kafka TCP connection tuning |
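The precedence between caSecret and tlsSecret.caKey can be sketched as a small lookup. This is a hedged Python illustration only: `resolve_ca_source` is a hypothetical helper, and treating a caSecret without a caKey as "not set" is an assumption, since the defaults for that field are not described here.

```python
def resolve_ca_source(kafka_cluster: dict):
    """Return (secret_name, key) for the CA certificate, or None (hypothetical helper)."""
    ca = kafka_cluster.get("caSecret")
    # caSecret wins over tlsSecret.caKey when both are set.
    if ca and ca.get("caKey"):
        return ca["name"], ca["caKey"]
    tls = kafka_cluster.get("tlsSecret")
    if tls and tls.get("caKey"):
        return tls["name"], tls["caKey"]
    return None
```

For the full specification above, this returns the `cluster-ca-cert` secret rather than the CA bundled in `kafka-tls`.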
kafkaCluster.connection
| Field | Type | Default | Description |
|---|---|---|---|
| tcpKeepalive | bool | true | Enable TCP keepalive |
| keepaliveTimeSecs | int | 60 | Seconds before the first keepalive probe |
| keepaliveIntervalSecs | int | 20 | Seconds between keepalive probes |
| tcpNodelay | bool | true | Enable TCP_NODELAY |
| connectionsPerBroker | int | 4 | TCP connections to maintain per broker |
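For readers unfamiliar with these knobs, the sketch below shows the operating-system socket options that the field descriptions correspond to. It is a Python illustration, not how the backup engine configures its connections: `apply_connection_tuning` is a hypothetical helper, and connectionsPerBroker is omitted because it concerns pooling rather than per-socket options.

```python
import socket

def apply_connection_tuning(sock: socket.socket, connection: dict) -> None:
    """Apply kafkaCluster.connection-style settings to one TCP socket (hypothetical helper)."""
    if connection.get("tcpKeepalive", True):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # TCP_KEEPIDLE and TCP_KEEPINTVL are not exposed on every platform.
        if hasattr(socket, "TCP_KEEPIDLE"):
            # Idle seconds before the first keepalive probe.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE,
                            connection.get("keepaliveTimeSecs", 60))
        if hasattr(socket, "TCP_KEEPINTVL"):
            # Seconds between subsequent probes.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL,
                            connection.get("keepaliveIntervalSecs", 20))
    if connection.get("tcpNodelay", True):
        # Disable Nagle's algorithm so small requests are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```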
storage
| Field | Type | Required | Description |
|---|---|---|---|
| storageType | string | No | pvc, s3, azure, or gcs; defaults to pvc |
| pvc | object | When storageType: pvc | PVC storage configuration |
| s3 | object | When storageType: s3 | S3 or S3-compatible storage configuration |
| azure | object | When storageType: azure | Azure Blob Storage configuration |
| gcs | object | When storageType: gcs | Google Cloud Storage configuration |
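The conditional requirements in this table amount to a simple rule: the block named by storageType must be present. A minimal Python sketch of that rule follows; `validate_storage` is a hypothetical helper operating on the `spec.storage` dict, not part of the operator.

```python
STORAGE_TYPES = ("pvc", "s3", "azure", "gcs")

def validate_storage(storage: dict) -> str:
    """Check that the block matching storageType exists; return the type (hypothetical helper)."""
    storage_type = storage.get("storageType", "pvc")  # documented default: pvc
    if storage_type not in STORAGE_TYPES:
        raise ValueError(f"unknown storageType: {storage_type!r}")
    if not storage.get(storage_type):
        raise ValueError(f"storage.{storage_type} is required when storageType is {storage_type}")
    return storage_type
```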
s3
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | Bucket name |
| region | string | Yes | AWS region |
| prefix | string | No | Object key prefix |
| endpoint | string | No | Custom S3-compatible endpoint |
| pathStyle | bool | No | Force path-style addressing for MinIO, Ceph, and similar endpoints |
| allowHttp | bool | No | Allow HTTP endpoint traffic; the operator logs a warning when enabled |
| credentialsSecret | object | Yes | Secret containing access key credentials |
azure
| Field | Type | Required | Description |
|---|---|---|---|
| accountName | string | Yes | Storage account name |
| container | string | Yes | Blob container name |
| prefix | string | No | Blob prefix |
| endpoint | string | No | Custom endpoint for sovereign cloud or private endpoint use |
| useWorkloadIdentity | bool | No | Use AKS Workload Identity |
| credentialsSecret | object | No | Account key secret |
| sasTokenSecret | object | No | SAS token secret |
| servicePrincipalSecret | object | No | Service principal secret |
Azure authentication methods are mutually exclusive. If none are set, the adapter falls back to Azure default credentials when the runtime environment supports them.
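The exclusivity rule can be checked with a few lines of code. The Python sketch below is illustrative only: `azure_auth_method` is a hypothetical helper that inspects the `spec.storage.azure` dict, and the `"defaultCredentials"` label is just a name chosen here for the documented fallback.

```python
AZURE_AUTH_FIELDS = (
    "useWorkloadIdentity",
    "credentialsSecret",
    "sasTokenSecret",
    "servicePrincipalSecret",
)

def azure_auth_method(azure: dict) -> str:
    """Return the single configured auth field, or the fallback label (hypothetical helper)."""
    # useWorkloadIdentity: false and absent secrets both count as "not set".
    chosen = [field for field in AZURE_AUTH_FIELDS if azure.get(field)]
    if len(chosen) > 1:
        raise ValueError(f"Azure authentication methods are mutually exclusive: {chosen}")
    return chosen[0] if chosen else "defaultCredentials"
```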
Backup Tuning
| Field | Type | Default | Description |
|---|---|---|---|
| compression | string | zstd | none, lz4, or zstd |
| compressionLevel | int | 3 | Compression level |
| segmentMaxBytes | int | 134217728 | Rotate backup segments after this many bytes |
| segmentMaxIntervalMs | int | 60000 | Rotate backup segments after this many milliseconds |
| pollIntervalMs | int | 100 | Poll interval for continuous mode |
| includeOffsetHeaders | bool | true | Add original offset headers for restore and offset mapping |
| sourceClusterId | string | unset | Source cluster ID recorded in manifests and offset headers |
| schedule | string | unset | Cron schedule. The operator uses the Rust cron syntax with seconds, for example 0 0 * * * * * |
| suspend | bool | false | Pause scheduled backups |
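Because the schedule includes a seconds field, a standard five-field crontab expression will not mean what it does in a Kubernetes CronJob. The Python sketch below labels each position of a schedule string so the difference is visible. It assumes the field order commonly documented for the Rust cron crate (seconds first, with an optional trailing year); `split_schedule` is a hypothetical helper and does not validate field values.

```python
CRON_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week", "year")

def split_schedule(expr: str) -> dict:
    """Label the positions of a seconds-first cron expression (hypothetical helper)."""
    parts = expr.split()
    # Assumption: six fields (no year) or seven fields (with year) are accepted.
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))
```

For `0 0 * * * * *`, the second and minute positions are both `0` and every other position is `*`, which is what makes the schedule fire once per hour.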
checkpoint
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable resumable checkpoints |
| intervalSecs | int | 30 | Checkpoint interval |
| storage | object | unset | Optional PVC checkpoint storage override |
retention
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable operator-managed retention pruning |
| maxAgeDays | int | unset | Delete complete backup sets older than this many days |
| keepLast | int | unset | Keep at least this many newest backup sets |
| dryRun | bool | false | Report eligible backup sets without deleting data |
retention.enabled: true requires at least one of maxAgeDays or keepLast. Retention runs after a backup completes successfully. For long-running continuous: true backups, retention does not prune while the backup engine is still running.
template.pod
| Field | Type | Required | Description |
|---|---|---|---|
| hostAliases | array | No | Pod-level Kubernetes hostAliases entries injected into backup Job and CronJob pods. Use this for split-DNS or private endpoint storage access |
Each hostAliases item uses the Kubernetes HostAlias shape:
| Field | Type | Required | Description |
|---|---|---|---|
| ip | string | Yes | IP address to write into the pod /etc/hosts file |
| hostnames | []string | No | Hostnames mapped to ip |
Prefer DNS or Kubernetes Services when possible. Use hostAliases as a narrow pod-local override for cases such as VPN-only S3-compatible endpoints or split-DNS storage names.
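To see what a hostAliases entry does inside the pod, the Python sketch below renders entries in the tab-separated form that Kubernetes writes to /etc/hosts. `render_host_aliases` is a hypothetical helper for illustration; the kubelet does the real work.

```python
def render_host_aliases(host_aliases: list) -> list:
    """Render hostAliases items as /etc/hosts-style lines (hypothetical helper)."""
    lines = []
    for alias in host_aliases:
        hostnames = alias.get("hostnames", [])
        # An entry without hostnames maps nothing, so it produces no line here.
        if hostnames:
            lines.append("\t".join([alias["ip"], *hostnames]))
    return lines
```

Every hostname on a line resolves to that line's IP address from inside the backup pod only. Other pods and cluster DNS are unaffected.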
Examples
Scheduled Snapshot Backup
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: hourly-snapshot
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: pvc
    pvc:
      claimName: kafka-backups
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  compression: zstd
Scheduled Backup with Retention Dry Run
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: hourly-snapshot
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-west-2
      prefix: hourly
      credentialsSecret:
        name: s3-credentials
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  retention:
    enabled: true
    maxAgeDays: 30
    keepLast: 3
    dryRun: true
Continuous Backup with Consumer Group Snapshot
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: continuous-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - "*"
  storage:
    storageType: azure
    azure:
      accountName: kafkabackups123456
      container: kafka-backups
      prefix: production
      useWorkloadIdentity: true
  continuous: true
  includeOffsetHeaders: true
  sourceClusterId: production
  consumerGroupSnapshot: true
S3-Compatible Endpoint
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: minio-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: http://minio.storage.svc.cluster.local:9000
      pathStyle: true
      allowHttp: true
      credentialsSecret:
        name: minio-credentials
Split-DNS S3 Endpoint
Use template.pod.hostAliases when backup pods must reach an S3-compatible endpoint by its public hostname but resolve it to an internal IP address from inside the cluster.
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: private-s3-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: https://s3.internal
      pathStyle: true
      credentialsSecret:
        name: s3-credentials
  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal
Status
status:
  phase: Completed
  message: "Backup completed successfully"
  lastBackupTime: "2026-04-13T12:00:00Z"
  nextScheduledBackup: "2026-04-13T13:00:00Z"
  recordsProcessed: 1000000
  bytesProcessed: 1073741824
  segmentsCompleted: 8
  checkpointEnabled: true
  lastCheckpointTime: "2026-04-13T12:05:00Z"
  resumable: true
  backupId: "production-backup-20260413-120000"
  lastRetentionTime: "2026-04-13T12:06:00Z"
  retentionInspectedBackups: 12
  retentionEligibleBackups: 2
  retentionDeletedBackups: 0
  retentionReclaimedBytes: 536870912
  retentionDryRun: true
  retentionError: null
Next Steps
- KafkaRestore - Restore from backups
- Scheduled Backups Guide - Scheduling strategies
- Backup Retention Guide - Configure opt-in retention
- Secrets Guide - Configure credentials