KafkaBackup CRD
The KafkaBackup custom resource defines either a single backup run or a recurring backup schedule for Kafka topics.
Full Specification
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: production-backup
  namespace: kafka-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka-0.kafka.svc:9092
      - kafka-1.kafka.svc:9092
    securityProtocol: SASL_SSL
    caSecret: # optional: separate CA secret (e.g. Strimzi)
      name: cluster-ca-cert
      caKey: ca.crt
    tlsSecret:
      name: kafka-tls
      caKey: ca.crt
      certKey: tls.crt
      keyKey: tls.key
    saslSecret:
      name: kafka-credentials
      mechanism: SCRAM-SHA-512
      usernameKey: username
      passwordKey: password
    connection:
      tcpKeepalive: true
      keepaliveTimeSecs: 60
      keepaliveIntervalSecs: 20
      tcpNodelay: true
      connectionsPerBroker: 4
  topics:
    - orders
    - payments
    - "events-*"
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-west-2
      prefix: production/hourly
      endpoint: https://s3.us-west-2.amazonaws.com
      pathStyle: false
      allowHttp: false
      credentialsSecret:
        name: s3-credentials
        accessKeyIdKey: AWS_ACCESS_KEY_ID
        secretAccessKeyKey: AWS_SECRET_ACCESS_KEY
  schedule: "0 0 * * * * *" # Every hour, cron format includes seconds
  suspend: false
  compression: zstd
  compressionLevel: 3
  segmentMaxBytes: 134217728
  segmentMaxIntervalMs: 60000
  # Backup mode
  continuous: false
  stopAtCurrentOffsets: true
  pollIntervalMs: 100
  includeOffsetHeaders: true
  sourceClusterId: production-us-west-2
  consumerGroupSnapshot: true
  checkpoint:
    enabled: true
    intervalSecs: 30
  rateLimiting:
    recordsPerSec: 0
    bytesPerSec: 0
    maxConcurrentPartitions: 4
  circuitBreaker:
    enabled: true
    failureThreshold: 5
    resetTimeoutSecs: 60
    successThreshold: 3
    operationTimeoutMs: 30000
  metrics:
    enabled: true
    port: 9090
    bindAddress: "0.0.0.0"
    path: /metrics
    updateIntervalMs: 500
    maxPartitionLabels: 100
  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal
Backup Mode Changes in v1.0.0
As of v1.0.0, checkpoint.enabled no longer makes a backup run continuously. Select the backup mode explicitly with these fields:
| Field | Default | Description |
|---|---|---|
| continuous | false | Keep polling and writing new records after the initial pass |
| stopAtCurrentOffsets | false | Snapshot mode: capture starting high watermarks and exit after all partitions catch up |
| consumerGroupSnapshot | false | Write consumer-groups-snapshot.json after each backup cycle |
For scheduled point-in-time backups, set stopAtCurrentOffsets: true. For streaming backups, set continuous: true. Do not set continuous and stopAtCurrentOffsets together.
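As a minimal sketch, the two modes differ only in a few spec fields. The fragments below reuse field names from this page and omit kafkaCluster, topics, and storage for brevity; the schedule value is illustrative.

```yaml
# Scheduled point-in-time snapshot: each run exits once every partition
# has caught up to the high watermark captured at the start of the run.
spec:
  schedule: "0 0 * * * * *"
  continuous: false
  stopAtCurrentOffsets: true
---
# Streaming backup: keeps polling and writing new records after the initial pass.
spec:
  continuous: true
  stopAtCurrentOffsets: false
  consumerGroupSnapshot: true   # optional; shown here only for illustration
```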
Backup Data Retention
KafkaBackup does not currently include a backup data retention field and does not automatically delete old backup segments, manifests, offset stores, or consumer group snapshots.
Backup data remains in the configured storage backend until it is removed outside the operator:
- For S3, S3-compatible storage, Azure Blob Storage, and GCS, configure lifecycle or expiration policies on the bucket/container prefix used by the backup.
- For PVC storage, run an external cleanup process, such as a Kubernetes CronJob, that mounts the same PVC and removes old backup files. Kubernetes PV reclaim policies only apply when a PVC is released; they do not clean up old files inside an active volume.
Deleting old segment objects can make older PITR restore windows unavailable. Keep storage lifecycle rules aligned with your restore requirements.
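For the PVC case, an external cleanup could look like the following sketch. It is not part of the operator: the CronJob name, namespace, image, mount path, and 30-day cutoff are all assumptions, and the claim name kafka-backups is borrowed from the PVC example on this page. Kubernetes CronJobs use the standard five-field cron syntax, not the seven-field syntax used by spec.schedule. Deleting purely by file age is crude, so confirm that the cutoff leaves every required restore window intact and that the PVC access mode allows a second pod to mount the volume.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-backup-cleanup        # hypothetical name
  namespace: kafka-backup           # assumption: same namespace as the PVC
spec:
  schedule: "30 3 * * *"            # daily at 03:30, standard 5-field cron
  concurrencyPolicy: Forbid         # never run two cleanups at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: busybox:1.36   # any image that ships find works
              command:
                - sh
                - -c
                # Assumption: backup files sit under /backups and can be aged out by mtime.
                - find /backups -type f -mtime +30 -print -delete
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: kafka-backups
```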
KafkaBackupValidation.spec.evidence.retentionDays controls validation evidence retention only. It does not control KafkaBackup data retention.
Spec Fields
kafkaCluster
| Field | Type | Required | Description |
|---|---|---|---|
| bootstrapServers | []string | Yes | Kafka broker addresses |
| securityProtocol | string | No | PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL |
| tlsSecret | object | No | TLS certificate secret reference |
| caSecret | object | No | Separate CA certificate secret (e.g. Strimzi cluster CA). Overrides caKey in tlsSecret when both are set |
| saslSecret | object | No | SASL credentials secret reference |
| connection | object | No | Kafka TCP connection tuning |
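To illustrate how a secret reference lines up with the Secret it points to, here is a sketch of a SASL credentials Secret matching the saslSecret block in the full specification. The Secret name and key names come from that example; the namespace and the placeholder values are assumptions.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-credentials
  namespace: kafka-backup        # assumption: same namespace as the KafkaBackup
type: Opaque
stringData:
  username: "replace-with-kafka-user"       # read via usernameKey: username
  password: "replace-with-kafka-password"   # read via passwordKey: password
```

The usernameKey and passwordKey fields name the keys inside the Secret, so a Secret that stores the credentials under different keys can be referenced by changing those two fields.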
kafkaCluster.connection
| Field | Type | Default | Description |
|---|---|---|---|
| tcpKeepalive | bool | true | Enable TCP keepalive |
| keepaliveTimeSecs | int | 60 | Seconds before the first keepalive probe |
| keepaliveIntervalSecs | int | 20 | Seconds between keepalive probes |
| tcpNodelay | bool | true | Enable TCP_NODELAY |
| connectionsPerBroker | int | 4 | TCP connections to maintain per broker |
storage
| Field | Type | Required | Description |
|---|---|---|---|
| storageType | string | No | pvc, s3, azure, or gcs; defaults to pvc |
| pvc | object | When storageType: pvc | PVC storage configuration |
| s3 | object | When storageType: s3 | S3 or S3-compatible storage configuration |
| azure | object | When storageType: azure | Azure Blob Storage configuration |
| gcs | object | When storageType: gcs | Google Cloud Storage configuration |
s3
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | Bucket name |
| region | string | Yes | AWS region |
| prefix | string | No | Object key prefix |
| endpoint | string | No | Custom S3-compatible endpoint |
| pathStyle | bool | No | Force path-style addressing for MinIO, Ceph, and similar endpoints |
| allowHttp | bool | No | Allow HTTP endpoint traffic; the operator logs a warning when enabled |
| credentialsSecret | object | Yes | Secret containing access key credentials |
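A sketch of the Secret that credentialsSecret would reference, using the name and key names from the full specification. The namespace and the placeholder values are assumptions.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: kafka-backup        # assumption: same namespace as the KafkaBackup
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "replace-with-access-key-id"           # read via accessKeyIdKey
  AWS_SECRET_ACCESS_KEY: "replace-with-secret-access-key"   # read via secretAccessKeyKey
```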
azure
| Field | Type | Required | Description |
|---|---|---|---|
| accountName | string | Yes | Storage account name |
| container | string | Yes | Blob container name |
| prefix | string | No | Blob prefix |
| endpoint | string | No | Custom endpoint for sovereign cloud or private endpoint use |
| useWorkloadIdentity | bool | No | Use AKS Workload Identity |
| credentialsSecret | object | No | Account key secret |
| sasTokenSecret | object | No | SAS token secret |
| servicePrincipalSecret | object | No | Service principal secret |
Azure authentication methods are mutually exclusive. If none are set, the adapter falls back to Azure default credentials when the runtime environment supports them.
Backup Tuning
| Field | Type | Default | Description |
|---|---|---|---|
| compression | string | zstd | none, lz4, or zstd |
| compressionLevel | int | 3 | Compression level |
| segmentMaxBytes | int | 134217728 | Rotate backup segments after this many bytes |
| segmentMaxIntervalMs | int | 60000 | Rotate backup segments after this many milliseconds |
| pollIntervalMs | int | 100 | Poll interval for continuous mode |
| includeOffsetHeaders | bool | true | Add original offset headers for restore and offset mapping |
| sourceClusterId | string | unset | Source cluster ID recorded in manifests and offset headers |
| schedule | string | unset | Cron schedule. The operator uses the Rust cron syntax with seconds, for example 0 0 * * * * * |
| suspend | bool | false | Pause scheduled backups |
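The schedule expression has seven fields. Assuming the operator passes it unchanged to the Rust cron crate, the order is seconds, minutes, hours, day of month, month, day of week, year. The values below are illustrative; a real spec carries a single schedule key.

```yaml
# Field order: sec min hour day-of-month month day-of-week year
schedule: "0 0 * * * * *"        # every hour, on the hour
# schedule: "0 */15 * * * * *"   # every 15 minutes
# schedule: "0 0 */6 * * * *"    # every six hours
# schedule: "0 30 2 * * * *"     # every day at 02:30:00
```

A five-field expression copied from a standard crontab or a Kubernetes CronJob needs a leading seconds field (and optionally a trailing year field) before it fits this syntax.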
checkpoint
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable resumable checkpoints |
| intervalSecs | int | 30 | Checkpoint interval |
| storage | object | unset | Optional PVC checkpoint storage override |
template.pod
| Field | Type | Required | Description |
|---|---|---|---|
| hostAliases | array | No | Pod-level Kubernetes hostAliases entries injected into backup Job and CronJob pods. Use this for split-DNS or private endpoint storage access |
Each hostAliases item uses the Kubernetes HostAlias shape:
| Field | Type | Required | Description |
|---|---|---|---|
| ip | string | Yes | IP address to write into the pod /etc/hosts file |
| hostnames | []string | No | Hostnames mapped to ip |
Prefer DNS or Kubernetes Services when possible. Use hostAliases as a narrow pod-local override for cases such as VPN-only S3-compatible endpoints or split-DNS storage names.
Examples
Scheduled Snapshot Backup
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: hourly-snapshot
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: pvc
    pvc:
      claimName: kafka-backups
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  compression: zstd
Continuous Backup with Consumer Group Snapshot
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: continuous-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - "*"
  storage:
    storageType: azure
    azure:
      accountName: kafkabackups123456
      container: kafka-backups
      prefix: production
      useWorkloadIdentity: true
  continuous: true
  includeOffsetHeaders: true
  sourceClusterId: production
  consumerGroupSnapshot: true
S3-Compatible Endpoint
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: minio-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: http://minio.storage.svc.cluster.local:9000
      pathStyle: true
      allowHttp: true
      credentialsSecret:
        name: minio-credentials
Split-DNS S3 Endpoint
Use template.pod.hostAliases when backup pods must reach an S3-compatible endpoint by its public hostname but resolve it to an internal IP address from inside the cluster.
apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: private-s3-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: https://s3.internal
      pathStyle: true
      credentialsSecret:
        name: s3-credentials
  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal
Status
status:
  phase: Completed
  message: "Backup completed successfully"
  lastBackupTime: "2026-04-13T12:00:00Z"
  nextScheduledBackup: "2026-04-13T13:00:00Z"
  recordsProcessed: 1000000
  bytesProcessed: 1073741824
  segmentsCompleted: 8
  checkpointEnabled: true
  lastCheckpointTime: "2026-04-13T12:05:00Z"
  resumable: true
  backupId: "production-backup-20260413-120000"
Next Steps
- KafkaRestore - Restore from backups
- Scheduled Backups Guide - Scheduling strategies
- Secrets Guide - Configure credentials