KafkaBackup CRD

The KafkaBackup custom resource defines a backup run or recurring backup schedule for Kafka topics.

Full Specification

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: production-backup
  namespace: kafka-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka-0.kafka.svc:9092
      - kafka-1.kafka.svc:9092
    securityProtocol: SASL_SSL
    caSecret: # optional: separate CA secret (e.g. Strimzi)
      name: cluster-ca-cert
      caKey: ca.crt
    tlsSecret:
      name: kafka-tls
      caKey: ca.crt
      certKey: tls.crt
      keyKey: tls.key
    saslSecret:
      name: kafka-credentials
      mechanism: SCRAM-SHA-512
      usernameKey: username
      passwordKey: password
    connection:
      tcpKeepalive: true
      keepaliveTimeSecs: 60
      keepaliveIntervalSecs: 20
      tcpNodelay: true
      connectionsPerBroker: 4

  topics:
    - orders
    - payments
    - "events-*"

  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-west-2
      prefix: production/hourly
      endpoint: https://s3.us-west-2.amazonaws.com
      pathStyle: false
      allowHttp: false
      credentialsSecret:
        name: s3-credentials
        accessKeyIdKey: AWS_ACCESS_KEY_ID
        secretAccessKeyKey: AWS_SECRET_ACCESS_KEY

  schedule: "0 0 * * * * *" # Every hour, cron format includes seconds
  suspend: false

  compression: zstd
  compressionLevel: 3
  segmentMaxBytes: 134217728
  segmentMaxIntervalMs: 60000

  # Backup mode
  continuous: false
  stopAtCurrentOffsets: true
  pollIntervalMs: 100

  includeOffsetHeaders: true
  sourceClusterId: production-us-west-2
  consumerGroupSnapshot: true

  checkpoint:
    enabled: true
    intervalSecs: 30

  rateLimiting:
    recordsPerSec: 0
    bytesPerSec: 0
    maxConcurrentPartitions: 4

  circuitBreaker:
    enabled: true
    failureThreshold: 5
    resetTimeoutSecs: 60
    successThreshold: 3
    operationTimeoutMs: 30000

  metrics:
    enabled: true
    port: 9090
    bindAddress: "0.0.0.0"
    path: /metrics
    updateIntervalMs: 500
    maxPartitionLabels: 100

  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal

Backup Mode Changes in v1.0.0

checkpoint.enabled no longer makes a backup run continuously. Use these fields explicitly:

| Field | Default | Description |
| --- | --- | --- |
| continuous | false | Keep polling and writing new records after the initial pass |
| stopAtCurrentOffsets | false | Snapshot mode: capture starting high watermarks and exit after all partitions catch up |
| consumerGroupSnapshot | false | Write consumer-groups-snapshot.json after each backup cycle |

For scheduled point-in-time backups, set stopAtCurrentOffsets: true. For streaming backups, set continuous: true. Do not set continuous and stopAtCurrentOffsets together.

Backup Data Retention

KafkaBackup does not currently include a backup data retention field and does not automatically delete old backup segments, manifests, offset stores, or consumer group snapshots.

Backup data remains in the configured storage backend until it is removed outside the operator:

  • For S3, S3-compatible storage, Azure Blob Storage, and GCS, configure lifecycle or expiration policies on the bucket/container prefix used by the backup.
  • For PVC storage, run an external cleanup process, such as a Kubernetes CronJob, that mounts the same PVC and removes old backup files. Kubernetes PV reclaim policies only apply when a PVC is released; they do not clean up old files inside an active volume.

Deleting old segment objects can make older PITR restore windows unavailable. Keep storage lifecycle rules aligned with your restore requirements.
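For the PVC case, the external cleanup process could look roughly like the CronJob below. This is a minimal sketch and not part of the operator: the CronJob name, the busybox image, the /backups mount path, and the 30-day cutoff are all assumptions chosen for illustration, and the claim name kafka-backups is borrowed from the PVC example on this page. Note that a Kubernetes CronJob uses the standard five-field cron format, unlike the KafkaBackup schedule field.

```yaml
# Hypothetical cleanup job; names, image, path, and cutoff are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-backup-cleanup
  namespace: kafka-backup
spec:
  schedule: "0 3 * * *"        # five-field Kubernetes cron: daily at 03:00
  concurrencyPolicy: Forbid    # never run two cleanups at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: busybox:1.36
              command:
                - sh
                - -c
                # Delete regular files last modified more than 30 days ago.
                - find /backups -type f -mtime +30 -delete
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: kafka-backups
```

A purely age-based delete knows nothing about the backup layout, so it may remove manifests or segments that a later restore still needs; check what the cutoff would delete before enabling it. If the PVC is ReadWriteOnce, the cleanup pod may also need to be scheduled on the same node as the backup pod or run while no backup is active.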

Info: KafkaBackupValidation.spec.evidence.retentionDays controls validation evidence retention only. It does not control KafkaBackup data retention.

Spec Fields

kafkaCluster

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bootstrapServers | []string | Yes | Kafka broker addresses |
| securityProtocol | string | No | PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL |
| tlsSecret | object | No | TLS certificate secret reference |
| caSecret | object | No | Separate CA certificate secret (e.g. Strimzi cluster CA). Overrides caKey in tlsSecret when both are set |
| saslSecret | object | No | SASL credentials secret reference |
| connection | object | No | Kafka TCP connection tuning |

kafkaCluster.connection

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| tcpKeepalive | bool | true | Enable TCP keepalive |
| keepaliveTimeSecs | int | 60 | Seconds before the first keepalive probe |
| keepaliveIntervalSecs | int | 20 | Seconds between keepalive probes |
| tcpNodelay | bool | true | Enable TCP_NODELAY |
| connectionsPerBroker | int | 4 | TCP connections to maintain per broker |

storage

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| storageType | string | No | pvc, s3, azure, or gcs; defaults to pvc |
| pvc | object | When storageType: pvc | PVC storage configuration |
| s3 | object | When storageType: s3 | S3 or S3-compatible storage configuration |
| azure | object | When storageType: azure | Azure Blob Storage configuration |
| gcs | object | When storageType: gcs | Google Cloud Storage configuration |

s3

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bucket | string | Yes | Bucket name |
| region | string | Yes | AWS region |
| prefix | string | No | Object key prefix |
| endpoint | string | No | Custom S3-compatible endpoint |
| pathStyle | bool | No | Force path-style addressing for MinIO, Ceph, and similar endpoints |
| allowHttp | bool | No | Allow HTTP endpoint traffic; the operator logs a warning when enabled |
| credentialsSecret | object | Yes | Secret containing access key credentials |
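
The Secret referenced by credentialsSecret might look like the sketch below. The key names match the accessKeyIdKey and secretAccessKeyKey values used in the full specification above; the placeholder values and the assumption that the Secret lives in the same namespace as the KafkaBackup resource are illustrative only.

```yaml
# Illustrative Secret; replace the placeholders with real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: kafka-backup      # assumed: same namespace as the KafkaBackup
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "<access-key-id>"
  AWS_SECRET_ACCESS_KEY: "<secret-access-key>"
```

Using stringData lets you write the values as plain text; Kubernetes stores them base64-encoded under data.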

azure

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| accountName | string | Yes | Storage account name |
| container | string | Yes | Blob container name |
| prefix | string | No | Blob prefix |
| endpoint | string | No | Custom endpoint for sovereign cloud or private endpoint use |
| useWorkloadIdentity | bool | No | Use AKS Workload Identity |
| credentialsSecret | object | No | Account key secret |
| sasTokenSecret | object | No | SAS token secret |
| servicePrincipalSecret | object | No | Service principal secret |

Azure authentication methods are mutually exclusive. If none are set, the adapter falls back to Azure default credentials when the runtime environment supports them.

Backup Tuning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| compression | string | zstd | none, lz4, or zstd |
| compressionLevel | int | 3 | Compression level |
| segmentMaxBytes | int | 134217728 | Rotate backup segments after this many bytes |
| segmentMaxIntervalMs | int | 60000 | Rotate backup segments after this many milliseconds |
| pollIntervalMs | int | 100 | Poll interval for continuous mode |
| includeOffsetHeaders | bool | true | Add original offset headers for restore and offset mapping |
| sourceClusterId | string | unset | Source cluster ID recorded in manifests and offset headers |
| schedule | string | unset | Cron schedule. The operator uses the Rust cron syntax with seconds, for example 0 0 * * * * * |
| suspend | bool | false | Pause scheduled backups |
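
Because the schedule field uses seven fields rather than the five of a standard Kubernetes CronJob, a few variations may help. This fragment is a sketch that assumes the field order used by the Rust cron crate (seconds, minutes, hours, day of month, month, day of week, year); only the first expression comes from this page, and the commented alternatives are illustrative.

```yaml
# Assumed field order: sec min hour day-of-month month day-of-week year
schedule: "0 0 * * * * *"        # every hour, on the hour (from this page)
# schedule: "0 */15 * * * * *"   # every 15 minutes, at second 0
# schedule: "0 30 2 * * * *"     # every day at 02:30:00
```

A five-field expression copied from a CronJob shifts every field by one position, so it is worth checking the next scheduled time in the resource status after changing the schedule.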

checkpoint

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable resumable checkpoints |
| intervalSecs | int | 30 | Checkpoint interval |
| storage | object | unset | Optional PVC checkpoint storage override |

template.pod

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| hostAliases | array | No | Pod-level Kubernetes hostAliases entries injected into backup Job and CronJob pods. Use this for split-DNS or private endpoint storage access |

Each hostAliases item uses the Kubernetes HostAlias shape:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| ip | string | Yes | IP address to write into the pod /etc/hosts file |
| hostnames | []string | No | Hostnames mapped to ip |

Tip: Prefer DNS or Kubernetes Services when possible. Use hostAliases as a narrow pod-local override for cases such as VPN-only S3-compatible endpoints or split-DNS storage names.

Examples

Scheduled Snapshot Backup

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: hourly-snapshot
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: pvc
    pvc:
      claimName: kafka-backups
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  compression: zstd

Continuous Backup with Consumer Group Snapshot

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: continuous-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - "*"
  storage:
    storageType: azure
    azure:
      accountName: kafkabackups123456
      container: kafka-backups
      prefix: production
      useWorkloadIdentity: true
  continuous: true
  includeOffsetHeaders: true
  sourceClusterId: production
  consumerGroupSnapshot: true

S3-Compatible Endpoint

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: minio-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: http://minio.storage.svc.cluster.local:9000
      pathStyle: true
      allowHttp: true
      credentialsSecret:
        name: minio-credentials

Split-DNS S3 Endpoint

Use template.pod.hostAliases when backup pods must reach an S3-compatible endpoint by its public hostname but resolve it to an internal IP address from inside the cluster.

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: private-s3-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: https://s3.internal
      pathStyle: true
      credentialsSecret:
        name: s3-credentials
  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal

Status

status:
  phase: Completed
  message: "Backup completed successfully"
  lastBackupTime: "2026-04-13T12:00:00Z"
  nextScheduledBackup: "2026-04-13T13:00:00Z"
  recordsProcessed: 1000000
  bytesProcessed: 1073741824
  segmentsCompleted: 8
  checkpointEnabled: true
  lastCheckpointTime: "2026-04-13T12:05:00Z"
  resumable: true
  backupId: "production-backup-20260413-120000"

Next Steps