KafkaBackup CRD

The KafkaBackup custom resource defines either a single backup run or a recurring backup schedule for Kafka topics.

Full Specification

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: production-backup
  namespace: kafka-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka-0.kafka.svc:9092
      - kafka-1.kafka.svc:9092
    securityProtocol: SASL_SSL
    caSecret: # optional: separate CA secret (e.g. Strimzi)
      name: cluster-ca-cert
      caKey: ca.crt
    tlsSecret:
      name: kafka-tls
      caKey: ca.crt
      certKey: tls.crt
      keyKey: tls.key
    saslSecret:
      name: kafka-credentials
      mechanism: SCRAM-SHA-512
      usernameKey: username
      passwordKey: password
    connection:
      tcpKeepalive: true
      keepaliveTimeSecs: 60
      keepaliveIntervalSecs: 20
      tcpNodelay: true
      connectionsPerBroker: 4

  topics:
    - orders
    - payments
    - "events-*"

  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-west-2
      prefix: production/hourly
      endpoint: https://s3.us-west-2.amazonaws.com
      pathStyle: false
      allowHttp: false
      credentialsSecret:
        name: s3-credentials
        accessKeyIdKey: AWS_ACCESS_KEY_ID
        secretAccessKeyKey: AWS_SECRET_ACCESS_KEY

  schedule: "0 0 * * * * *" # Every hour, cron format includes seconds
  suspend: false

  compression: zstd
  compressionLevel: 3
  segmentMaxBytes: 134217728
  segmentMaxIntervalMs: 60000

  # Backup mode
  continuous: false
  stopAtCurrentOffsets: true
  pollIntervalMs: 100

  includeOffsetHeaders: true
  sourceClusterId: production-us-west-2
  consumerGroupSnapshot: true

  checkpoint:
    enabled: true
    intervalSecs: 30

  retention:
    enabled: false
    maxAgeDays: 30
    keepLast: 3
    dryRun: true

  rateLimiting:
    recordsPerSec: 0
    bytesPerSec: 0
    maxConcurrentPartitions: 4

  circuitBreaker:
    enabled: true
    failureThreshold: 5
    resetTimeoutSecs: 60
    successThreshold: 3
    operationTimeoutMs: 30000

  metrics:
    enabled: true
    port: 9090
    bindAddress: "0.0.0.0"
    path: /metrics
    updateIntervalMs: 500
    maxPartitionLabels: 100

  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal

Backup Mode Changes in v1.0.0

As of v1.0.0, checkpoint.enabled no longer makes a backup run continuously. Select the backup mode explicitly with these fields:

| Field | Default | Description |
| --- | --- | --- |
| continuous | false | Keep polling and writing new records after the initial pass |
| stopAtCurrentOffsets | false | Snapshot mode: capture starting high watermarks and exit after all partitions catch up |
| consumerGroupSnapshot | false | Write consumer-groups-snapshot.json after each backup cycle |

For scheduled point-in-time backups, set stopAtCurrentOffsets: true. For streaming backups, set continuous: true. Do not set both continuous: true and stopAtCurrentOffsets: true on the same resource.
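
As a minimal sketch, the two modes can be written as spec fragments like the ones below. Only the mode fields from the table above are shown; the rest of the spec (kafkaCluster, topics, storage) is omitted, and the schedule value is illustrative.

```yaml
# Snapshot mode: capture the high watermarks at start, then exit once
# every partition has caught up. Suits scheduled point-in-time backups.
spec:
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  continuous: false
---
# Streaming mode: keep polling and writing new records after the
# initial pass. stopAtCurrentOffsets stays false here.
spec:
  continuous: true
  stopAtCurrentOffsets: false
```

Both fields default to false, so in practice only the one being enabled needs to appear in the resource.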

Backup Data Retention

Operator-managed KafkaBackup retention is disabled by default. When spec.retention is absent, or when spec.retention.enabled: false, the operator does not delete backup data.

Enable retention per KafkaBackup resource when you want the operator to prune complete backup sets after a successful backup run:

spec:
  retention:
    enabled: true
    maxAgeDays: 30
    keepLast: 3
    dryRun: false

Retention deletes whole backup IDs, not individual segments, manifests, or offset files from a still-retained backup set. This prevents partially pruned backup sets, which could break point-in-time restore.

When retention is enabled:

  • Set at least one of maxAgeDays or keepLast.
  • maxAgeDays and keepLast must be greater than 0 when set.
  • The current backup ID is always retained.
  • If only maxAgeDays is set, the operator still keeps at least the newest backup set.
  • dryRun: true reports what would be deleted without deleting data.
  • Retention failures are reported in status but do not turn a successful backup into a failed backup.

Operator-managed retention is supported for PVC/local storage, S3/S3-compatible storage, and Azure Blob Storage. GCS retention is not currently wired through the operator; use a GCS bucket lifecycle policy for that backend.

Storage lifecycle policies are still a good option when retention should be enforced outside Kubernetes, when you need backend-native legal hold or object-lock controls, or when you use GCS. Keep all retention windows aligned with restore requirements because deleting old backup sets makes older point-in-time restore windows unavailable.

Info: KafkaBackupValidation.spec.evidence.retentionDays controls validation evidence retention only. It does not control KafkaBackup data retention.

Spec Fields

kafkaCluster

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bootstrapServers | []string | Yes | Kafka broker addresses |
| securityProtocol | string | No | PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL |
| tlsSecret | object | No | TLS certificate secret reference |
| caSecret | object | No | Separate CA certificate secret (e.g. Strimzi cluster CA). Overrides caKey in tlsSecret when both are set |
| saslSecret | object | No | SASL credentials secret reference |
| connection | object | No | Kafka TCP connection tuning |
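
The interaction between tlsSecret and caSecret can be sketched as the fragment below. It reuses the secret and key names from the full specification above; the choice of SSL as the protocol is illustrative, and the precedence described in the comments restates the caSecret row of the table rather than verified operator behavior.

```yaml
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka-0.kafka.svc:9092
    securityProtocol: SSL
    # Client certificate and private key are read from this secret.
    tlsSecret:
      name: kafka-tls
      caKey: ca.crt      # overridden by caSecret below when both are set
      certKey: tls.crt
      keyKey: tls.key
    # CA certificate kept in a separate secret, for example a cluster CA
    # secret managed by Strimzi.
    caSecret:
      name: cluster-ca-cert
      caKey: ca.crt
```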

kafkaCluster.connection

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| tcpKeepalive | bool | true | Enable TCP keepalive |
| keepaliveTimeSecs | int | 60 | Seconds before the first keepalive probe |
| keepaliveIntervalSecs | int | 20 | Seconds between keepalive probes |
| tcpNodelay | bool | true | Enable TCP_NODELAY |
| connectionsPerBroker | int | 4 | TCP connections to maintain per broker |

storage

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| storageType | string | No | pvc, s3, azure, or gcs; defaults to pvc |
| pvc | object | When storageType: pvc | PVC storage configuration |
| s3 | object | When storageType: s3 | S3 or S3-compatible storage configuration |
| azure | object | When storageType: azure | Azure Blob Storage configuration |
| gcs | object | When storageType: gcs | Google Cloud Storage configuration |

s3

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bucket | string | Yes | Bucket name |
| region | string | Yes | AWS region |
| prefix | string | No | Object key prefix |
| endpoint | string | No | Custom S3-compatible endpoint |
| pathStyle | bool | No | Force path-style addressing for MinIO, Ceph, and similar endpoints |
| allowHttp | bool | No | Allow HTTP endpoint traffic; the operator logs a warning when enabled |
| credentialsSecret | object | Yes | Secret containing access key credentials |

azure

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| accountName | string | Yes | Storage account name |
| container | string | Yes | Blob container name |
| prefix | string | No | Blob prefix |
| endpoint | string | No | Custom endpoint for sovereign cloud or private endpoint use |
| useWorkloadIdentity | bool | No | Use AKS Workload Identity |
| credentialsSecret | object | No | Account key secret |
| sasTokenSecret | object | No | SAS token secret |
| servicePrincipalSecret | object | No | Service principal secret |

Azure authentication methods are mutually exclusive. If none are set, the adapter falls back to Azure default credentials when the runtime environment supports them.

Backup Tuning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| compression | string | zstd | none, lz4, or zstd |
| compressionLevel | int | 3 | Compression level |
| segmentMaxBytes | int | 134217728 | Rotate backup segments after this many bytes |
| segmentMaxIntervalMs | int | 60000 | Rotate backup segments after this many milliseconds |
| pollIntervalMs | int | 100 | Poll interval for continuous mode |
| includeOffsetHeaders | bool | true | Add original offset headers for restore and offset mapping |
| sourceClusterId | string | unset | Source cluster ID recorded in manifests and offset headers |
| schedule | string | unset | Cron schedule. The operator uses the Rust cron syntax with seconds, for example 0 0 * * * * * |
| suspend | bool | false | Pause scheduled backups |
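
Because the schedule field has seven positions rather than the five of classic cron, a few illustrative values may help. The sketch below assumes the operator follows the field order of the Rust cron crate (seconds, minutes, hours, day of month, month, day of week, year). Only the hourly expression comes from this page; the others are examples built on that assumption.

```yaml
spec:
  # Field order (assumed): sec min hour day-of-month month day-of-week year
  schedule: "0 30 2 * * * *"    # once a day at 02:30:00
  # Other illustrative values:
  #   "0 0 * * * * *"      every hour, on the hour
  #   "0 0 */6 * * * *"    every six hours
  #   "0 */15 * * * * *"   every 15 minutes
  suspend: false
```

Copying a five-field expression from a Kubernetes CronJob into this field shifts every position, so it is worth checking that the leading seconds field is present.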

checkpoint

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable resumable checkpoints |
| intervalSecs | int | 30 | Checkpoint interval |
| storage | object | unset | Optional PVC checkpoint storage override |

retention

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable operator-managed retention pruning |
| maxAgeDays | int | unset | Delete complete backup sets older than this many days |
| keepLast | int | unset | Keep at least this many newest backup sets |
| dryRun | bool | false | Report eligible backup sets without deleting data |

retention.enabled: true requires at least one of maxAgeDays or keepLast. Retention runs after a backup completes successfully. For long-running continuous: true backups, retention does not prune while the backup engine is still running.

template.pod

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| hostAliases | array | No | Pod-level Kubernetes hostAliases entries injected into backup Job and CronJob pods. Use this for split-DNS or private endpoint storage access |

Each hostAliases item uses the Kubernetes HostAlias shape:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| ip | string | Yes | IP address to write into the pod /etc/hosts file |
| hostnames | []string | No | Hostnames mapped to ip |

Tip: Prefer DNS or Kubernetes Services when possible. Use hostAliases as a narrow pod-local override for cases such as VPN-only S3-compatible endpoints or split-DNS storage names.

Examples

Scheduled Snapshot Backup

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: hourly-snapshot
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: pvc
    pvc:
      claimName: kafka-backups
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  compression: zstd

Scheduled Backup with Retention Dry Run

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: hourly-snapshot
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-west-2
      prefix: hourly
      credentialsSecret:
        name: s3-credentials
  schedule: "0 0 * * * * *"
  stopAtCurrentOffsets: true
  retention:
    enabled: true
    maxAgeDays: 30
    keepLast: 3
    dryRun: true

Continuous Backup with Consumer Group Snapshot

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: continuous-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - "*"
  storage:
    storageType: azure
    azure:
      accountName: kafkabackups123456
      container: kafka-backups
      prefix: production
      useWorkloadIdentity: true
  continuous: true
  includeOffsetHeaders: true
  sourceClusterId: production
  consumerGroupSnapshot: true

S3-Compatible Endpoint

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: minio-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: http://minio.storage.svc.cluster.local:9000
      pathStyle: true
      allowHttp: true
      credentialsSecret:
        name: minio-credentials

Split-DNS S3 Endpoint

Use template.pod.hostAliases when backup pods must reach an S3-compatible endpoint by its public hostname but resolve it to an internal IP address from inside the cluster.

apiVersion: kafka.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: private-s3-backup
spec:
  kafkaCluster:
    bootstrapServers:
      - kafka:9092
  topics:
    - orders
  storage:
    storageType: s3
    s3:
      bucket: kafka-backups
      region: us-east-1
      endpoint: https://s3.internal
      pathStyle: true
      credentialsSecret:
        name: s3-credentials
  template:
    pod:
      hostAliases:
        - ip: "10.10.0.5"
          hostnames:
            - s3.internal
            - minio.internal

Status

status:
  phase: Completed
  message: "Backup completed successfully"
  lastBackupTime: "2026-04-13T12:00:00Z"
  nextScheduledBackup: "2026-04-13T13:00:00Z"
  recordsProcessed: 1000000
  bytesProcessed: 1073741824
  segmentsCompleted: 8
  checkpointEnabled: true
  lastCheckpointTime: "2026-04-13T12:05:00Z"
  resumable: true
  backupId: "production-backup-20260413-120000"
  lastRetentionTime: "2026-04-13T12:06:00Z"
  retentionInspectedBackups: 12
  retentionEligibleBackups: 2
  retentionDeletedBackups: 0
  retentionReclaimedBytes: 536870912
  retentionDryRun: true
  retentionError: null

Next Steps