Restore Jobs and Retry Behavior

The operator executes every backup and restore as a Kubernetes Job. This page describes how those Jobs behave: how many times they may run, how to control retries with spec.backoffLimit, what the CR status reports, and what happens when you delete a resource.

Restores run exactly once

A KafkaRestore is one-shot. The operator creates a single Job for it and never creates another — whether the Job succeeds or fails. Because a restore appends to (or purges) the target topics, implicitly re-running one could duplicate data, so every run must be intentional:

After a successful restore the CR reports RestoreComplete=True and is never re-executed.
After a failed restore the CR reports Ready=False / RestoreFailed and the operator does not retry it.
To run a restore again, delete the KafkaRestore and create a new one.

Available from operator 0.2.9. Earlier versions could re-create a completed restore Job on every 5-minute reconcile (#29).
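
The delete-and-recreate step above can be scripted. The following is a minimal sketch, not part of the operator: `rerun_restore` is a hypothetical helper name, and it assumes you keep the KafkaRestore manifest on disk (here `restore-orders.yaml`) and are happy to reuse the same resource name.

```shell
# Hypothetical helper: re-run a restore by deleting the KafkaRestore
# and re-creating it from a manifest kept on disk.
# Usage: rerun_restore NAME NAMESPACE MANIFEST
rerun_restore() {
  # Delete the old CR and block until it is gone, so the new object
  # is not mistaken for the old one. Stop here if the delete fails.
  kubectl delete kafkarestore "$1" -n "$2" --wait=true --ignore-not-found || return 1
  # Re-create the CR; the operator then creates a new single Job for it.
  kubectl apply -n "$2" -f "$3"
}
```

For example, `rerun_restore restore-orders kafka restore-orders.yaml` repeats the restore defined earlier on this page. Keeping the re-run behind an explicit command like this preserves the intent of the one-shot design: nothing is restored again unless someone asks for it.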

Controlling pod retries: `spec.backoffLimit`

Within a Job, pod-level retries are governed by the Job's backoffLimit, which you can set through spec.backoffLimit on both CRDs:

apiVersion: kafkabackup.com/v1alpha1
kind: KafkaRestore
metadata:
  name: restore-orders
  namespace: kafka
spec:
  strimziClusterRef:
    name: my-cluster
  backupRef:
    name: daily-backup
  backoffLimit: 2   # allow up to 2 pod retries (3 attempts total)

CRD	Default	Rationale
`KafkaRestore`	`0` — exactly one attempt	A retried pod re-applies a partially completed restore, which can duplicate records. Opt in deliberately if your restore is idempotent (e.g. `dryRun`).
`KafkaBackup`	`3`	Re-running a backup is safe; transient failures (broker restarts, network blips) are retried automatically. Applies to one-shot Jobs and scheduled CronJob runs.
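
The relationship between backoffLimit and the number of pod attempts is simple arithmetic: one initial attempt plus up to backoffLimit retries. The sketch below spells that out and shows one way to read the value that ended up on a restore's Job. Both function names are hypothetical, and the label selector is assumed to be the same one used in the cleanup example on this page.

```shell
# Total pod attempts a Job may make: the first run plus backoffLimit retries.
# Usage: max_attempts BACKOFF_LIMIT
max_attempts() {
  echo $(( $1 + 1 ))
}

# Print the backoffLimit set on the Job(s) belonging to a KafkaRestore.
# Assumes the Job carries the kafkabackup.com/restore=<name> label.
# Usage: job_backoff_limit NAME NAMESPACE
job_backoff_limit() {
  kubectl get jobs -n "$2" -l "kafkabackup.com/restore=$1" \
    -o jsonpath='{.items[*].spec.backoffLimit}'
}
```

With the defaults in the table, `max_attempts 0` gives 1 for a KafkaRestore and `max_attempts 3` gives 4 for a KafkaBackup.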

apiVersion: kafkabackup.com/v1alpha1
kind: KafkaBackup
metadata:
  name: daily-backup
  namespace: kafka
spec:
  strimziClusterRef:
    name: my-cluster
  schedule:
    cron: "0 2 * * *"
  backoffLimit: 1   # tighten scheduled runs to a single retry
  storage:
    type: s3
    s3:
      bucket: my-kafka-backups
      region: eu-west-1

spec.backoffLimit is available from operator 0.2.10 (#31). Before that, all Jobs used a fixed backoffLimit: 3.

Status conditions

The operator watches its Jobs, so the CR status reflects the outcome within seconds of the Job finishing:

Condition	Meaning
`Ready=False` / `RestoreRunning`	The restore Job is running (or waiting to start)
`Ready=True` / `RestoreCompleted` and `RestoreComplete=True`	The restore Job succeeded; `status.restore` carries start/completion times
`Ready=False` / `RestoreFailed` and `Error=True`	The restore Job exhausted its `backoffLimit`; the operator will not retry

KafkaBackup reports the analogous BackupRunning / BackupCompleted / BackupFailed reasons, plus status.lastBackup and status.backupHistory for completed runs.

kubectl get kafkarestore restore-orders -n kafka \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
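
Because the Ready condition's reason distinguishes running, completed, and failed restores, a script can branch on it. The following is a rough sketch under that assumption; the helper names and the exit-code convention (0, 1, 2) are illustrative choices, not something the operator defines.

```shell
# Print the reason of the Ready condition for a KafkaRestore.
# Usage: restore_reason NAME NAMESPACE
restore_reason() {
  kubectl get kafkarestore "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'
}

# Map the reason to an exit status:
#   0 = RestoreCompleted, 1 = RestoreFailed, 2 = anything else (running or unknown).
restore_outcome() {
  case "$(restore_reason "$1" "$2")" in
    RestoreCompleted) return 0 ;;
    RestoreFailed)    return 1 ;;
    *)                return 2 ;;
  esac
}

# Poll until the restore has completed or failed, then return 0 or 1.
# Usage: wait_for_restore NAME NAMESPACE [INTERVAL_SECONDS]
wait_for_restore() {
  while :; do
    outcome=0
    restore_outcome "$1" "$2" || outcome=$?
    if [ "$outcome" -ne 2 ]; then
      return "$outcome"
    fi
    sleep "${3:-5}"
  done
}
```

A CI step could then run `wait_for_restore restore-orders kafka` and fail the pipeline on a non-zero exit. Note that this loop has no timeout; wrap it in one if the restore might never start.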

Cleanup on deletion

Deleting a KafkaBackup or KafkaRestore removes everything the operator created for it — Jobs, scheduled CronJobs, generated ConfigMaps, and the Jobs' pods. Deletes are issued with Background propagation so the garbage collector removes dependents; completed pods are not left behind.

Available from operator 0.2.10. Earlier versions left Completed pods orphaned after CR deletion (#30).

kubectl delete kafkarestore restore-orders -n kafka
kubectl get jobs,pods -n kafka -l kafkabackup.com/restore=restore-orders
# No resources found — Jobs and pods are garbage collected together

Pausing reconciliation

Add Strimzi's standard annotation to temporarily stop reconciliation for a KafkaBackup or KafkaRestore:

kubectl annotate kafkabackup daily-backup -n kafka \
  strimzi.io/pause-reconciliation="true"

While paused, the operator reports ReconciliationPaused=True but does not add a finalizer, resolve Strimzi dependencies, or create or update ConfigMaps, Jobs, or CronJobs. Deletion cleanup still runs for resources that already have the operator finalizer.

Resume by removing the annotation (or setting it to "false"):

kubectl annotate kafkabackup daily-backup -n kafka \
  strimzi.io/pause-reconciliation-

Available from operator 0.2.15 (#44).
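
To confirm whether a resource is currently paused, you can read the ReconciliationPaused condition from its status. This is a small sketch; `is_paused` is a hypothetical helper, and it treats a missing condition or any value other than True as "not paused", which is an assumption about how the condition looks once reconciliation resumes.

```shell
# Succeed (exit 0) if the resource reports ReconciliationPaused=True.
# Usage: is_paused KIND NAME NAMESPACE
is_paused() {
  paused=$(kubectl get "$1" "$2" -n "$3" \
    -o jsonpath='{.status.conditions[?(@.type=="ReconciliationPaused")].status}')
  [ "$paused" = "True" ]
}
```

For example, `is_paused kafkabackup daily-backup kafka && echo "paused"` prints a message only while the annotation is in effect.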
