How to Back Up and Restore Kafka Topics: A Step-by-Step Guide

July 3, 2026 · 6 min read

The team behind OSO Kafka Backup

You cannot undo a deleted Kafka topic unless you have a backup. To back up a Kafka topic, you capture its records, partition layout, and offsets to durable storage outside the cluster; to restore, you produce that data back into the same or a different cluster. This guide walks through three ways to do it, from a purpose-built CLI to a bare consumer script, and shows how to verify that the result actually restores.

Key takeaway

Quick decision guide: a handful of small topics for a one-off → consumer script. Production topics that need offset preservation or point-in-time restore → the kafka-backup CLI. An existing Kafka Connect estate that only needs raw record archiving → an S3 sink connector.

What "backing up a Kafka topic" actually means

A real topic backup captures four things:

Records — keys, values, headers, and timestamps
Partition layout — which records lived on which partition, in what order
Offsets — both record offsets and consumer group positions
Topic configuration — partition count, retention, cleanup policy
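
As a rough illustration of those four pieces, the sketch below models them as plain Python data classes. The names `BackedUpRecord` and `TopicBackup` are hypothetical and are not the on-disk format of any tool mentioned in this guide; the point is only that each record keeps its original partition and offset, and that consumer group positions and topic configuration travel with the records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BackedUpRecord:
    """One record plus the position it held in the source topic (hypothetical shape)."""
    partition: int
    offset: int
    timestamp_ms: int
    key: Optional[bytes]
    value: Optional[bytes]
    headers: Tuple[Tuple[str, bytes], ...] = ()


@dataclass
class TopicBackup:
    """Everything needed to recreate a topic, not just its payloads (hypothetical shape)."""
    topic: str
    partition_count: int
    config: Dict[str, str]  # e.g. retention.ms, cleanup.policy
    records: List[BackedUpRecord] = field(default_factory=list)
    # consumer group -> partition -> committed offset
    group_offsets: Dict[str, Dict[int, int]] = field(default_factory=dict)

    def partition_log(self, partition: int) -> List[BackedUpRecord]:
        """Return one partition's records in their original offset order."""
        selected = [r for r in self.records if r.partition == partition]
        return sorted(selected, key=lambda r: r.offset)


backup = TopicBackup(topic="orders", partition_count=1,
                     config={"cleanup.policy": "delete"})
backup.records.append(BackedUpRecord(0, 1, 1751500800001, b"order-2", b"{}"))
backup.records.append(BackedUpRecord(0, 0, 1751500800000, b"order-1", b"{}"))
backup.group_offsets["billing"] = {0: 1}
print([r.key for r in backup.partition_log(0)])
```

A dump that keeps only values would fill in `value` and lose every other field, which is the gap between an archive and a restorable backup.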

Two things that are not backups, despite being treated as such:

Retention is scheduled deletion. Once records are older than retention.ms, their log segments become eligible for deletion, regardless of whether anyone still needs the data.
Replication (in-cluster RF=3, or MirrorMaker 2 across clusters) copies every write — including the accidental delete and the poisoned deploy — within milliseconds.

Backups exist so you can go backwards in time. Replication only goes forwards.

Method 1 — kafka-backup CLI (production topics)

The OSO Kafka Backup CLI backs up topics with offset preservation and restores them with millisecond-precision time windows. It is the right default for production.

Step 1: Install and configure

Follow the installation guide for your platform, then write a backup config:

backup.yaml
mode: backup

source:
  bootstrap_servers:
    - broker-1:9092

topics:
  include:
    - orders
    - payments

storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: backups/production

backup:
  compression: zstd
  start_offset: earliest

Filesystem, Azure Blob, and GCS backends use the same shape — see the configuration reference and the S3 integration guide.

Step 2: Run the backup

kafka-backup backup --config backup.yaml

Progress is checkpointed as it runs, so an interrupted backup resumes rather than restarting. For continuously changing topics, set continuous: true to stream changes instead of taking discrete snapshots.

Step 3: Restore — to anywhere, at any point in time

Restore to the original cluster, a new cluster, or a renamed topic:

restore.yaml
mode: restore
backup_id: "prod-backup-20260701"

target:
  bootstrap_servers:
    - dr-broker-1:9092

storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: backups/production

restore:
  create_topics: true
  topic_mapping:
    orders: orders-restored
  # Point-in-time: only records up to the moment before the incident
  time_window_end: 1751500800000   # Unix ms

kafka-backup restore --config restore.yaml

The time_window_end option is what makes this a genuine undo button: restore the topic to 14:03:27.451, the millisecond before the bad deploy started writing. Set dry_run: true first to validate the whole plan without producing a record.
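
Getting the Unix-millisecond value right is the error-prone part. The helper below is a minimal sketch using only the Python standard library; the function name `to_unix_ms` and the incident time are illustrative assumptions. It uses integer arithmetic rather than floating-point seconds so the millisecond digit is exact. Whether the tool treats `time_window_end` as inclusive is not stated here, so check the configuration reference before relying on the boundary record.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime; a naive one is ambiguous")
    # Dividing two timedeltas with // yields an exact integer.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Hypothetical incident: the last good millisecond is 14:03:27.451 UTC on 2025-07-03.
last_good = datetime(2025, 7, 3, 14, 3, 27, 451000, tzinfo=timezone.utc)
print(to_unix_ms(last_good))  # 1751551407451
```

For reference, the example value 1751500800000 in restore.yaml corresponds to 2025-07-03T00:00:00Z.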

Use this method when: you need point-in-time recovery, consumer offsets must survive the restore, or the backup must live outside the cluster's failure domain.

Method 2 — Kafka Connect S3 sink (existing Connect estates)

If you already operate Kafka Connect, an S3 sink connector can archive topic records to a bucket:

s3-sink.json
{
  "name": "orders-s3-backup",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "orders",
    "s3.bucket.name": "my-kafka-archive",
    "s3.region": "us-west-2",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "10000"
  }
}
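
Connectors are registered by POSTing this JSON to the Connect REST API's `/connectors` endpoint. The sketch below builds that request with the Python standard library. The worker address `http://connect:8083` is an assumption, the helper name is hypothetical, and the send is left commented out so nothing is contacted by accident.

```python
import json
import urllib.request


def build_create_connector_request(connect_url: str, connector: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that registers a connector with Kafka Connect."""
    body = json.dumps(connector).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as s3-sink.json above.
connector = {
    "name": "orders-s3-backup",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "orders",
        "s3.bucket.name": "my-kafka-archive",
        "s3.region": "us-west-2",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "10000",
    },
}

# Assumed worker address; replace with your own Connect REST endpoint.
request = build_create_connector_request("http://connect:8083", connector)
print(request.get_method(), request.full_url)

# To actually register the connector against a running worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```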

Restoring means running the matching S3 source connector to replay the records into a topic.

The trade-offs are real, though: consumer group offsets are not captured, restores replay records with new offsets (breaking offset-based consumers), and there is no point-in-time selection beyond whatever partitioning your sink wrote. It is archiving, not recovery tooling.

Use this method when: you need raw record archives for analytics or compliance, offsets do not matter, and Connect is already running.

Method 3 — Consumer script (small topics, one-offs)

For a small topic in a dev environment, a script can be enough:

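# Backup: dump records with key, timestamp, and partition
kafka-console-consumer --bootstrap-server broker-1:9092 \
  --topic orders --from-beginning \
  --property print.key=true \
  --property print.timestamp=true \
  --property print.partition=true \
  --timeout-ms 10000 > orders-backup.txt

# Restore: drop the timestamp and partition columns, then replay key<TAB>value.
# Assumes the dump is tab-separated (timestamp, partition, key, value) and that
# keys and values contain no tabs or newlines. Without the cut, the whole dump
# line, metadata included, would be produced as the record value.
cut -f3- orders-backup.txt | kafka-console-producer --bootstrap-server broker-1:9092 \
  --topic orders-restored \
  --property parse.key=true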

Be honest about the limits: no offset preservation, no header capture in older tooling, timestamps become produce-time on restore, and nothing about this is incremental. It is a photocopy, not a backup system.
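
Before replaying a dump, it is worth checking that it parses the way you expect. The sketch below assumes each line has the tab-separated shape `CreateTime:<ms>`, `Partition:<n>`, key, value, which is what the three print properties above are expected to produce; confirm against a few lines of your own file, since the layout depends on the tooling version. The names `DumpedRecord`, `parse_dump_line`, and `records_per_partition` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional


class DumpedRecord(NamedTuple):
    timestamp_ms: int
    partition: int
    key: Optional[str]
    value: str


def parse_dump_line(line: str) -> DumpedRecord:
    """Parse one assumed 'CreateTime:<ms>\\tPartition:<n>\\t<key>\\t<value>' line."""
    # maxsplit=3 keeps any further tabs inside the value intact.
    ts_field, partition_field, key, value = line.rstrip("\n").split("\t", 3)
    return DumpedRecord(
        timestamp_ms=int(ts_field.split(":", 1)[1]),
        partition=int(partition_field.split(":", 1)[1]),
        key=None if key == "null" else key,  # the console consumer prints null keys as "null"
        value=value,
    )


def records_per_partition(lines: Iterable[str]) -> Dict[int, int]:
    """Count dumped records per partition, skipping blank lines."""
    counts = Counter(parse_dump_line(l).partition for l in lines if l.strip())
    return dict(counts)


sample = [
    'CreateTime:1751500800000\tPartition:0\torder-1\t{"id": 1}\n',
    'CreateTime:1751500800500\tPartition:1\torder-2\t{"id": 2}\n',
    'CreateTime:1751500801000\tPartition:0\tnull\t{"id": 3}\n',
]
print(records_per_partition(sample))
```

Note that a real key whose text is literally "null" is indistinguishable from a missing key in this format, one more reason this method suits only low-stakes topics.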

Use this method when: the topic is small, the moment is now, and the stakes are low.

Verifying the backup (whichever method you chose)

An unverified backup is a guess. After every backup — and on a weekly schedule — check:

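Record counts match between source and restored topic (compare per-partition earliest and latest offsets from kafka-run-class kafka.tools.GetOffsetShell, or kafka-get-offsets on newer Kafka releases, on both sides)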
Offset continuity — no gaps at segment boundaries
Schema compatibility — restored records deserialize with the current schema
Consumer resume — a consumer group restored with the data picks up where it left off instead of reprocessing from zero
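
The record-count check can be scripted. The sketch below assumes you have captured GetOffsetShell output, which prints one `topic:partition:offset` line per partition, once for earliest offsets and once for latest offsets on each cluster. All function names are hypothetical. It compares offset spans (latest minus earliest), which equal record counts only for topics without compaction or transaction markers; for those topics, count consumed records instead.

```python
from typing import Dict, List


def parse_offsets(output: str, topic: str) -> Dict[int, int]:
    """Parse 'topic:partition:offset' lines into {partition: offset} for one topic."""
    offsets: Dict[int, int] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, partition, offset = line.rsplit(":", 2)
        if name == topic:
            offsets[int(partition)] = int(offset)
    return offsets


def offset_spans(earliest: Dict[int, int], latest: Dict[int, int]) -> Dict[int, int]:
    """Latest minus earliest offset per partition."""
    return {p: latest[p] - earliest[p] for p in latest}


def mismatched_partitions(source: Dict[int, int], restored: Dict[int, int]) -> List[int]:
    """Partitions whose spans differ, or that exist on only one side."""
    partitions = sorted(set(source) | set(restored))
    return [p for p in partitions if source.get(p) != restored.get(p)]


# Illustrative captured output, not real cluster data.
source = offset_spans(
    parse_offsets("orders:0:100\norders:1:0\n", "orders"),
    parse_offsets("orders:0:250\norders:1:40\n", "orders"),
)
restored = offset_spans(
    parse_offsets("orders-restored:0:0\norders-restored:1:0\n", "orders-restored"),
    parse_offsets("orders-restored:0:150\norders-restored:1:40\n", "orders-restored"),
)
print(mismatched_partitions(source, restored))
```

An empty list means every partition carries the same span on both sides; a non-empty list names the partitions to investigate first.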

The backup best practices guide covers turning this checklist into automated, alerting-backed verification.

Frequently asked questions

How do you back up a Kafka topic?

Capture the topic's records, partition layout, offsets, and configuration to storage outside the cluster. In practice: run a backup tool such as the kafka-backup CLI with a config naming the topics and a storage backend (S3, Azure Blob, GCS, or filesystem), or archive records with a Kafka Connect S3 sink if offsets do not matter.

Can you restore a deleted Kafka topic?

Only from a backup taken before the deletion. Replication does not help — the delete propagates to replicas and mirrored clusters. From a backup, recreate the topic (create_topics: true) and restore records and consumer offsets, optionally to a new topic name.

How do you back up Kafka topics to S3?

Point a backup config at an S3 bucket (backend: s3, bucket, region, prefix) and run kafka-backup backup --config backup.yaml. Records are compressed with Zstandard or LZ4 before upload. A Kafka Connect S3 sink connector is an alternative when you only need record archiving.

How do you take a backup of a Kafka topic without downtime?

Backups read topics through standard consumer protocols, so producers and consumers keep running during the backup. For topics with constant writes, continuous mode streams changes instead of taking point snapshots.

How do you verify a Kafka backup?

Restore it — to a scratch cluster or with dry_run: true — and compare record counts, check offset continuity, confirm schemas deserialize, and verify a consumer group resumes from its restored offsets. Schedule this weekly; a backup that has never been restored is unproven.

Next steps: the first backup tutorial walks through this end to end, and the CLI reference documents every flag used above.
