How to Back Up and Restore Kafka Topics: A Step-by-Step Guide
You cannot undo a deleted Kafka topic unless you have a backup. To back up a Kafka topic, you capture its records, partition layout, and offsets to durable storage outside the cluster; to restore, you produce that data back into the same or a different cluster. This guide walks through three ways to do it, from a purpose-built CLI to a bare consumer script, and shows how to verify that a backup actually restores.
Quick decision guide:
- A handful of small topics for a one-off → consumer script.
- Production topics that need offset preservation or point-in-time restore → the kafka-backup CLI.
- An existing Kafka Connect estate that only needs raw record archiving → an S3 sink connector.
What "backing up a Kafka topic" actually means
A real topic backup captures four things:
- Records — keys, values, headers, and timestamps
- Partition layout — which records lived on which partition, in what order
- Offsets — both record offsets and consumer group positions
- Topic configuration — partition count, retention, cleanup policy
Two things that are not backups, despite being treated as such:
- Retention is scheduled deletion. Once records are older than retention.ms, they become eligible for deletion regardless of whether anyone still needs them.
- Replication (in-cluster RF=3, or MirrorMaker 2 across clusters) copies every write, including the accidental delete and the poisoned deploy, within milliseconds.
Backups exist so you can go backwards in time. Replication only goes forwards.
Method 1 — kafka-backup CLI (production topics)
The OSO Kafka Backup CLI backs up topics with offset preservation and restores them with millisecond-precision time windows. It is the right default for production.
Step 1: Install and configure
Follow the installation guide for your platform, then write a backup config:
mode: backup
source:
  bootstrap_servers:
    - broker-1:9092
  topics:
    include:
      - orders
      - payments
storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: backups/production
backup:
  compression: zstd
  start_offset: earliest
Filesystem, Azure Blob, and GCS backends use the same shape — see the configuration reference and the S3 integration guide.
Step 2: Run the backup
kafka-backup backup --config backup.yaml
Progress is checkpointed as it runs, so an interrupted backup resumes rather
than restarting. For continuously changing topics, set continuous: true to
stream changes instead of taking discrete snapshots.
Step 3: Restore — to anywhere, at any point in time
Restore to the original cluster, a new cluster, or a renamed topic:
mode: restore
backup_id: "prod-backup-20260701"
target:
  bootstrap_servers:
    - dr-broker-1:9092
storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: backups/production
restore:
  create_topics: true
  topic_mapping:
    orders: orders-restored
  # Point-in-time: only records up to the moment before the incident
  time_window_end: 1751500800000 # Unix ms
kafka-backup restore --config restore.yaml
The time_window_end option is what makes this a genuine undo button: restore
the topic to 14:03:27.451, the millisecond before the bad deploy started
writing. Set dry_run: true first to validate the whole plan without producing
a record.
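Because time_window_end is expressed in Unix milliseconds, the cutoff first has to be converted from a wall-clock time. The following is a minimal sketch, assuming the incident time is known in UTC. The helper name to_unix_ms and the calendar date are hypothetical and chosen only for illustration; they are not part of the kafka-backup CLI.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to Unix milliseconds (hypothetical helper)."""
    moment = datetime.fromisoformat(iso_timestamp)
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    delta = moment - EPOCH
    # Integer arithmetic avoids float rounding at millisecond precision.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


# Hypothetical incident date; only the time of day is taken from the text above.
print(to_unix_ms("2025-07-03T14:03:27.451+00:00"))
```

Doing the conversion explicitly in UTC avoids the most common mistake here, which is pasting a local-time value into a field that is interpreted as epoch time.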
Use this method when: you need point-in-time recovery, consumer offsets must survive the restore, or the backup must live outside the cluster's failure domain.
Method 2 — Kafka Connect S3 sink (existing Connect estates)
If you already operate Kafka Connect, an S3 sink connector can archive topic records to a bucket:
{
  "name": "orders-s3-backup",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "orders",
    "s3.bucket.name": "my-kafka-archive",
    "s3.region": "us-west-2",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "10000"
  }
}
Restoring means running the matching S3 source connector to replay the records into a topic.
The trade-offs are real, though: consumer group offsets are not captured, restores replay records with new offsets (breaking offset-based consumers), and there is no point-in-time selection beyond whatever partitioning your sink wrote. It is archiving, not recovery tooling.
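To see why new offsets break offset-based consumers, consider a toy model of a single partition. This is an illustrative sketch only: the offsets and the committed position are hypothetical, and the range check is a simplification of what a broker does when a consumer fetches from a committed offset.

```python
from typing import List


def is_valid_position(committed: int, offsets: List[int]) -> bool:
    """True if a committed offset falls inside the partition's current log range."""
    log_start = offsets[0]
    log_end = offsets[-1] + 1  # the next offset to be written
    return log_start <= committed <= log_end


# Hypothetical source partition: retention already removed offsets 0-999.
original_offsets = [1000, 1001, 1002, 1003, 1004]
committed = 1003  # hypothetical position the consumer group last committed

# Replaying through a producer appends to an empty partition,
# so numbering restarts at 0 instead of 1000.
replayed_offsets = list(range(len(original_offsets)))

print(is_valid_position(committed, original_offsets))  # True
print(is_valid_position(committed, replayed_offsets))  # False
```

In the replayed topic the committed position points past the end of the log, so a real consumer would fall back to its auto.offset.reset policy and either reprocess everything or skip to the end.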
Use this method when: you need raw record archives for analytics or compliance, offsets do not matter, and Connect is already running.
Method 3 — Consumer script (small topics, one-offs)
For a small topic in a dev environment, a script can be enough:
# Backup: dump records with timestamp, partition, and key (tab-separated columns)
kafka-console-consumer --bootstrap-server broker-1:9092 \
  --topic orders --from-beginning \
  --property print.key=true \
  --property print.timestamp=true \
  --property print.partition=true \
  --timeout-ms 10000 > orders-backup.txt
# Restore: drop the timestamp and partition columns, then replay key and value.
# parse.key=true splits each line on the first tab, the default key separator.
cut -f3- orders-backup.txt | kafka-console-producer --bootstrap-server broker-1:9092 \
  --topic orders-restored \
  --property parse.key=true
Be honest about the limits: no offset preservation, no header capture in older tooling, timestamps become produce-time on restore, and nothing about this is incremental. It is a photocopy, not a backup system.
Use this method when: the topic is small, the moment is now, and the stakes are low.
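Even a one-off dump is worth a sanity check before you rely on it. The sketch below parses a dump file and counts records per partition. It assumes the tab-separated layout the console consumer produces with the three print properties shown above (timestamp, then partition, then key, then value); the names DumpedRecord, parse_dump_line, and count_per_partition are hypothetical helpers, and the sample lines are made up.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class DumpedRecord(NamedTuple):
    timestamp_ms: int
    partition: int
    key: str
    value: str


def parse_dump_line(line: str) -> DumpedRecord:
    """Parse one line shaped like 'CreateTime:<ms>\\tPartition:<n>\\t<key>\\t<value>'."""
    # Split on the first three tabs only, so tabs inside the value survive.
    ts_field, partition_field, key, value = line.rstrip("\n").split("\t", 3)
    return DumpedRecord(
        timestamp_ms=int(ts_field.split(":", 1)[1]),
        partition=int(partition_field.split(":", 1)[1]),
        key=key,
        value=value,
    )


def count_per_partition(lines: Iterable[str]) -> Counter:
    """Count dumped records per partition, skipping blank lines."""
    return Counter(parse_dump_line(line).partition for line in lines if line.strip())


# Made-up sample lines standing in for the contents of orders-backup.txt.
sample = [
    'CreateTime:1751500800000\tPartition:0\torder-1\t{"total": 10}\n',
    'CreateTime:1751500800500\tPartition:1\torder-2\t{"total": 25}\n',
    'CreateTime:1751500801000\tPartition:0\torder-3\t{"total": 7}\n',
]
print(count_per_partition(sample))
```

For a real file, pass an open file handle instead of the sample list, for example count_per_partition(open("orders-backup.txt")). Keys that themselves contain tabs would break this layout, which is one more reason to keep this method to low-stakes topics.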
Verifying the backup (whichever method you chose)
An unverified backup is a guess. After every backup — and on a weekly schedule — check:
- Record counts match between source and restored topic (kafka-run-class kafka.tools.GetOffsetShell on both sides)
- Offset continuity: no gaps at segment boundaries
- Schema compatibility: restored records deserialize with the current schema
- Consumer resume: a consumer group restored with the data picks up where it left off instead of reprocessing from zero
The backup best practices guide covers turning this checklist into automated, alerting-backed verification.
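The record-count check can be scripted from GetOffsetShell output. The sketch below assumes the tool's topic:partition:offset line format and that you have captured both the earliest (--time -2) and latest (--time -1) offsets for each topic. The function names and the sample outputs are hypothetical.

```python
from typing import Dict, List


def parse_offsets(shell_output: str) -> Dict[int, int]:
    """Map partition number to offset from 'topic:partition:offset' lines."""
    offsets: Dict[int, int] = {}
    for line in shell_output.splitlines():
        line = line.strip()
        if not line:
            continue
        _topic, partition, offset = line.rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets


def offset_span(earliest_output: str, latest_output: str) -> Dict[int, int]:
    """Latest minus earliest offset per partition (an upper bound on record count)."""
    earliest = parse_offsets(earliest_output)
    latest = parse_offsets(latest_output)
    return {p: latest[p] - earliest.get(p, 0) for p in latest}


def mismatched_partitions(source: Dict[int, int], restored: Dict[int, int]) -> List[int]:
    """Partitions whose spans differ or that exist on only one side."""
    return sorted(p for p in set(source) | set(restored)
                  if source.get(p) != restored.get(p))


# Made-up outputs standing in for real GetOffsetShell runs.
source = offset_span("orders:0:100\norders:1:0", "orders:0:150\norders:1:40")
restored = offset_span("orders-restored:0:0\norders-restored:1:0",
                       "orders-restored:0:50\norders-restored:1:39")
print(mismatched_partitions(source, restored))  # [1]
```

Treat the span as an approximation: on compacted topics or topics written with transactions, the offset range is larger than the number of records actually present, so a mismatch is a prompt to investigate rather than proof of data loss.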
Frequently asked questions
How do you back up a Kafka topic?
Capture the topic's records, partition layout, offsets, and configuration to storage outside the cluster. In practice: run a backup tool such as the kafka-backup CLI with a config naming the topics and a storage backend (S3, Azure Blob, GCS, or filesystem), or archive records with a Kafka Connect S3 sink if offsets do not matter.
Can you restore a deleted Kafka topic?
Only from a backup taken before the deletion. Replication does not help — the delete propagates to replicas and mirrored clusters. From a backup, recreate the topic (create_topics: true) and restore records and consumer offsets, optionally to a new topic name.
How do you back up Kafka topics to S3?
Point a backup config at an S3 bucket (backend: s3, bucket, region, prefix) and run kafka-backup backup --config backup.yaml. Records are compressed with Zstandard or LZ4 before upload. A Kafka Connect S3 sink connector is an alternative when you only need record archiving.
How do you take a backup of a Kafka topic without downtime?
Backups read topics through standard consumer protocols, so producers and consumers keep running during the backup. For topics with constant writes, continuous mode streams changes instead of taking point snapshots.
How do you verify a Kafka backup?
Restore it — to a scratch cluster or with dry_run: true — and compare record counts, check offset continuity, confirm schemas deserialize, and verify a consumer group resumes from its restored offsets. Schedule this weekly; a backup that has never been restored is unproven.
Next steps: the first backup tutorial walks this end to end, and the CLI reference documents every flag used above.