Back up Apache Kafka to Amazon S3
Stream compressed topic data, consumer group offsets, and cluster metadata into S3 — then restore any of it, to any cluster, at a precise moment in time.
Why S3 for Kafka backups
S3 gives Kafka backups what a backup target needs: eleven nines (99.999999999%) of designed durability, storage priced per GB rather than per broker, lifecycle policies for retention tiers, and isolation from your cluster's failure domain. OSO Kafka Backup writes a compressed layout to your bucket; see the storage format reference for the on-disk details.
The same s3 backend also works with S3-compatible stores such as MinIO and Ceph via a custom endpoint, so on-prem estates get the same workflow.
Configuration
Point the backup at your bucket and run it. Credentials resolve through the standard AWS credential chain (environment, instance profile, IRSA) unless set explicitly — the AWS S3 setup guide covers IAM policies and bucket configuration in full.
Amazon S3:

```yaml
mode: backup
storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: backups/production
```

MinIO / S3-compatible:

```yaml
mode: backup
storage:
  backend: s3
  bucket: my-kafka-backups
  endpoint: https://minio.example.com:9000
  access_key: ${AWS_ACCESS_KEY_ID}
  secret_key: ${AWS_SECRET_ACCESS_KEY}
```

Run it:

```bash
kafka-backup backup --config backup.yaml
```
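The MinIO example reads its keys from ${AWS_ACCESS_KEY_ID}-style placeholders, which implies they are filled in from the environment when the config loads (an assumption here, not something this page spells out). Under that assumption, a small pre-flight check can catch an unset variable before a backup starts. The sketch below is a standalone Python helper, not part of kafka-backup; missing_placeholders is a hypothetical name, and it only inspects the file text, so it does not depend on how the tool itself expands placeholders.

```python
import os
import re
import sys

# Matches ${NAME} placeholders such as ${AWS_ACCESS_KEY_ID}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_placeholders(config_text, env=None):
    """Return placeholder names referenced in config_text but unset or empty in env."""
    env = os.environ if env is None else env
    names = PLACEHOLDER.findall(config_text)
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # an empty string counts as missing because it is never a valid key.
    return [name for name in dict.fromkeys(names) if not env.get(name)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_placeholders.py backup.yaml
    with open(sys.argv[1], encoding="utf-8") as f:
        missing = missing_placeholders(f.read())
    if missing:
        sys.exit("unset variables: " + ", ".join(missing))
    print("all placeholders resolved")
```

Run it against backup.yaml just before kafka-backup backup --config backup.yaml; it exits non-zero and names every unset variable, which is easier to act on than an authentication error from the object store.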
What gets backed up
Every backup captures topic records with their timestamps and headers, consumer group offsets, and topic configuration, all compressed with Zstd or LZ4. Restores can target the original cluster or a new one, and can stop at a precise millisecond, so you can recover from a bad deploy or an accidental delete by restoring to the moment just before it happened.
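Kafka record timestamps are milliseconds since the Unix epoch, so a point-in-time restore comes down to turning a wall-clock moment into that number and keeping only the records at or before it. The Python sketch below shows that arithmetic in isolation. It is not kafka-backup's restore code: the record dictionaries, the helper names to_epoch_ms and records_up_to, and the inclusive cutoff are all assumptions, so check the restore documentation for how the tool takes its cutoff and whether it is inclusive.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(timestamp: str) -> int:
    """Convert an ISO-8601 timestamp with a UTC offset to epoch milliseconds."""
    # fromisoformat only accepts a trailing "Z" from Python 3.11 onwards,
    # so rewrite it as an explicit +00:00 offset first.
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError("timestamp needs an offset such as Z or +00:00")
    # Floor-dividing timedeltas keeps the result in exact integer milliseconds.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def records_up_to(records, cutoff_ms):
    """Keep records whose timestamp is at or before cutoff_ms (inclusive)."""
    # Scan every record rather than stopping at the first later one:
    # producer-assigned CreateTime timestamps need not increase within a partition.
    return [r for r in records if r["timestamp"] <= cutoff_ms]


if __name__ == "__main__":
    print(to_epoch_ms("2024-01-01T00:00:00.123Z"))  # 1704067200123
```

Requiring an explicit offset is deliberate: a naive timestamp would be silently read in whatever zone the operator's machine uses, which is exactly the kind of error that matters when the target is a single millisecond.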
Frequently asked questions
Does OSO Kafka Backup work with S3-compatible storage like MinIO?
Yes. Set a custom endpoint on the s3 backend and the same backup and restore workflow works against MinIO, Ceph RGW, and other S3-compatible object stores.
How are AWS credentials provided?
By default the standard AWS credential chain is used — environment variables, shared credentials file, EC2 instance profile, or IRSA on EKS. You can also set access_key and secret_key explicitly in the storage configuration.
Can I use S3 lifecycle policies with Kafka backups?
Yes. Backups are written under a configurable key prefix, so you can attach lifecycle rules to transition older backups to infrequent access or Glacier tiers, or to expire them in line with your retention policy.
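The sketch below builds a lifecycle configuration for a backup prefix in the shape the S3 PutBucketLifecycleConfiguration API expects. The 30/90/365-day thresholds, the rule ID, and the helper name lifecycle_config are illustrative assumptions, not OSO recommendations. Whether expiring objects purely by age is safe depends on how kafka-backup lays out and references objects across backups, so check the storage format reference before enabling the Expiration action. Also note that objects in the GLACIER storage class must be restored through S3 before a backup stored in them can be read.

```python
import json


def lifecycle_config(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                     expire_days: int = 365) -> dict:
    """Build an S3 lifecycle configuration that tiers, then expires, backups under prefix."""
    if ia_days < 30:
        # S3 rejects STANDARD_IA transitions for objects younger than 30 days.
        raise ValueError("ia_days must be at least 30")
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    # A trailing slash stops "backups/production" from also matching
    # sibling prefixes such as "backups/production-old/".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return {
        "Rules": [
            {
                "ID": "kafka-backup-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_config("backups/production"), indent=2))
```

Saving the printed JSON as lifecycle.json lets you apply it with aws s3api put-bucket-lifecycle-configuration --bucket my-kafka-backups --lifecycle-configuration file://lifecycle.json, or you can pass the dictionary to boto3's put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...). Glacier Flexible Retrieval has a 90-day minimum storage duration, so expiring objects sooner than 90 days after they reach that tier incurs early-deletion charges.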
How is backup data compressed?
Topic data is compressed with Zstd or LZ4 before upload, independent of the compression producers used, which typically reduces storage cost substantially compared with raw log segments.
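How much compression saves depends heavily on the payloads, so it is worth measuring a sample of your own record values before sizing storage. Python's standard library has no LZ4 codec and only gained Zstd in 3.14, so the sketch below substitutes zlib as a stand-in. Treat its result as a rough indication only, since Zstd and LZ4 will produce different ratios; the helper name compression_ratio is hypothetical.

```python
import zlib


def compression_ratio(values, level=6):
    """Return raw size / compressed size for a batch of record values (zlib stand-in)."""
    raw = b"".join(values)
    if not raw:
        raise ValueError("need at least one non-empty value")
    # Compressing the batch as a whole lets the compressor exploit repetition
    # across records, such as shared JSON keys and repeated field values.
    packed = zlib.compress(raw, level)
    return len(raw) / len(packed)


if __name__ == "__main__":
    sample = [b'{"order_id": %d, "status": "shipped", "region": "us-west-2"}' % i
              for i in range(1000)]
    print(f"approx. ratio: {compression_ratio(sample):.1f}x")
```

Feeding it values exported from a real topic gives a more useful figure than the synthetic sample; highly repetitive JSON compresses far better than already-compressed or encrypted payloads.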
Ready to protect your Kafka data?
Take your first backup in minutes with the open-source CLI, or talk to us about Enterprise features like encryption, RBAC, and audit logging.