Back up Apache Kafka to Google Cloud Storage
Stream compressed topic data, consumer group offsets, and cluster metadata into a GCS bucket — then restore any of it, to any cluster, at a precise moment in time.
Why Google Cloud Storage for Kafka backups
GCS is a natural backup target for Kafka running on GCP: storage is priced per GB rather than per broker, lifecycle rules can move older backups to the Nearline, Coldline, or Archive storage classes, and dual-region or multi-region buckets add geographic redundancy when the backups themselves need it. Because backups live outside the cluster's failure domain, they survive broker, disk, and whole-cluster failures. OSO Kafka Backup writes a compressed layout to your bucket; the storage format reference covers the details.
On GKE, Workload Identity removes key management entirely: the backup pod authenticates as a Google service account with no JSON key to mount, rotate, or leak.
Configuration
Point the gcs backend at your bucket and run it. Unless a service account key is set explicitly with service_account_json, credentials resolve through GOOGLE_APPLICATION_CREDENTIALS or the instance metadata service. The GCS setup guide covers bucket creation, IAM roles, and Workload Identity binding in full.
Service account key:
mode: backup
storage:
  backend: gcs
  bucket: my-kafka-backups
  prefix: backups/production
  service_account_json: /etc/gcp/service-account.json
Workload Identity (GKE):
mode: backup
storage:
  backend: gcs
  bucket: my-kafka-backups
  prefix: backups/production
  # No credentials needed: the pod's Kubernetes service
  # account is bound to a Google service account
Run it:
kafka-backup backup --config backup.yaml
What gets backed up
Every backup captures topic records with their timestamps and headers, consumer group offsets, and topic configuration — compressed with Zstd or LZ4 before upload. Restores can target the original cluster or a brand new one, and can stop at a precise millisecond, which is what makes recovery from bad deploys and accidental deletes possible.
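The millisecond cutoff comes down to comparing each record's Kafka timestamp (epoch milliseconds) against a target time. The sketch below is illustrative only and is not OSO Kafka Backup's restore code: the BackedUpRecord shape and the records_up_to helper are hypothetical, and it assumes an inclusive cutoff.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class BackedUpRecord:
    # Hypothetical in-memory shape of one backed-up record.
    # Kafka record timestamps are milliseconds since the Unix epoch.
    offset: int
    timestamp_ms: int
    value: bytes
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


def to_epoch_ms(iso_time: str) -> int:
    """Convert an ISO-8601 time to epoch milliseconds (naive times are treated as UTC)."""
    dt = datetime.fromisoformat(iso_time)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Dividing one timedelta by another gives an exact integer (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


def records_up_to(records: List[BackedUpRecord], cutoff_ms: int) -> List[BackedUpRecord]:
    """Keep every record whose timestamp is at or before the cutoff.

    Filters instead of stopping at the first later record, because
    producer-assigned (CreateTime) timestamps are not guaranteed to be
    monotonic within a partition.
    """
    return [r for r in records if r.timestamp_ms <= cutoff_ms]


if __name__ == "__main__":
    cutoff = to_epoch_ms("1970-01-01T00:00:01.250+00:00")  # 1250 ms
    backup = [
        BackedUpRecord(0, 1000, b"ok"),
        BackedUpRecord(1, 1250, b"ok"),
        BackedUpRecord(2, 1251, b"bad deploy"),
    ]
    print([r.offset for r in records_up_to(backup, cutoff)])  # [0, 1]
```

Picking a cutoff just before a bad deploy's first write keeps everything produced up to that moment and drops what came after.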
Frequently asked questions
How does OSO Kafka Backup authenticate to Google Cloud Storage?
Three ways: a service account JSON key set via service_account_json in the storage config, the GOOGLE_APPLICATION_CREDENTIALS environment variable, or ambient credentials — Workload Identity on GKE and the metadata service on GCE — with no key file at all.
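The precedence among those three sources can be written as a small decision function. This is a hypothetical helper for reasoning about which credentials a given config will pick up, not the tool's own code. It takes the explicit-key-wins rule from this page and assumes the usual Application Default Credentials order (environment variable before metadata service) for the other two.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_credential_source(storage: Mapping, env: Optional[Mapping] = None) -> str:
    """Describe which credential source a gcs backend would use (hypothetical helper).

    Assumed precedence:
      1. service_account_json set in the storage config
      2. the GOOGLE_APPLICATION_CREDENTIALS environment variable
      3. ambient credentials (Workload Identity on GKE, metadata service on GCE)
    """
    env = os.environ if env is None else env
    key_file = storage.get("service_account_json")
    if key_file:
        return f"key file from config: {key_file}"
    adc_file = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if adc_file:
        return f"key file from GOOGLE_APPLICATION_CREDENTIALS: {adc_file}"
    return "ambient credentials (Workload Identity or metadata service)"


if __name__ == "__main__":
    # The Workload Identity config sets no key and the pod has no env var.
    wi_storage = {"backend": "gcs", "bucket": "my-kafka-backups", "prefix": "backups/production"}
    print(resolve_credential_source(wi_storage, env={}))
```

A stray GOOGLE_APPLICATION_CREDENTIALS variable left in a GKE pod would take priority over Workload Identity under this order, which is worth checking when a pod authenticates as the wrong identity.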
Can I use GCS lifecycle rules with Kafka backups?
Yes. Backups are written under a configurable object prefix, so you can attach lifecycle rules that transition older backups to Nearline, Coldline, or Archive storage classes, or delete them in line with your retention policy.
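As a sketch, the snippet below builds GCS lifecycle rules scoped to the backups/production prefix from the configuration example, using the real lifecycle fields action (SetStorageClass, Delete) and condition (age, matchesPrefix, matchesStorageClass). The day counts are placeholder values, not a recommendation; pick them to match your retention policy and each class's minimum storage duration.

```python
import json
from typing import Dict, List


def lifecycle_rules(prefix: str, nearline_days: int = 30, coldline_days: int = 90,
                    archive_days: int = 365, delete_days: int = 730) -> Dict[str, List[dict]]:
    """Build GCS lifecycle rules that age backups under one prefix into colder classes."""
    if not nearline_days < coldline_days < archive_days < delete_days:
        raise ValueError("ages must increase: nearline < coldline < archive < delete")
    # Objects under "backups/production" have names starting with "backups/production/".
    match_prefix = [prefix.rstrip("/") + "/"]

    def transition(target: str, age: int, from_classes: List[str]) -> dict:
        # matchesStorageClass keeps each rule from touching objects already colder.
        return {
            "action": {"type": "SetStorageClass", "storageClass": target},
            "condition": {"age": age, "matchesPrefix": match_prefix,
                          "matchesStorageClass": from_classes},
        }

    return {
        "rule": [
            transition("NEARLINE", nearline_days, ["STANDARD"]),
            transition("COLDLINE", coldline_days, ["STANDARD", "NEARLINE"]),
            transition("ARCHIVE", archive_days, ["STANDARD", "NEARLINE", "COLDLINE"]),
            {"action": {"type": "Delete"},
             "condition": {"age": delete_days, "matchesPrefix": match_prefix}},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_rules("backups/production"), indent=2))
```

Saved to a file, the output can be applied with gcloud storage buckets update gs://my-kafka-backups --lifecycle-file=lifecycle.json; check the Cloud Storage lifecycle documentation for the exact file wrapper your tool version expects.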
Does this work with dual-region or multi-region GCS buckets?
Yes. The gcs backend addresses the bucket by name, so regional, dual-region, and multi-region buckets all work unchanged. Dual-region buckets give the backup data itself geographic redundancy without a second backup job.
Can I back up Kafka on GKE without managing key files?
Yes. Bind the backup pod’s Kubernetes service account to a Google service account with Workload Identity, grant it access to the bucket, and omit credentials from the config entirely — the setup guide walks through the binding commands.
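The binding has two halves: an IAM policy binding that lets the Kubernetes service account impersonate the Google service account, and an annotation on the Kubernetes service account pointing back at it. The helper below only generates those two commands as strings. The project, namespace, and account names are hypothetical, and granting the Google service account access to the bucket is a separate step covered in the setup guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkloadIdentityBinding:
    project_id: str   # project whose workload pool the GKE cluster uses (hypothetical)
    namespace: str    # Kubernetes namespace of the backup pod (hypothetical)
    ksa_name: str     # Kubernetes service account the pod runs as (hypothetical)
    gsa_email: str    # Google service account that holds the bucket permissions

    @property
    def member(self) -> str:
        # IAM principal format for a Kubernetes service account in the workload pool.
        return f"serviceAccount:{self.project_id}.svc.id.goog[{self.namespace}/{self.ksa_name}]"

    def commands(self) -> List[str]:
        return [
            # Allow the KSA to act as the GSA; quoting stops the shell expanding [...].
            f"gcloud iam service-accounts add-iam-policy-binding {self.gsa_email} "
            f'--role roles/iam.workloadIdentityUser --member "{self.member}"',
            # Tell GKE which GSA the KSA maps to.
            f"kubectl annotate serviceaccount {self.ksa_name} --namespace {self.namespace} "
            f"iam.gke.io/gcp-service-account={self.gsa_email}",
        ]


if __name__ == "__main__":
    binding = WorkloadIdentityBinding(
        "my-project", "kafka", "kafka-backup",
        "kafka-backup@my-project.iam.gserviceaccount.com",
    )
    for cmd in binding.commands():
        print(cmd)
```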
How is backup data compressed?
Topic data is compressed with Zstd or LZ4 before upload, regardless of the codec your producers used, so stored backups are typically much smaller than raw log segments; the exact savings depend on how compressible your data is.
Ready to protect your Kafka data?
Take your first backup in minutes with the open source CLI, or talk to us about Enterprise features like encryption, RBAC, and audit logging.