Frequently Asked Questions
Common questions about OSO Kafka Backup, organized by category.
General
What is OSO Kafka Backup?
OSO Kafka Backup is an open-source, high-performance backup and restore tool for Apache Kafka, written in Rust. It provides point-in-time recovery (PITR) for Kafka topics and consumer group offsets, supports multi-cloud storage backends (S3, Azure Blob Storage, GCS), and ships as a single static binary. The project is licensed under the MIT License.
How does OSO Kafka Backup differ from MirrorMaker 2?
MirrorMaker 2 is a replication tool designed to mirror data between live Kafka clusters in real time. OSO Kafka Backup is a backup and recovery tool designed to create durable, versioned copies of your Kafka data in external object storage. Key differences:
| Capability | MirrorMaker 2 | OSO Kafka Backup |
|---|---|---|
| Primary purpose | Cross-cluster replication | Backup and restore |
| Storage target | Another Kafka cluster | Object storage (S3, GCS, Azure Blob) |
| Point-in-time recovery | No | Yes |
| Offset recovery | Limited | Full consumer group offset restore |
| Independent of Kafka | No (requires target cluster) | Yes (stores to object storage) |
Use MirrorMaker 2 for active-active or active-passive cluster topologies. Use OSO Kafka Backup for disaster recovery, compliance archival, and point-in-time restore scenarios.
How does OSO Kafka Backup compare to Confluent Replicator?
Unlike Confluent Replicator, OSO Kafka Backup:
- Stores backups in external object storage rather than requiring a destination Kafka cluster
- Supports point-in-time recovery (PITR) to restore data to any arbitrary timestamp
- Recovers consumer group offsets so applications resume from the correct position after restore
- Ships as a single binary with no dependencies on the Confluent Platform or Connect framework
- Is open source under the MIT License, with no per-broker licensing costs
Is OSO Kafka Backup production-ready?
Yes. OSO Kafka Backup is built in Rust for memory safety and high performance. In production environments it achieves throughput exceeding 100 MB/s and operates with less than 500 MB of memory. It includes built-in checkpointing for crash resilience, Prometheus metrics for observability, and has been validated across enterprise workloads.
What Kafka versions are supported?
OSO Kafka Backup supports any Kafka cluster that implements the Kafka protocol version 0.10 or later. This includes clusters running in both ZooKeeper mode and KRaft mode. The tool uses the standard Kafka consumer and producer APIs, so it is compatible with all Kafka distributions that adhere to the protocol.
What managed Kafka services are supported?
OSO Kafka Backup works with all major managed Kafka services, including:
- Amazon MSK (both provisioned and serverless)
- Confluent Cloud
- Aiven for Apache Kafka
- Redpanda (Kafka API-compatible)
- Azure Event Hubs for Kafka (Kafka protocol endpoint)
Any service that exposes a standard Kafka protocol endpoint is supported.
What is the difference between the OSS and Enterprise editions?
OSS edition includes:
- Full backup and restore functionality
- Point-in-time recovery (PITR)
- Compression (zstd, gzip, snappy, lz4)
- Prometheus metrics and monitoring
- Consumer group offset backup and restore
Enterprise edition adds:
- Client-side AES-256 encryption
- Role-based access control (RBAC)
- Audit logging
- GDPR compliance tools (data masking, right to be forgotten, field-level redaction)
- Schema Registry backup and restore
- Priority support with SLAs
Backup Operations
How do I schedule automated backups?
There are two primary approaches:
**Kubernetes CronJob** -- run `kafka-backup backup` with `stop_at_current_offsets: true` on a schedule:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-backup-scheduled
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: kafka-backup
              image: ghcr.io/osodevops/kafka-backup:latest
              args: ["backup", "--config", "/etc/kafka-backup/config.yaml"]
          restartPolicy: OnFailure
```
**Kafka Backup Operator** -- use the `KafkaBackupSchedule` CRD to define schedules declaratively:
```yaml
apiVersion: kafkabackup.oso.sh/v1alpha1
kind: KafkaBackupSchedule
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  backupSpec:
    configRef:
      name: backup-config
```
Can I back up specific topics?
Yes. Use `topics.include` and `topics.exclude` with wildcard patterns:
```yaml
topics:
  include:
    - "orders.*"
    - "payments.*"
    - "inventory.updates"
  exclude:
    - "*.test"
    - "*.staging"
```
Patterns use glob-style matching. If `include` is not specified, all topics are backed up. The `exclude` list takes precedence over `include`.
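The selection rules above can be sketched in Python with `fnmatch`-style globs. This is an illustration of the documented precedence (exclude wins, absent include matches everything), not the tool's actual implementation:

```python
from fnmatch import fnmatch

def topic_selected(topic, include=None, exclude=None):
    """Mimic the documented rules: exclude takes precedence over include;
    an absent include list matches every topic."""
    if exclude and any(fnmatch(topic, pat) for pat in exclude):
        return False
    if not include:
        return True
    return any(fnmatch(topic, pat) for pat in include)

include = ["orders.*", "payments.*", "inventory.updates"]
exclude = ["*.test", "*.staging"]

print(topic_selected("orders.created", include, exclude))  # True
print(topic_selected("orders.test", include, exclude))     # False -- exclude wins
print(topic_selected("users.signup", include, exclude))    # False -- not in include
```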
How does incremental backup work?
OSO Kafka Backup uses checkpoint-based incremental backups. A local SQLite database tracks the last committed offset for each topic-partition. On each backup run, the tool resumes consuming from the last checkpointed offset, so only new messages are read and stored. This makes subsequent backup runs significantly faster and reduces storage costs.
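The checkpoint-and-resume cycle can be sketched with Python's built-in `sqlite3` module. The table layout and column names here are assumptions for illustration, not the tool's actual schema:

```python
import sqlite3

def open_checkpoints(path=":memory:"):
    # One row per topic-partition, storing the last committed offset.
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS checkpoints (
        topic TEXT, part INTEGER, last_offset INTEGER,
        PRIMARY KEY (topic, part))""")
    return db

def resume_offset(db, topic, part):
    # Resume one past the last checkpointed offset; 0 for a fresh partition.
    row = db.execute(
        "SELECT last_offset FROM checkpoints WHERE topic=? AND part=?",
        (topic, part)).fetchone()
    return 0 if row is None else row[0] + 1

def commit_checkpoint(db, topic, part, offset):
    # Called only after a segment is durably written to object storage,
    # which is what makes interrupted runs safe to re-run.
    db.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
               (topic, part, offset))
    db.commit()

db = open_checkpoints()
print(resume_offset(db, "orders", 0))   # 0 -- first run reads from the beginning
commit_checkpoint(db, "orders", 0, 41999)
print(resume_offset(db, "orders", 0))   # 42000 -- next run reads only new messages
```

Because the checkpoint is committed only after data reaches storage, a crash between the two steps at worst re-reads messages that were never persisted, which is why interrupted runs neither lose data nor duplicate it.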
What happens if a backup fails mid-run?
The checkpoint mechanism ensures crash resilience. If a backup run fails or is interrupted, the checkpoint database retains the last successfully committed offset for each partition. The next backup run automatically resumes from that point. No data is lost and no duplicate data is written to storage.
How are consumer group offsets backed up?
Consumer group offsets are stored as `x-original-offset` headers within the backed-up messages. During restore, a three-phase process recovers them:
1. **Restore** messages to the target cluster
2. **Plan** the offset reset using `kafka-backup offset-reset plan` to compute the mapping between original and new offsets
3. **Execute** the offset reset using `kafka-backup offset-reset execute` to commit the mapped offsets to the target cluster's consumer groups
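The plan step can be pictured as a pure lookup: for each consumer group's committed offset (recorded via the original-offset headers), find where that message landed in the restored topic. This toy mapping is for illustration only; the actual plan file format is not shown here:

```python
def plan_offset_reset(committed, original_to_new):
    """Map each group's committed offset in the source cluster to the
    corresponding offset in the restored topic."""
    plan = {}
    for (group, topic, part), orig_offset in committed.items():
        new_offset = original_to_new[(topic, part)][orig_offset]
        plan[(group, topic, part)] = new_offset
    return plan

# Original offsets 100-102 landed at 0-2 in the restored topic.
original_to_new = {("orders", 0): {100: 0, 101: 1, 102: 2}}
committed = {("billing-app", "orders", 0): 101}

print(plan_offset_reset(committed, original_to_new))
# {('billing-app', 'orders', 0): 1}
```

Splitting plan from execute lets an operator review the mapping before any offsets are committed to the target cluster.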
Can I run multiple backup instances simultaneously?
Yes. You can run multiple instances of OSO Kafka Backup concurrently, provided each instance is configured to back up a different set of topics. Use non-overlapping `topics.include` patterns to partition the workload. Do not configure multiple instances to back up the same topic-partition, as this will result in duplicate data in storage.
How do I verify a backup?
Use the built-in validation command:
```bash
kafka-backup validate --deep --config /path/to/config.yaml
```
The `--deep` flag performs a full integrity check, verifying that all segments are present, checksums are valid, and the manifest is consistent with the stored data.
What is the maximum supported message size?
The maximum message size is governed by the Kafka cluster's `max.message.bytes` configuration, which defaults to 1 MB. OSO Kafka Backup has been tested with messages up to 10 MB. If your cluster uses a non-default maximum, ensure the backup tool's consumer configuration matches (via `message.max.bytes` in the consumer properties).
Restore & Recovery
How do I restore to a specific point in time?
Use the `time_window_start` and `time_window_end` parameters in your restore configuration, specified in epoch milliseconds:
```yaml
restore:
  time_window_start: 1742817600000  # 2025-03-24 12:00:00 UTC
  time_window_end: 1742846400000    # 2025-03-24 20:00:00 UTC
  source:
    storage:
      type: s3
      bucket: my-kafka-backups
  target:
    bootstrap_servers: "target-kafka:9092"
```
Only messages with timestamps within the specified window will be restored.
How do I convert a date to epoch milliseconds?
**Bash:**
```bash
date -d "2025-03-24 12:00:00 UTC" +%s%3N
# Output: 1742817600000
```
**Python** (note the explicit UTC timezone; a naive `datetime` would use local time):
```python
from datetime import datetime, timezone
int(datetime(2025, 3, 24, 12, tzinfo=timezone.utc).timestamp() * 1000)
# Output: 1742817600000
```
**macOS (BSD date):**
```bash
date -j -u -f "%Y-%m-%d %H:%M:%S" "2025-03-24 12:00:00" +%s000
```
Can I restore to a different cluster?
Yes. Specify the target cluster's `bootstrap_servers` in your restore configuration. The source and target clusters are completely independent. This is a core use case for disaster recovery -- restoring data to a standby cluster in a different region or cloud provider.
Can I restore to a different topic name?
Yes. Use the `topic_mapping` configuration to remap topic names during restore:
```yaml
restore:
  topic_mapping:
    "orders.production": "orders.restored"
    "payments.production": "payments.restored"
```
How do I recover consumer offsets after a restore?
Use the two-step offset reset workflow:
```bash
# Step 1: Generate the offset mapping plan
kafka-backup offset-reset plan \
  --config /path/to/config.yaml \
  --output offset-plan.json

# Step 2: Review and execute the plan
kafka-backup offset-reset execute \
  --plan offset-plan.json \
  --target-bootstrap-servers target-kafka:9092
```
The plan maps original offsets to the corresponding offsets in the restored topic, accounting for any gaps or reordering.
How long does a restore take?
Restore duration depends on the data volume, storage backend read throughput, network bandwidth, and target cluster write capacity. Under optimal conditions, OSO Kafka Backup achieves approximately 100 MB/s restore throughput. For example, restoring 1 TB of data takes roughly 2.5 to 3 hours.
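The 1 TB figure follows directly from the throughput. A quick back-of-envelope calculation, assuming a sustained 100 MB/s with no other bottleneck:

```python
data_tb = 1.0
throughput_mb_s = 100

seconds = data_tb * 1_000_000 / throughput_mb_s  # 1 TB = 1,000,000 MB
hours = seconds / 3600
print(f"{hours:.1f} hours")  # 2.8 hours
```

Real restores usually land somewhat above this ideal because of storage read latency and target-cluster write capacity, hence the 2.5 to 3 hour range quoted above.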
Can I restore a subset of partitions?
Yes. Use the `source_partitions` configuration to specify which partitions to restore:
```yaml
restore:
  source_partitions: [0, 1, 2, 5]
```
Only the specified partitions will be restored from the backup.
What happens if the target topic already has data?
OSO Kafka Backup appends data to the target topic; it does not overwrite or truncate existing data. If you need a clean restore, create a new topic (or use topic_mapping to restore to a different topic name) to avoid mixing existing and restored data.
Storage
What storage backends are supported?
OSO Kafka Backup supports:
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage (GCS)
- S3-compatible storage (MinIO, Ceph, Wasabi, DigitalOcean Spaces)
- Local filesystem (for testing and development)
How much storage will my backups consume?
Estimate storage as:
```
storage_required = raw_data_size / compression_ratio
```
Compression ratios vary by data type:
| Data Type | Compression (zstd) | Example |
|---|---|---|
| JSON | 5:1 to 7:1 | 1 TB raw ≈ 140-200 GB compressed |
| Avro | 2:1 to 3:1 | 1 TB raw ≈ 330-500 GB compressed |
| Protobuf | 2:1 to 3:1 | 1 TB raw ≈ 330-500 GB compressed |
| Already compressed | ~1:1 | No significant reduction |
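Applying the formula above, e.g. for 1 TB of raw JSON at zstd's typical 5:1 to 7:1 ratios:

```python
def storage_required_gb(raw_gb, compression_ratio):
    # storage_required = raw_data_size / compression_ratio
    return raw_gb / compression_ratio

raw_gb = 1000  # 1 TB of raw JSON
print(f"{storage_required_gb(raw_gb, 7):.0f}-{storage_required_gb(raw_gb, 5):.0f} GB")
# 143-200 GB
```

Remember to multiply by the number of retained backup generations if you keep more than one.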
What is the backup storage format?
Backups are organized as follows:
```
backup-root/
├── manifest.json
├── state/
│   └── offsets.db
└── topics/
    └── {topic-name}/
        └── partition={id}/
            ├── segment-000000000000.zst
            ├── segment-000000001000.zst
            └── ...
```
- `manifest.json` -- metadata about the backup (topics, partitions, offsets, timestamps)
- `state/offsets.db` -- SQLite checkpoint database tracking committed offsets
- `topics/{topic}/partition={id}/segment-NNNN.zst` -- compressed data segments
Can I access backup data without restoring?
Yes. Use the `describe` command to inspect backup metadata:
```bash
kafka-backup describe --config /path/to/config.yaml
```
You can also directly access objects in S3 (or other storage) using standard tools such as the AWS CLI, `gsutil`, or `az storage blob`. Segment files are compressed with the configured algorithm (e.g., zstd) and contain Kafka records in a binary format.
Can I migrate backups between storage backends?
Yes. Since backups are stored as standard objects, you can copy them between backends using tools like `aws s3 sync`, `gsutil rsync`, `azcopy`, or `rclone`. After copying, update your restore configuration to point to the new storage location.
Does OSO Kafka Backup work with S3-compatible storage?
Yes. Configure the endpoint URL to point to your S3-compatible service:
```yaml
storage:
  type: s3
  bucket: my-backups
  region: us-east-1
  endpoint: "https://minio.internal:9000"
  force_path_style: true
```
This works with MinIO, Ceph Object Gateway, Wasabi, DigitalOcean Spaces, and other S3-compatible services.
Performance
What throughput can I expect?
Under optimal conditions, OSO Kafka Backup achieves 100+ MB/s for both backup and restore operations. Actual throughput depends on:
- Network bandwidth between Kafka, the backup tool, and storage
- Storage backend write/read latency
- Compression algorithm and level
- Message size (larger messages achieve higher throughput)
- Number of partitions being processed concurrently
How much memory does OSO Kafka Backup use?
Typical memory usage is under 500 MB when processing 4 partitions concurrently. Memory consumption scales with the number of concurrent partitions and the configured segment size. For high-concurrency workloads, monitor RSS via the `process_resident_memory_bytes` Prometheus metric and adjust `segment_max_bytes` or concurrency settings accordingly.
How do I tune for maximum throughput?
Refer to PE-01: Throughput Optimisation in the Performance Efficiency pillar. Key tuning parameters:
- **Segment size**: increase `segment_max_bytes` to reduce the number of storage write operations
- **Fetch size**: increase `fetch.max.bytes` and `max.partition.fetch.bytes` in the consumer config
- **Compression level**: use a lower zstd compression level (e.g., 1-3) for faster compression at the cost of slightly larger files
- **Co-location**: deploy the backup tool in the same region and availability zone as the Kafka cluster and storage backend
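To see why segment size matters, a rough count of storage write operations per backup run (the numbers below are illustrative, not measured):

```python
def write_ops(raw_bytes, compression_ratio, segment_max_bytes):
    # Each flushed segment is roughly one PUT to object storage.
    compressed = raw_bytes / compression_ratio
    return -(-compressed // segment_max_bytes)  # ceiling division

raw = 100 * 1024**3  # 100 GiB of raw data per run, 4:1 compression assumed
for seg_mib in (64, 256, 1024):
    ops = write_ops(raw, 4, seg_mib * 1024**2)
    print(f"{seg_mib:>5} MiB segments -> {int(ops):>4} PUTs")
```

Fewer, larger PUTs amortise per-request latency and cost, at the price of more memory buffered per partition.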
What impact does backup have on the Kafka cluster?
Minimal. OSO Kafka Backup operates as a standard Kafka consumer. It does not require any broker restarts, plugins, or configuration changes. The impact is equivalent to adding another consumer to the cluster. For latency-sensitive workloads, consider configuring a dedicated consumer group and using rack-aware replica fetching.
How do I benchmark performance?
The kafka-backup-demos repository includes a benchmark suite that generates synthetic workloads and measures backup/restore throughput under various configurations. Use it to establish baselines for your environment before deploying to production.
Security & Compliance
Is backup data encrypted?
Server-side encryption: All major cloud storage providers offer server-side encryption (SSE-S3, SSE-KMS, Azure Storage Service Encryption, GCS default encryption). Enable this on your storage bucket for encryption at rest.
Client-side encryption (Enterprise): The Enterprise edition supports client-side AES-256 encryption, where data is encrypted before it leaves the backup tool. This ensures data is encrypted in transit to storage and at rest, regardless of the storage provider's encryption settings.
How do I configure TLS or mTLS?
Set the security protocol and certificate paths in your configuration:
```yaml
kafka:
  bootstrap_servers: "kafka:9093"
  security_protocol: "SSL"  # or "SASL_SSL" for SASL + TLS
  ssl_ca_location: "/certs/ca.pem"
  ssl_certificate_location: "/certs/client.pem"
  ssl_key_location: "/certs/client-key.pem"
```
For mTLS, provide both the client certificate and key. The CA certificate is used to verify the broker's identity.
What SASL authentication mechanisms are supported?
OSO Kafka Backup supports the following SASL mechanisms:
- `PLAIN` -- username and password (use with TLS)
- `SCRAM-SHA-256` -- Salted Challenge Response Authentication
- `SCRAM-SHA-512` -- Salted Challenge Response Authentication (stronger hash)
```yaml
kafka:
  security_protocol: "SASL_SSL"
  sasl_mechanism: "SCRAM-SHA-512"
  sasl_username: "backup-user"
  sasl_password: "${KAFKA_SASL_PASSWORD}"
```
How does OSO Kafka Backup support GDPR compliance?
The Enterprise edition provides GDPR compliance tools:
- Data masking: Redact or mask personally identifiable information (PII) during backup
- Right to be forgotten: Delete specific records from backups by key
- Field-level redaction: Selectively redact fields within messages while preserving the rest of the record
These features enable compliance with data protection regulations without sacrificing backup completeness.
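Field-level redaction of the kind described can be pictured as a transform applied to each record before it is written to storage. The sketch below is a hand-rolled illustration for JSON values, not the Enterprise feature's actual API:

```python
import json

def redact_fields(record_value, fields, mask="***REDACTED***"):
    """Replace the named top-level fields in a JSON record value,
    keeping the rest of the record intact."""
    doc = json.loads(record_value)
    for field in fields:
        if field in doc:
            doc[field] = mask
    return json.dumps(doc)

original = '{"order_id": 42, "email": "alice@example.com", "total": 99.5}'
print(redact_fields(original, ["email"]))
# {"order_id": 42, "email": "***REDACTED***", "total": 99.5}
```

The key property is that non-PII fields (here `order_id` and `total`) survive unchanged, so the backup remains useful for replay and analytics.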
How do I restrict who can perform restore operations?
Multiple layers of access control are available:
- **Enterprise RBAC**: define roles (`backup-operator`, `restore-operator`, `admin`) with fine-grained permissions
- **IAM policies**: restrict access to storage buckets using AWS IAM, Azure RBAC, or GCP IAM
- **Kubernetes RBAC**: limit which service accounts can create `KafkaRestore` custom resources
Kubernetes & Deployment
How do I deploy on Kubernetes?
Install the Kafka Backup Operator via Helm:
```bash
helm repo add oso https://charts.oso.sh
helm repo update
helm install kafka-backup-operator oso/kafka-backup-operator \
  --namespace kafka-backup \
  --create-namespace
```
Then create backup and restore resources using CRDs:
```yaml
apiVersion: kafkabackup.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: production-backup
spec:
  configRef:
    name: backup-config
```
What Kubernetes versions are supported?
OSO Kafka Backup Operator requires Kubernetes 1.24 or later. It is tested against the latest three minor versions of Kubernetes.
Can I use ArgoCD or Flux for GitOps deployments?
Yes. Store your `KafkaBackup`, `KafkaRestore`, and `KafkaBackupSchedule` CRD manifests in a Git repository. Configure an ArgoCD `Application` or Flux `Kustomization` pointing to the manifests directory:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kafka-backup
spec:
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    path: kafka-backup/
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: kafka-backup
```
Can I deploy outside of Kubernetes?
Yes. OSO Kafka Backup ships as a standalone static binary that runs on bare metal, virtual machines, and Docker containers. No Kubernetes or container orchestration is required:
```bash
# Download the binary and make it executable
curl -LO https://github.com/osodevops/kafka-backup/releases/latest/download/kafka-backup-linux-amd64
chmod +x kafka-backup-linux-amd64

# Run directly
./kafka-backup-linux-amd64 backup --config /etc/kafka-backup/config.yaml
```
How do I monitor OSO Kafka Backup in Kubernetes?
The operator exposes Prometheus metrics on port 8080. Create a `ServiceMonitor` to scrape them:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-backup-metrics
spec:
  selector:
    matchLabels:
      app: kafka-backup
  endpoints:
    - port: metrics
      interval: 15s
```
Pair this with the provided Grafana dashboards from the `kafka-backup-demos` repository for comprehensive visibility.
Enterprise
What features are included in the Enterprise edition?
The Enterprise edition extends the OSS version with:
- AES-256 client-side encryption for backup data
- Role-based access control (RBAC) for backup and restore operations
- Audit logging for all operations with tamper-proof log storage
- GDPR compliance tools including data masking, right to be forgotten, and field-level redaction
- Schema Registry backup and restore for Avro, Protobuf, and JSON Schema
- Priority support with defined SLAs
How do I get an Enterprise licence?
Contact the OSO sales team at oso.sh to discuss your requirements and obtain a licence key.
Is there a trial available?
Yes. A 30-day evaluation licence is available that provides full access to all Enterprise features. Contact the sales team to request a trial.
What support is included with Enterprise?
Enterprise support includes:
- Critical issues (P1): 24/7 response with a 1-hour initial response time
- Standard issues (P2-P4): Business hours support with response times based on severity
- Dedicated Slack channel for direct communication with the engineering team
- Quarterly architecture reviews to ensure your deployment follows best practices
Troubleshooting
How do I enable debug logging?
Use the `-v` flag for debug-level logging or `-vv` for trace-level:
```bash
# Debug logging
kafka-backup -v backup --config /path/to/config.yaml

# Trace logging (very verbose)
kafka-backup -vv backup --config /path/to/config.yaml
```
Alternatively, set the `RUST_LOG` environment variable:
```bash
RUST_LOG=debug kafka-backup backup --config /path/to/config.yaml
```
My backup is running slowly. How do I diagnose this?
Check the following, in order:
1. **Network latency**: measure latency between the backup tool and both the Kafka cluster and storage backend
2. **Storage write latency**: monitor the `kafka_backup_storage_write_duration_seconds` Prometheus metric
3. **Compression overhead**: try a faster compression level or algorithm (e.g., lz4 instead of zstd)
4. **Resource utilisation**: check CPU and memory usage on the host running the backup
5. **Consumer lag**: monitor `kafka_backup_consumer_lag` to see if the tool is keeping up with producers
I am getting a connection error. What should I check?
Verify the following:
- **Bootstrap servers**: ensure the `bootstrap_servers` address is correct and resolvable
- **TLS certificates**: verify certificates are valid, not expired, and the CA chain is complete
- **Network connectivity**: confirm the backup tool can reach the Kafka brokers on the configured port (e.g., `telnet kafka-broker 9093`)
- **Firewall rules**: check that security groups, NACLs, or firewall rules allow traffic on the Kafka port
- **Kafka ACLs**: ensure the backup user has `READ` and `DESCRIBE` permissions on the target topics and consumer group
My restore is failing. How do I troubleshoot?
Follow these steps:
1. **Validate the backup first**: run `kafka-backup validate --deep --config /path/to/config.yaml`
2. **Check target connectivity**: verify the restore tool can reach the target Kafka cluster
3. **Verify IAM/storage permissions**: ensure the restore process has read access to the backup storage location
4. **Check disk space**: ensure sufficient local disk space for temporary decompression buffers
5. **Review error logs**: enable debug logging (`-v`) and check for specific error messages
How do I report a bug or get community support?
- Bug reports: Open an issue on GitHub at github.com/osodevops/kafka-backup/issues
- Community support: Start a discussion at GitHub Discussions
- Enterprise support: Use your dedicated Slack channel or contact the support team directly