Frequently Asked Questions
Common questions about OSO Kafka Backup, organized by category.
General
What is OSO Kafka Backup?
OSO Kafka Backup is an open-source, high-performance backup and restore tool for Apache Kafka, written in Rust. It provides point-in-time recovery (PITR) for Kafka topics and consumer group offsets, supports multi-cloud storage backends (S3, Azure Blob Storage, GCS), and ships as a single static binary. The project is licensed under the MIT License.
How does OSO Kafka Backup differ from MirrorMaker 2?
MirrorMaker 2 is a replication tool designed to mirror data between live Kafka clusters in real time. OSO Kafka Backup is a backup and recovery tool designed to create durable, versioned copies of your Kafka data in external object storage. Key differences:
| Capability | MirrorMaker 2 | OSO Kafka Backup |
|---|---|---|
| Primary purpose | Cross-cluster replication | Backup and restore |
| Storage target | Another Kafka cluster | Object storage (S3, GCS, Azure Blob) |
| Point-in-time recovery | No | Yes |
| Offset recovery | Limited | Full consumer group offset restore |
| Independent of Kafka | No (requires target cluster) | Yes (stores to object storage) |
Use MirrorMaker 2 for active-active or active-passive cluster topologies. Use OSO Kafka Backup for disaster recovery, compliance archival, and point-in-time restore scenarios.
Why should I use OSO Kafka Backup instead of running a mirrored standby cluster?
Running a standby cluster with MirrorMaker 2 or Confluent Replicator means paying for two full Kafka clusters around the clock, plus the operational burden of maintaining the replication pipeline. Mirror setups are fragile — they can break during Kafka version upgrades, require ongoing certificate and configuration management, and add significant infrastructure cost.
More critically, mirroring propagates deletes. If data is accidentally deleted or a bad producer pushes destructive events at 9:00 AM, that deletion is faithfully replicated to your standby cluster. You have no way to go back.
OSO Kafka Backup stores immutable, incremental snapshots in object storage (S3, GCS, Azure Blob). This gives you:
- Point-in-time recovery — restore to any previous backup, not just "current state"
- Delete protection — deletions are never propagated to your backups
- Dramatic cost reduction — object storage is 10–100× cheaper than running a second Kafka cluster
- Simpler operations — no MirrorMaker to configure, monitor, or fix after upgrades
If you need active-active replication for low-latency failover, mirroring is the right tool. If your goal is disaster recovery with the ability to restore within hours, backup to object storage is simpler, safer, and cheaper.
How does OSO Kafka Backup compare to Confluent Replicator?
Unlike Confluent Replicator, OSO Kafka Backup:
- Stores backups in external object storage rather than requiring a destination Kafka cluster
- Supports point-in-time recovery (PITR) to restore data to any arbitrary timestamp
- Recovers consumer group offsets so applications resume from the correct position after restore
- Ships as a single binary with no dependencies on the Confluent Platform or Connect framework
- Is open source under the MIT License, with no per-broker licensing costs
Is OSO Kafka Backup production-ready?
Yes. OSO Kafka Backup is built in Rust for memory safety and high performance. In production environments it achieves throughput exceeding 100 MB/s and operates with less than 500 MB of memory. It includes built-in checkpointing for crash resilience, Prometheus metrics for observability, and has been validated across enterprise workloads.
What Kafka versions are supported?
OSO Kafka Backup supports any Kafka cluster that implements the Kafka protocol version 0.10 or later. This includes clusters running in both ZooKeeper mode and KRaft mode. The tool uses the standard Kafka consumer and producer APIs, so it is compatible with all Kafka distributions that adhere to the protocol.
What managed Kafka services are supported?
OSO Kafka Backup works with all major managed Kafka services, including:
- Amazon MSK (both provisioned and serverless)
- Confluent Cloud
- Aiven for Apache Kafka
- Redpanda (Kafka API-compatible)
- Azure Event Hubs for Kafka (Kafka protocol endpoint)
Any service that exposes a standard Kafka protocol endpoint is supported.
OSO Kafka Backup also works with Kafka-compatible platforms that implement the Kafka wire protocol, including:
- AutoMQ (cloud-native Kafka with tiered storage)
- WarpStream (Kafka-compatible, zero-disk architecture)
If the platform speaks the Kafka protocol, OSO Kafka Backup can back it up.
Can I start with open source and upgrade to Enterprise later?
Yes. The open source and Enterprise editions use the same backup storage format. You can start with the OSS edition, build your backup infrastructure, and upgrade to Enterprise at any time without migrating or re-creating existing backups.
Typical triggers for upgrading to Enterprise:
- You adopt a Schema Registry and need schema backup and restore
- You need data masking or GDPR compliance tools (field-level redaction, right to be forgotten)
- You require RBAC to control who can perform backup and restore operations
- You want priority support with SLAs and a dedicated Slack channel
The Enterprise licence is applied as a configuration change — no re-deployment or data migration is required.
What is the difference between the OSS and Enterprise editions?
OSS edition includes:
- Full backup and restore functionality
- Point-in-time recovery (PITR)
- Compression (zstd, gzip, snappy, lz4)
- Prometheus metrics and monitoring
- Consumer group offset backup and restore
Enterprise edition adds:
- Client-side AES-256 encryption
- Role-based access control (RBAC)
- Audit logging
- GDPR compliance tools (data masking, right to be forgotten, field-level redaction)
- Schema Registry backup and restore
- Priority support with SLAs
Backup Operations
How do I schedule automated backups?
There are two primary approaches:
Kubernetes CronJob -- Run kafka-backup backup with stop_at_current_offsets: true on a schedule:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-backup-scheduled
spec:
  schedule: "0 */6 * * *" # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: kafka-backup
              image: ghcr.io/osodevops/kafka-backup:latest
              args: ["backup", "--config", "/etc/kafka-backup/config.yaml"]
          restartPolicy: OnFailure
Kafka Backup Operator -- Use the KafkaBackupSchedule CRD to define schedules declaratively:
apiVersion: kafkabackup.oso.sh/v1alpha1
kind: KafkaBackupSchedule
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  backupSpec:
    configRef:
      name: backup-config
Do I need continuous backup, or are scheduled (cron) backups sufficient?
It depends on your Recovery Point Objective (RPO) — how much data you can afford to lose in a disaster:
| Approach | RPO | Best for |
|---|---|---|
| Scheduled (cron) — e.g., hourly | Up to 1 hour of data loss | Most disaster recovery scenarios |
| Continuous | Near-zero data loss | Mission-critical streams where every message matters |
For most organisations, hourly scheduled backups provide a practical balance between protection and simplicity. If your Kafka cluster fails, you lose at most one hour of data and can restore the rest from the last backup.
Use continuous mode when your data stream is the source of truth (e.g., event-sourced architectures) and any data loss is unacceptable.
Both modes use the same incremental mechanism — only new messages since the last checkpoint are captured — so the operational cost difference is minimal.
How does OSO Kafka Backup protect against accidental deletions?
Unlike cluster mirroring, OSO Kafka Backup does not propagate deletes to your backups. Backups are immutable snapshots stored in object storage. If a bad producer pushes destructive events, a topic is accidentally deleted, or a compaction policy removes data unexpectedly, your backups remain intact.
To recover from an accidental deletion:
- Identify the timestamp just before the deletion occurred
- Configure a point-in-time restore with time_window_end set to that timestamp
- Restore to the original or a new cluster
This is a fundamental advantage over mirrored standby clusters, where deletes are faithfully replicated and there is no way to roll back.
How long are backup snapshots retained?
All incremental backup snapshots are retained in your object storage bucket indefinitely by default. You can restore to any previous backup point, not just the latest.
Retention is governed by your storage provider's lifecycle policies. For example, you can configure S3 Lifecycle Rules to:
- Transition older backups to cheaper storage tiers (e.g., S3 Glacier after 90 days)
- Automatically delete backups older than a defined retention period (e.g., 1 year)
OSO Kafka Backup does not delete previous snapshots when creating new ones. Your backup history grows incrementally over time, and you control how long it is kept.
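As an illustration of the lifecycle rules described above, the following Python sketch builds an S3 lifecycle configuration document and prints it as JSON. The rule ID, the empty prefix (meaning the whole bucket), and the helper name build_lifecycle_rules are assumptions made for this example; adjust them to your bucket layout.

```python
import json


def build_lifecycle_rules(prefix="", glacier_after_days=90, expire_after_days=365):
    """Build an S3 lifecycle configuration document for backup objects."""
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiry must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "kafka-backup-retention",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # "" applies to every object
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


print(json.dumps(build_lifecycle_rules(), indent=2))
```

The printed JSON can be saved to a file and applied with aws s3api put-bucket-lifecycle-configuration. Keep in mind that expiring old objects also removes the ability to restore to the points in time they cover.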
Can I back up specific topics?
Yes. Use topics.include and topics.exclude with wildcard patterns:
topics:
  include:
    - "orders.*"
    - "payments.*"
    - "inventory.updates"
  exclude:
    - "*.test"
    - "*.staging"
Patterns use glob-style matching. If include is not specified, all topics are backed up. The exclude list takes precedence over include.
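To see how these rules combine, here is a minimal Python sketch of the selection logic. It assumes shell-style globbing as implemented by the standard library's fnmatch module; the tool's own matcher may differ in edge cases, and select_topics is a hypothetical helper rather than part of kafka-backup.

```python
from fnmatch import fnmatchcase


def select_topics(topics, include=None, exclude=None):
    """Return the topics that would be backed up under the stated rules.

    Assumption: patterns behave like shell globs. A missing or empty
    include list selects every topic, and exclude always wins.
    """
    selected = []
    for topic in topics:
        included = not include or any(fnmatchcase(topic, p) for p in include)
        excluded = any(fnmatchcase(topic, p) for p in (exclude or []))
        if included and not excluded:
            selected.append(topic)
    return selected


cluster_topics = ["orders.eu", "orders.test", "payments.staging",
                  "inventory.updates", "audit.log"]
print(select_topics(
    cluster_topics,
    include=["orders.*", "payments.*", "inventory.updates"],
    exclude=["*.test", "*.staging"],
))
```

With the configuration shown above, only orders.eu and inventory.updates are selected: orders.test and payments.staging match an include pattern but are removed by exclude, and audit.log matches no include pattern.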
How does incremental backup work?
OSO Kafka Backup uses checkpoint-based incremental backups. A local SQLite database tracks the last committed offset for each topic-partition. On each backup run, the tool resumes consuming from the last checkpointed offset, so only new messages are read and stored. This makes subsequent backup runs significantly faster and reduces storage costs.
How to enable it:
- Continuous mode (continuous: true): Incremental tracking is automatic; the offset store is created and managed for you.
- One-shot or snapshot mode (v0.13.5+): Add the offset_storage section to your config to enable incremental behavior:
offset_storage:
  db_path: /data/offsets.db
  sync_interval_secs: 30
Without offset_storage, one-shot and snapshot backups start from start_offset (default: earliest) on every run, producing a full backup each time. With it, each run resumes from the last checkpoint.
See the Incremental Backups Guide for a step-by-step walkthrough.
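The checkpoint idea can be sketched in a few lines of Python with the standard sqlite3 module. This is a conceptual illustration only: the table schema and the function names are assumptions for the example and do not describe the tool's actual database layout.

```python
import sqlite3


def open_checkpoints(db_path):
    """Open (or create) a checkpoint store. The schema is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "topic TEXT, part INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    return conn


def resume_offset(conn, topic, part, start_offset=0):
    """Return the offset to resume from, or start_offset if never backed up."""
    row = conn.execute(
        "SELECT next_offset FROM checkpoints WHERE topic = ? AND part = ?",
        (topic, part),
    ).fetchone()
    return row[0] if row else start_offset


def commit_checkpoint(conn, topic, part, next_offset):
    """Record progress; call this only after the data is safely stored."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
        (topic, part, next_offset),
    )
    conn.commit()


conn = open_checkpoints(":memory:")
log = list(range(10))                      # stand-in for offsets 0..9

first = resume_offset(conn, "orders", 0)   # nothing checkpointed yet
commit_checkpoint(conn, "orders", 0, first + len(log[first:]))

log.extend(range(10, 15))                  # five new messages arrive
second = resume_offset(conn, "orders", 0)  # resumes after the first run
print(first, second, log[second:])
```

The first run starts at offset 0 and checkpoints offset 10; the second run resumes at 10 and reads only the five new messages. Committing the checkpoint after the data is stored is what lets an interrupted run pick up from the last completed point.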
What happens if a backup fails mid-run?
The checkpoint mechanism ensures crash resilience. If a backup run fails or is interrupted, the checkpoint database retains the last successfully committed offset for each partition. The next backup run automatically resumes from that point. No data is lost and no duplicate data is written to storage.
How are consumer group offsets backed up?
Consumer group offsets are stored as x-original-offset headers within the backed-up messages. During restore, a three-phase process recovers them:
- Restore messages to the target cluster
- Plan offset reset using kafka-backup offset-reset plan to compute the mapping between original and new offsets
- Execute offset reset using kafka-backup offset-reset execute to commit the mapped offsets to the target cluster's consumer groups
Can I run multiple backup instances simultaneously?
Yes. You can run multiple instances of OSO Kafka Backup concurrently, provided each instance is configured to back up a different set of topics. Use non-overlapping topics.include patterns to partition the workload. Do not configure multiple instances to back up the same topic-partition, as this will result in duplicate data in storage.
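Before splitting the workload, it can help to check that no topic is claimed by more than one instance. The sketch below assumes shell-style glob matching (Python's fnmatch) and a hypothetical mapping of instance names to their include patterns; it is not a feature of the tool itself.

```python
from fnmatch import fnmatchcase


def find_overlaps(topics, instance_patterns):
    """Map each topic claimed by more than one instance to those instances.

    instance_patterns: {instance_name: [include patterns]} (hypothetical shape).
    """
    overlaps = {}
    for topic in topics:
        owners = [name for name, patterns in instance_patterns.items()
                  if any(fnmatchcase(topic, p) for p in patterns)]
        if len(owners) > 1:
            overlaps[topic] = owners
    return overlaps


instances = {
    "backup-a": ["orders.*"],
    "backup-b": ["payments.*", "orders.eu"],
}
print(find_overlaps(["orders.eu", "orders.us", "payments.eu"], instances))
```

Here orders.eu is reported because both instances would back it up; an empty result means the patterns partition the listed topics cleanly.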
How do I verify a backup?
Use the built-in validation command:
kafka-backup validate --deep --config /path/to/config.yaml
The --deep flag performs a full integrity check, verifying that all segments are present, checksums are valid, and the manifest is consistent with the stored data.
What is the maximum supported message size?
The maximum message size is governed by the broker-level message.max.bytes setting (or the topic-level max.message.bytes override), which defaults to approximately 1 MB. OSO Kafka Backup has been tested with messages up to 10 MB. If your cluster uses a non-default maximum, ensure the backup tool's consumer configuration matches (via message.max.bytes in the consumer properties).
Restore & Recovery
How do I restore to a specific point in time?
Use the time_window_start and time_window_end parameters in your restore configuration, specified in epoch milliseconds:
restore:
  time_window_start: 1742817600000 # 2025-03-24 12:00:00 UTC
  time_window_end: 1742846400000 # 2025-03-24 20:00:00 UTC
source:
  storage:
    type: s3
    bucket: my-kafka-backups
target:
  bootstrap_servers: "target-kafka:9092"
Only messages with timestamps within the specified window will be restored.
How do I convert a date to epoch milliseconds?
Bash:
date -d "2025-03-24 12:00:00 UTC" +%s%3N
# Output: 1742817600000
Python:
from datetime import datetime, timezone
int(datetime(2025, 3, 24, 12, tzinfo=timezone.utc).timestamp() * 1000)
# Output: 1742817600000
macOS (BSD date):
date -j -u -f "%Y-%m-%d %H:%M:%S" "2025-03-24 12:00:00" +%s000
# Output: 1742817600000
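Because a wrong year or time zone is easy to miss in a 13-digit number, it is worth converting the value back before using it in a restore. A minimal Python sketch (epoch_ms_to_utc is a hypothetical helper name):

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(epoch_ms):
    """Render epoch milliseconds as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


print(epoch_ms_to_utc(1742817600000))  # 2025-03-24T12:00:00+00:00
print(epoch_ms_to_utc(1742846400000))  # 2025-03-24T20:00:00+00:00
```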
Can I restore production data into non-production environments?
Yes. You can restore any backup — or a subset of it — to a completely different cluster. This is useful for:
- Reproducing production bugs in a staging environment with real data
- Populating test environments with representative datasets
- Data analysis on a separate cluster without impacting production
Use topic_mapping to restore to different topic names and source_partitions to restore only specific partitions. You can also use time_window_start and time_window_end to restore a specific time slice of data rather than the full history.
restore:
  time_window_start: 1742817600000
  time_window_end: 1742846400000
  topic_mapping:
    "articles.production": "articles.staging"
source:
  storage:
    type: s3
    bucket: prod-kafka-backups
target:
  bootstrap_servers: "staging-kafka:9092"
How does OSO Kafka Backup handle large topics with unlimited retention?
OSO Kafka Backup handles large, unbounded topics efficiently thanks to its incremental backup mechanism. Only new messages since the last checkpoint are read and stored on each backup run, regardless of the total topic size.
Real-world characteristics:
- Tested with topics exceeding 800 GB and 20+ million messages
- Single-partition topics are fully supported, including those where message ordering is critical
- Restore time scales with data volume — under optimal conditions, an 800 GB topic restores in approximately 20 minutes, though actual performance depends on storage backend throughput, network bandwidth, and target cluster write capacity
For very large topics, consider:
- Using zstd compression to reduce storage footprint (5–7× compression for JSON data)
- Deploying the backup tool in the same region and availability zone as your Kafka cluster and storage backend
- Monitoring the kafka_backup_consumer_lag metric to ensure backups keep pace with producers
Can I restore to a different cluster?
Yes. Specify the target cluster's bootstrap_servers in your restore configuration. The source and target clusters are completely independent. This is a core use case for disaster recovery -- restoring data to a standby cluster in a different region or cloud provider.
Can I restore to a different topic name?
Yes. Use the topic_mapping configuration to remap topic names during restore:
restore:
  topic_mapping:
    "orders.production": "orders.restored"
    "payments.production": "payments.restored"
How do I recover consumer offsets after a restore?
Use the two-step offset reset workflow:
# Step 1: Generate the offset mapping plan
kafka-backup offset-reset plan \
--config /path/to/config.yaml \
--output offset-plan.json
# Step 2: Review and execute the plan
kafka-backup offset-reset execute \
--plan offset-plan.json \
--target-bootstrap-servers target-kafka:9092
The plan maps original offsets to the corresponding offsets in the restored topic, accounting for any gaps or reordering.
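To make the idea of an offset mapping concrete, here is a small Python sketch. It is not the tool's algorithm: it assumes each restored record's source offset is known (for example from the x-original-offset header) and shows one plausible way to translate a committed offset when the restored topic has gaps. The function name and data shape are hypothetical.

```python
from bisect import bisect_left


def map_committed_offset(restored, committed_original):
    """Translate a committed offset from the source cluster to the target.

    restored: list of (original_offset, new_offset) pairs sorted by
    original_offset. A committed offset is the next record the group will
    read, so we look for the first restored record at or after it.
    """
    originals = [orig for orig, _ in restored]
    idx = bisect_left(originals, committed_original)
    if idx == len(restored):
        # Everything restored was already consumed: resume at the log end.
        return restored[-1][1] + 1 if restored else 0
    return restored[idx][1]


# Original offsets 100..105 with 102 and 103 missing, written to a fresh
# topic where offsets start at 0.
restored = [(100, 0), (101, 1), (104, 2), (105, 3)]
print(map_committed_offset(restored, 101))  # 1
print(map_committed_offset(restored, 103))  # 2 (next surviving record is 104)
print(map_committed_offset(restored, 106))  # 4 (log end)
```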
How long does a restore take?
Restore duration depends on the data volume, storage backend read throughput, network bandwidth, and target cluster write capacity. Under optimal conditions, OSO Kafka Backup achieves approximately 100 MB/s restore throughput. For example, restoring 1 TB of data takes roughly 2.5 to 3 hours.
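The arithmetic behind that estimate is simple enough to adapt to your own numbers. This sketch assumes a constant sustained throughput and decimal units (1 MB = 1,000,000 bytes); real restores vary with the factors listed above.

```python
def estimate_restore_hours(data_bytes, throughput_mb_per_s=100.0):
    """Estimate restore time in hours at a sustained throughput (decimal MB)."""
    seconds = data_bytes / (throughput_mb_per_s * 1_000_000)
    return seconds / 3600


one_tb = 1_000_000_000_000
print(round(estimate_restore_hours(one_tb), 2))        # 2.78
print(round(estimate_restore_hours(one_tb, 50.0), 2))  # 5.56
```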
Can I restore a subset of partitions?
Yes. Use the source_partitions configuration to specify which partitions to restore:
restore:
  source_partitions: [0, 1, 2, 5]
Only the specified partitions will be restored from the backup.
What happens if the target topic already has data?
OSO Kafka Backup appends data to the target topic; it does not overwrite or truncate existing data. If you need a clean restore, create a new topic (or use topic_mapping to restore to a different topic name) to avoid mixing existing and restored data.
Storage
What storage backends are supported?
OSO Kafka Backup supports:
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage (GCS)
- S3-compatible storage (MinIO, Ceph, Wasabi, DigitalOcean Spaces)
- Local filesystem (for testing and development)
How much storage will my backups consume?
Estimate storage as:
storage_required = raw_data_size / compression_ratio
Compression ratios vary by data type:
| Data Type | Compression (zstd) | Example |
|---|---|---|
| JSON | 5:1 to 7:1 | 1 TB raw ≈ 140-200 GB compressed |
| Avro | 2:1 to 3:1 | 1 TB raw ≈ 330-500 GB compressed |
| Protobuf | 2:1 to 3:1 | 1 TB raw ≈ 330-500 GB compressed |
| Already compressed | ~1:1 | No significant reduction |
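The formula can be applied directly to the ratio ranges in the table. A minimal Python sketch, using 1 TB = 1000 GB and a hypothetical helper name:

```python
def estimate_storage_gb(raw_gb, compression_ratio):
    """Apply storage_required = raw_data_size / compression_ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_gb / compression_ratio


for label, ratios in {"JSON": (5, 7), "Avro": (2, 3)}.items():
    worst, best = (estimate_storage_gb(1000, r) for r in ratios)
    print(f"{label}: {best:.0f}-{worst:.0f} GB per 1 TB raw")
```

Treat the result as a planning figure; measure the ratio on a sample of your own data before sizing buckets or budgets.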
What is the backup storage format?
Backups are organized as follows:
backup-root/
├── manifest.json
├── state/
│   └── offsets.db
└── topics/
    └── {topic-name}/
        └── partition={id}/
            ├── segment-000000000000.zst
            ├── segment-000000001000.zst
            └── ...
- manifest.json -- Metadata about the backup (topics, partitions, offsets, timestamps)
- state/offsets.db -- SQLite checkpoint database tracking committed offsets
- topics/{topic}/partition={id}/segment-NNNN.zst -- Compressed data segments
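If you list the objects in a backup with your storage provider's tooling, the layout above makes it straightforward to group them by topic and partition. The Python sketch below parses keys of that shape. It assumes the .zst extension shown in the layout and makes no claim about what the number in the segment name means; parse_segment_key is a hypothetical helper.

```python
import re

SEGMENT_KEY = re.compile(
    r"topics/(?P<topic>[^/]+)/partition=(?P<partition>\d+)"
    r"/segment-(?P<number>\d+)\.zst$"
)


def parse_segment_key(key):
    """Split an object key into topic, partition, and segment number.

    Returns None for keys that are not data segments (manifest, state, ...).
    """
    match = SEGMENT_KEY.search(key)
    if match is None:
        return None
    return match["topic"], int(match["partition"]), int(match["number"])


keys = [
    "backup-root/manifest.json",
    "backup-root/topics/orders.production/partition=3/segment-000000001000.zst",
]
print([parse_segment_key(k) for k in keys])
```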
Can I access backup data without restoring?
Yes. Use the describe command to inspect backup metadata:
kafka-backup describe --config /path/to/config.yaml
You can also directly access objects in S3 (or other storage) using standard tools such as the AWS CLI, gsutil, or az storage blob. Segment files are compressed with the configured algorithm (e.g., zstd) and contain Kafka records in a binary format.
Can I migrate backups between storage backends?
Yes. Since backups are stored as standard objects, you can copy them between backends using tools like aws s3 sync, gsutil rsync, azcopy, or rclone. After copying, update your restore configuration to point to the new storage location.
Does OSO Kafka Backup work with S3-compatible storage?
Yes. Configure the endpoint URL to point to your S3-compatible service:
storage:
  type: s3
  bucket: my-backups
  region: us-east-1
  endpoint: "https://minio.internal:9000"
  force_path_style: true
This works with MinIO, Ceph Object Gateway, Wasabi, DigitalOcean Spaces, and other S3-compatible services.
Performance
What throughput can I expect?
Under optimal conditions, OSO Kafka Backup achieves 100+ MB/s for both backup and restore operations. Actual throughput depends on:
- Network bandwidth between Kafka, the backup tool, and storage
- Storage backend write/read latency
- Compression algorithm and level
- Message size (larger messages achieve higher throughput)
- Number of partitions being processed concurrently
How much memory does OSO Kafka Backup use?
Typical memory usage is under 500 MB when processing 4 partitions concurrently. Memory consumption scales with the number of concurrent partitions and the configured segment size. For high-concurrency workloads, monitor RSS via the process_resident_memory_bytes Prometheus metric and adjust segment_max_bytes or concurrency settings accordingly.
How do I tune for maximum throughput?
Refer to PE-01: Throughput Optimisation in the Performance Efficiency pillar. Key tuning parameters:
- Segment size: Increase segment_max_bytes to reduce the number of storage write operations
- Fetch size: Increase fetch.max.bytes and max.partition.fetch.bytes in the consumer config
- Compression level: Use a lower zstd compression level (e.g., 1-3) for faster compression at the cost of slightly larger files
- Co-location: Deploy the backup tool in the same region and availability zone as the Kafka cluster and storage backend
What impact does backup have on the Kafka cluster?
Minimal. OSO Kafka Backup operates as a standard Kafka consumer. It does not require any broker restarts, plugins, or configuration changes. The impact is equivalent to adding another consumer to the cluster. For latency-sensitive workloads, consider configuring a dedicated consumer group and using rack-aware replica fetching.
How do I benchmark performance?
The kafka-backup-demos repository includes a benchmark suite that generates synthetic workloads and measures backup/restore throughput under various configurations. Use it to establish baselines for your environment before deploying to production.
Security & Compliance
Is backup data encrypted?
Server-side encryption: All major cloud storage providers offer server-side encryption (SSE-S3, SSE-KMS, Azure Storage Service Encryption, GCS default encryption). Enable this on your storage bucket for encryption at rest.
Client-side encryption (Enterprise): The Enterprise edition supports client-side AES-256 encryption, where data is encrypted before it leaves the backup tool. This ensures data is encrypted in transit to storage and at rest, regardless of the storage provider's encryption settings.
How do I configure TLS or mTLS?
Set the security protocol and certificate paths in your configuration:
kafka:
  bootstrap_servers: "kafka:9093"
  security_protocol: "SSL" # or "SASL_SSL" for SASL + TLS
  ssl_ca_location: "/certs/ca.pem"
  ssl_certificate_location: "/certs/client.pem"
  ssl_key_location: "/certs/client-key.pem"
For mTLS, provide both the client certificate and key. The CA certificate is used to verify the broker's identity.
What SASL authentication mechanisms are supported?
OSO Kafka Backup supports the following SASL mechanisms:
- PLAIN -- Username and password (use with TLS)
- SCRAM-SHA-256 -- Salted Challenge Response Authentication
- SCRAM-SHA-512 -- Salted Challenge Response Authentication (stronger hash)
kafka:
  security_protocol: "SASL_SSL"
  sasl_mechanism: "SCRAM-SHA-512"
  sasl_username: "backup-user"
  sasl_password: "${KAFKA_SASL_PASSWORD}"
How does OSO Kafka Backup support GDPR compliance?
The Enterprise edition provides GDPR compliance tools:
- Data masking: Redact or mask personally identifiable information (PII) during backup
- Right to be forgotten: Delete specific records from backups by key
- Field-level redaction: Selectively redact fields within messages while preserving the rest of the record
These features enable compliance with data protection regulations without sacrificing backup completeness.
How do I restrict who can perform restore operations?
Multiple layers of access control are available:
- Enterprise RBAC: Define roles (backup-operator, restore-operator, admin) with fine-grained permissions
- IAM policies: Restrict access to storage buckets using AWS IAM, Azure RBAC, or GCP IAM
- Kubernetes RBAC: Limit which service accounts can create KafkaRestore custom resources
Kubernetes & Deployment
How do I deploy on Kubernetes?
Install the Kafka Backup Operator via Helm:
helm repo add oso https://charts.oso.sh
helm repo update
helm install kafka-backup-operator oso/kafka-backup-operator \
--namespace kafka-backup \
--create-namespace
Then create backup and restore resources using CRDs:
apiVersion: kafkabackup.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: production-backup
spec:
  configRef:
    name: backup-config
What Kubernetes versions are supported?
OSO Kafka Backup Operator requires Kubernetes 1.24 or later. It is tested against the latest three minor versions of Kubernetes.
Can I deploy separate backup operators per team or product line?
Yes. You can deploy the Kafka Backup Operator into multiple Kubernetes namespaces, with each instance managing backups for a specific team, product line, or environment. This provides:
- Blast-radius isolation — a misconfiguration in one namespace does not affect others
- Fine-grained IAM — bind each operator to a dedicated IAM role with access to only its S3 bucket or prefix
- Independent lifecycle management — each team can manage their own backup schedules and retention policies
# Team A: news platform backups
apiVersion: kafkabackup.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: news-backup
  namespace: team-news
spec:
  configRef:
    name: news-backup-config
---
# Team B: streaming platform backups
apiVersion: kafkabackup.oso.sh/v1alpha1
kind: KafkaBackup
metadata:
  name: streaming-backup
  namespace: team-streaming
spec:
  configRef:
    name: streaming-backup-config
Alternatively, a single operator instance can manage multiple backup configurations in one namespace if you prefer centralised management.
Can I use ArgoCD or Flux for GitOps deployments?
Yes. Store your KafkaBackup, KafkaRestore, and KafkaBackupSchedule CRD manifests in a Git repository. Configure an ArgoCD Application or Flux Kustomization pointing to the manifests directory:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kafka-backup
spec:
  project: default # Argo CD requires a project; "default" is assumed here
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    path: kafka-backup/
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: kafka-backup
Can I deploy outside of Kubernetes?
Yes. OSO Kafka Backup ships as a standalone static binary that runs on bare metal, virtual machines, and Docker containers. No Kubernetes or container orchestration is required:
# Download the binary
curl -LO https://github.com/osodevops/kafka-backup/releases/latest/download/kafka-backup-linux-amd64
# Make it executable
chmod +x kafka-backup-linux-amd64
# Run directly
./kafka-backup-linux-amd64 backup --config /etc/kafka-backup/config.yaml
How do I monitor OSO Kafka Backup in Kubernetes?
The operator exposes Prometheus metrics on port 8080. Create a ServiceMonitor to scrape them:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-backup-metrics
spec:
  selector:
    matchLabels:
      app: kafka-backup
  endpoints:
    - port: metrics
      interval: 15s
Pair this with the provided Grafana dashboards from the kafka-backup-demos repository for comprehensive visibility.
Enterprise
What features are included in the Enterprise edition?
The Enterprise edition extends the OSS version with:
- AES-256 client-side encryption for backup data
- Role-based access control (RBAC) for backup and restore operations
- Audit logging for all operations with tamper-proof log storage
- GDPR compliance tools including data masking, right to be forgotten, and field-level redaction
- Schema Registry backup and restore for Avro, Protobuf, and JSON Schema
- Priority support with defined SLAs
How do I get an Enterprise licence?
Contact the OSO sales team at oso.sh to discuss your requirements and obtain a licence key.
Is there a trial available?
Yes. A 30-day evaluation licence is available that provides full access to all Enterprise features. Contact the sales team to request a trial.
What support is included with Enterprise?
Enterprise support includes:
- Critical issues (P1): 24/7 response with a 1-hour initial response time
- Standard issues (P2-P4): Business hours support with response times based on severity
- Dedicated Slack channel for direct communication with the engineering team
- Quarterly architecture reviews to ensure your deployment follows best practices
Troubleshooting
How do I enable debug logging?
Use the -v flag for debug-level logging or -vv for trace-level:
# Debug logging
kafka-backup -v backup --config /path/to/config.yaml
# Trace logging (very verbose)
kafka-backup -vv backup --config /path/to/config.yaml
Alternatively, set the RUST_LOG environment variable:
RUST_LOG=debug kafka-backup backup --config /path/to/config.yaml
My backup is running slowly. How do I diagnose this?
Check the following, in order:
- Network latency: Measure latency between the backup tool and both the Kafka cluster and storage backend
- Storage write latency: Monitor the kafka_backup_storage_write_duration_seconds Prometheus metric
- Compression overhead: Try a faster compression level or algorithm (e.g., lz4 instead of zstd)
- Resource utilisation: Check CPU and memory usage on the host running the backup
- Consumer lag: Monitor kafka_backup_consumer_lag to see if the tool is keeping up with producers
I am getting a connection error. What should I check?
Verify the following:
- Bootstrap servers: Ensure the bootstrap_servers address is correct and resolvable
- TLS certificates: Verify certificates are valid, not expired, and the CA chain is complete
- Network connectivity: Confirm the backup tool can reach the Kafka brokers on the configured port (e.g., telnet kafka-broker 9093)
- Firewall rules: Check that security groups, NACLs, or firewall rules allow traffic on the Kafka port
- Kafka ACLs: Ensure the backup user has READ and DESCRIBE permissions on the target topics and consumer group
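Where telnet is not installed (common in minimal container images), the same reachability check can be done with a few lines of Python. This sketch only confirms that a TCP connection can be opened; it does not validate TLS, SASL, or the Kafka protocol itself. The helper name can_connect is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_connect("kafka-broker", 9093) mirrors the telnet check above. A False result points to DNS, routing, or firewall problems rather than to certificates or ACLs.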
My restore is failing. How do I troubleshoot?
Follow these steps:
- Validate the backup first: kafka-backup validate --deep --config /path/to/config.yaml
- Check target connectivity: Verify the restore tool can reach the target Kafka cluster
- Verify IAM/storage permissions: Ensure the restore process has read access to the backup storage location
- Check disk space: Ensure sufficient local disk space for temporary decompression buffers
- Review error logs: Enable debug logging (-v) and check for specific error messages
How do I report a bug or get community support?
- Bug reports: Open an issue on GitHub at github.com/osodevops/kafka-backup/issues
- Community support: Start a discussion at GitHub Discussions
- Enterprise support: Use your dedicated Slack channel or contact the support team directly