
Backup Validation & Compliance Evidence

Automatically validate that your Kafka backups can be restored correctly and generate cryptographically signed evidence reports for auditors.

Overview

The validation suite runs checks against a restored Kafka cluster and compares the results against the original backup manifest. It produces:

  • JSON evidence reports — machine-readable, deterministic, suitable for automation
  • PDF evidence reports — auditor-ready, branded, suitable for direct submission
  • Detached signatures — ECDSA-P256-SHA256 cryptographic proof of report integrity
  • Compliance mappings — automatic mapping to SOX ITGC, CMMC RE.3.139, and GDPR Article 32

Prerequisites

  • OSO Kafka Backup installed (v0.11.0+)
  • An existing backup in object storage or filesystem
  • A Kafka cluster with the backup data restored (the "target" cluster)
  • Optional: OpenSSL for generating signing keys
info

The validation tool does not perform the restore itself. Run kafka-backup restore first, then validate the result. This separation ensures the validation is an independent check.

Step 1: Create a Validation Config

Create validation.yaml:

validation.yaml
# Backup to validate against
backup_id: "production-daily-001"

# Where the backup is stored
storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: production/daily

# The restored Kafka cluster to validate
target:
  bootstrap_servers:
    - restored-kafka:9092

# Which checks to run
checks:
  message_count:
    enabled: true
    mode: exact              # exact | sample
  offset_range:
    enabled: true
  consumer_group_offsets:
    enabled: false           # Enable if consumer groups were restored

# Evidence report settings
evidence:
  formats:
    - json
    - pdf
  storage:
    prefix: "evidence-reports/"
    retention_days: 2555     # ~7 years (SOX requirement)

Step 2: Run Validation

$ kafka-backup validation run --config validation.yaml

You'll see output like:

=== Validation Results ===
Overall: PASSED
Checks: 2/2 passed, 0 failed, 0 skipped
Duration: 18ms

[PASSED] MessageCountCheck — 3 topics; 1000 messages expected, 1000 restored; 0 discrepancies
[PASSED] OffsetRangeCheck — 9 partitions checked; 9 passed; 0 issues

JSON evidence uploaded: evidence-reports/validation-9275b4aa/2026/04/validation-9275b4aa.json
PDF evidence uploaded: evidence-reports/validation-9275b4aa/2026/04/validation-9275b4aa.pdf

The command exits with code 0 on success, code 1 if any check fails.
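For automation, the exit code is the contract. A minimal Python wrapper, sketched under that contract (the CLI invocation mirrors the command above; the failure handling is a placeholder for your own pipeline logic):

```python
import subprocess

def validation_passed(returncode: int) -> bool:
    # The CLI exits 0 when all checks pass and 1 when any check fails.
    return returncode == 0

def run_validation(config: str = "validation.yaml") -> bool:
    """Invoke the validation suite and report pass/fail to the caller."""
    result = subprocess.run(
        ["kafka-backup", "validation", "run", "--config", config]
    )
    return validation_passed(result.returncode)
```

In CI or cron, a `False` return from `run_validation` can fail the stage directly.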

Ad-hoc Auditor-Triggered Runs

When an auditor requests a specific point-in-time validation:

$ kafka-backup validation run \
    --config validation.yaml \
    --pitr 1711929600000 \
    --triggered-by "KPMG Q1 2026 audit"

The --triggered-by string is recorded in the evidence report, providing a clear chain of custody.

Step 3: Review the Evidence Report

The JSON evidence report contains:

  • Backup metadata — ID, source cluster, topics, partitions, record counts
  • Validation results — per-check pass/fail with machine-readable data
  • Integrity information — SHA-256 checksums, signature algorithm
  • Compliance mappings — which checks satisfy which regulatory controls

# List available evidence reports
$ kafka-backup validation evidence-list --path s3://my-kafka-backups

# Download a specific report
$ kafka-backup validation evidence-get \
    --path s3://my-kafka-backups \
    --report-id validation-9275b4aa \
    --format json \
    --output evidence-report.json
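Once downloaded, the report can be checked programmatically. The field names below are illustrative assumptions, not the documented schema; inspect a real report for the exact keys:

```python
import json

# Illustrative report snippet; field names are assumptions for the sketch.
report = json.loads("""
{
  "report_id": "validation-9275b4aa",
  "overall_result": "PASSED",
  "checks": [
    {"name": "MessageCountCheck", "result": "PASSED"},
    {"name": "OffsetRangeCheck", "result": "PASSED"}
  ]
}
""")

# Collect any non-passing checks for alerting or dashboards.
failed = [c["name"] for c in report["checks"] if c["result"] != "PASSED"]
print(f"{report['report_id']}: {'OK' if not failed else failed}")
```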

Step 4: Generate a PDF Report

Include pdf in the formats list:

evidence:
  formats:
    - json
    - pdf

The PDF contains:

  • Page 1 — Cover page with overall result (PASSED/FAILED), report ID, timestamp
  • Page 2 — Validation check results table
  • Page 3 — Integrity details and compliance framework mappings (SOX, CMMC, GDPR)

Step 5: Sign the Evidence Report

See the Evidence Signing Guide for detailed key management instructions.

Quick setup:

# Generate an ECDSA-P256 key pair
$ openssl ecparam -genkey -name prime256v1 -noout | \
    openssl pkcs8 -topk8 -nocrypt -out signing-key.pem
$ openssl ec -in signing-key.pem -pubout -out signing-key-pub.pem

Add to your config:

validation.yaml
evidence:
  signing:
    enabled: true
    private_key_path: "/etc/kafka-backup/signing-key.pem"

The signed report produces a .sig file alongside the JSON and PDF.

Step 6: Verify the Signature

$ kafka-backup validation evidence-verify \
    --report evidence-report.json \
    --signature evidence-report.sig \
    --public-key signing-key-pub.pem

Report ID: validation-9275b4aa-2aeb-4910-a3a6-9e4aa1dc016a
Algorithm: ECDSA-P256-SHA256
Report SHA-256: 2482bbdfa113146e39a4884767002554...
SHA-256 checksum: VALID
ECDSA signature: VALID

Evidence report integrity: VERIFIED
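The checksum half of this verification is easy to reproduce independently with nothing but Python's standard library (a sketch; the signature half needs an ECDSA-capable tool):

```python
import hashlib

def report_sha256(path: str) -> str:
    """Compute the hex SHA-256 digest of an evidence report file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large PDF reports don't load into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the result against the Report SHA-256 line above. For the signature itself, if the .sig file is a DER-encoded ECDSA signature (an assumption about the encoding, not documented behavior), `openssl dgst -sha256 -verify signing-key-pub.pem -signature evidence-report.sig evidence-report.json` is one independent way to check it.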

Step 7: Set Up Notifications

Get alerted when validation passes or fails:

validation.yaml
notifications:
  slack:
    webhook_url: "https://hooks.slack.com/services/T00/B00/xxxxx"
  pagerduty:
    integration_key: "your-pagerduty-integration-key"
    severity: critical    # Triggers on failure only

Slack receives a Block Kit message with the result, check summary, and a link to the evidence report. PagerDuty receives an Events API v2 trigger on failure and auto-resolves on the next success.
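For reference, an Events API v2 trigger has roughly the shape built below. This is a sketch of the general PagerDuty event format, not the tool's exact payload; the integration key and summary text are placeholders:

```python
def pagerduty_trigger(integration_key: str, report_id: str) -> dict:
    """Build an Events API v2 trigger event for a failed validation."""
    return {
        "routing_key": integration_key,
        "event_action": "trigger",
        # Reusing a stable identifier as the dedup key is what lets a
        # later success send a "resolve" for the same incident.
        "dedup_key": report_id,
        "payload": {
            "summary": f"Kafka backup validation failed: {report_id}",
            "source": "kafka-backup-validation",
            "severity": "critical",
        },
    }
```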

Validation Checks

MessageCountCheck

Compares per-partition record counts between the backup manifest and the restored cluster. Fails if the number of partitions with mismatched counts exceeds fail_threshold.

checks:
  message_count:
    enabled: true
    mode: exact            # exact: all partitions | sample: random subset
    sample_percentage: 100
    topics: []             # Empty = all topics in the backup
    fail_threshold: 0      # 0 = fail on any discrepancy
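The comparison amounts to a per-partition diff against the manifest. A sketch of that logic (the dict-of-counts shape is illustrative, not the tool's internal representation):

```python
def count_discrepancies(expected: dict, restored: dict) -> list:
    """Both maps are (topic, partition) -> record count."""
    return [key for key, want in expected.items()
            if restored.get(key, 0) != want]

def message_count_passed(expected: dict, restored: dict,
                         fail_threshold: int = 0) -> bool:
    # With fail_threshold = 0, one mismatched partition fails the check.
    return len(count_discrepancies(expected, restored)) <= fail_threshold
```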

OffsetRangeCheck

Verifies that the high watermark and low watermark for each partition in the restored cluster match the backup manifest's segment offset ranges.

checks:
  offset_range:
    enabled: true
    verify_high_watermark: true
    verify_low_watermark: true
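The check can be sketched as below, assuming the manifest records each partition's first and last record offset and following Kafka's convention that the high watermark points one past the last record (an assumption about how the tool interprets the manifest):

```python
def offset_range_ok(first_offset: int, last_offset: int,
                    low_watermark: int, high_watermark: int,
                    verify_low: bool = True, verify_high: bool = True) -> bool:
    """Check restored watermarks against the manifest's offset range."""
    if verify_low and low_watermark != first_offset:
        return False
    # Kafka's high watermark is the offset after the last committed record.
    if verify_high and high_watermark != last_offset + 1:
        return False
    return True
```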

ConsumerGroupOffsetCheck

Verifies that consumer group offsets are present and valid in the restored cluster.

checks:
  consumer_group_offsets:
    enabled: true
    verify_all_groups: true    # false = only check groups listed below
    groups: []                 # Empty + verify_all_groups = all groups
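"Present and valid" can be read as: every committed offset falls inside its partition's restored offset range. A sketch under that assumption (the map shapes are illustrative):

```python
def invalid_group_offsets(committed: dict, watermarks: dict) -> list:
    """committed:  (group, topic, partition) -> committed offset
    watermarks: (topic, partition) -> (low, high)
    Returns the keys whose committed offset lies outside [low, high]."""
    bad = []
    for (group, topic, partition), offset in committed.items():
        low, high = watermarks[(topic, partition)]
        if not (low <= offset <= high):
            bad.append((group, topic, partition))
    return bad
```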

CustomWebhookCheck

Call your own validation endpoint. The tool POSTs a JSON payload with the backup ID and restored cluster details, and expects a pass/fail response.

checks:
  custom_webhooks:
    - name: application-health-check
      url: "https://internal.example.com/kafka-validation-hook"
      timeout_seconds: 120
      expected_status_code: 200
      fail_on_timeout: true
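On the receiving end, your endpoint gets the JSON POST and answers with a status code the tool compares against expected_status_code. A minimal handler sketch (the request and response field names here are assumptions, not a documented contract):

```python
import json

def handle_validation_hook(body: bytes) -> tuple:
    """Return (status_code, response_body) for an incoming validation POST."""
    request = json.loads(body)
    backup_id = request.get("backup_id", "")
    # Placeholder: run your own application-level checks against the
    # restored cluster here (e.g. replay a sample of orders end to end).
    healthy = bool(backup_id)
    status = 200 if healthy else 500
    return status, json.dumps({"result": "pass" if healthy else "fail"})
```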

Complete Configuration Example

validation-full.yaml
backup_id: "production-daily-001"

storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: production/daily

target:
  bootstrap_servers:
    - restored-kafka-0:9092
    - restored-kafka-1:9092
  security:
    security_protocol: SASL_SSL
    sasl_mechanism: SCRAM-SHA-512
    sasl_username: backup-validator
    sasl_password: "${KAFKA_PASSWORD}"

checks:
  message_count:
    enabled: true
    mode: exact
    fail_threshold: 0
  offset_range:
    enabled: true
  consumer_group_offsets:
    enabled: true
    verify_all_groups: true
  custom_webhooks:
    - name: order-service-check
      url: "https://internal.example.com/validation/orders"
      timeout_seconds: 120

evidence:
  formats: [json, pdf]
  signing:
    enabled: true
    private_key_path: "/etc/kafka-backup/signing-key.pem"
  storage:
    prefix: "evidence-reports/"
    retention_days: 2555

notifications:
  slack:
    webhook_url: "https://hooks.slack.com/services/T00/B00/xxxxx"
  pagerduty:
    integration_key: "abc123def456"
    severity: critical

triggered_by: "weekly-cron-job"

Next Steps