
# Validation Config Reference

Complete reference for the `validation.yaml` configuration file used with `kafka-backup validation run`.

## Configuration Structure

```yaml
# Required: Backup to validate against
backup_id: "my-backup-001"

# Required: Where the backup is stored
storage:
  backend: s3
  bucket: my-bucket

# Required: The restored Kafka cluster
target:
  bootstrap_servers: []

# Optional: Which checks to run
checks: {}

# Optional: Evidence report settings
evidence: {}

# Optional: Notification settings
notifications: {}

# Optional: PITR timestamp and trigger metadata
pitr_timestamp: null
triggered_by: null
```

## Top-Level Fields

### backup_id

Required. The backup ID to load the manifest from.

```yaml
backup_id: "production-daily-001"
```

### storage

Required. Storage backend configuration. Uses the same format as backup/restore configs.

```yaml
storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: production/daily
```

See the Storage Configuration section for all backend options (S3, Azure, GCS, Filesystem).

### target

Required. The Kafka cluster where the backup was restored (the cluster to validate).

```yaml
target:
  bootstrap_servers:
    - restored-kafka:9092
  security:
    security_protocol: SASL_SSL
    sasl_mechanism: SCRAM-SHA-512
    sasl_username: validator
    sasl_password: "${KAFKA_PASSWORD}"
```

Supports the same authentication options as the backup/restore source/target config. See Kafka Cluster Configuration.

### pitr_timestamp

Optional. PITR timestamp (epoch milliseconds) used during the restore. Included in the evidence report metadata.

```yaml
pitr_timestamp: 1711929600000
```

Can also be set via `--pitr` on the CLI.
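Since the field is plain epoch milliseconds, the value can be computed from any timezone-aware datetime. A minimal sketch in Python (the helper name and the example date are illustrative, not part of the tool):

```python
from datetime import datetime, timezone

def to_pitr_timestamp(dt: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds for pitr_timestamp."""
    return int(dt.timestamp() * 1000)

# 2024-04-01 00:00:00 UTC -> 1711929600000, the value shown in the example above
ts = to_pitr_timestamp(datetime(2024, 4, 1, tzinfo=timezone.utc))
```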

### triggered_by

Optional. Human-readable string recording who or what triggered this validation run. Appears in the evidence report for chain of custody.

```yaml
triggered_by: "weekly-cron-job"
triggered_by: "External auditor KPMG - Q1 2026 review"
```

Can also be set via `--triggered-by` on the CLI.


## Checks Configuration

### checks.message_count

Compares per-topic/partition record counts between the backup manifest and the restored cluster.

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable this check |
| `mode` | string | `exact` | `exact` (all partitions) or `sample` (random subset) |
| `sample_percentage` | int | `100` | Percentage of partitions to sample (1-100). Only used in `sample` mode |
| `topics` | list | `[]` | Topic filter. Empty = all topics in the backup |
| `fail_threshold` | int | `0` | Number of records difference allowed before failing. `0` = exact match |

```yaml
checks:
  message_count:
    enabled: true
    mode: exact
    topics:
      - orders
      - payments
    fail_threshold: 0
```
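As a sketch of how `fail_threshold` behaves, a partition passes when the absolute difference between the manifest count and the restored count is within the threshold. The function below is illustrative, not the tool's actual implementation:

```python
def partition_passes(manifest_count: int, restored_count: int,
                     fail_threshold: int = 0) -> bool:
    """A partition passes when the count difference is within fail_threshold."""
    return abs(manifest_count - restored_count) <= fail_threshold

# With the default fail_threshold of 0, only exact matches pass.
assert partition_passes(1_000, 1_000)                   # exact match
assert not partition_passes(1_000, 998)                 # off by 2, default threshold
assert partition_passes(1_000, 998, fail_threshold=5)   # within tolerance
```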

### checks.offset_range

Verifies that the high and low watermarks for each partition match the backup manifest.

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable this check |
| `verify_high_watermark` | bool | `true` | Verify the high watermark matches |
| `verify_low_watermark` | bool | `true` | Verify the low watermark matches |

```yaml
checks:
  offset_range:
    enabled: true
    verify_high_watermark: true
    verify_low_watermark: true
```

### checks.consumer_group_offsets

Verifies that consumer group offsets are present and valid in the restored cluster.

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable this check |
| `verify_all_groups` | bool | `true` | Verify all groups found on the cluster |
| `groups` | list | `[]` | Specific groups to check. Empty + `verify_all_groups: true` = all |

```yaml
checks:
  consumer_group_offsets:
    enabled: true
    verify_all_groups: false
    groups:
      - order-processor
      - payment-service
```
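A committed offset is "valid" when it falls inside the partition's offset range. A minimal sketch of that check (the function name and data shapes are illustrative, not the tool's internals):

```python
def check_group_offsets(committed: dict, watermarks: dict) -> list:
    """Return a list of problems for one consumer group.

    committed maps "topic-partition" -> committed offset;
    watermarks maps "topic-partition" -> (low_watermark, high_watermark).
    """
    problems = []
    for tp, offset in committed.items():
        if tp not in watermarks:
            problems.append(f"{tp}: partition missing from restored cluster")
            continue
        low, high = watermarks[tp]
        if not (low <= offset <= high):
            problems.append(f"{tp}: offset {offset} outside [{low}, {high}]")
    return problems
```

An empty list means the group's offsets are all present and within range.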

### checks.custom_webhooks

Calls external HTTP endpoints for custom validation logic. Each webhook appears as a separate check in the report.

| Option | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Display name for this check in the report |
| `url` | string | required | URL to POST the validation payload to |
| `timeout_seconds` | int | `120` | Request timeout in seconds |
| `expected_status_code` | int | `200` | Expected HTTP status code for success |
| `fail_on_timeout` | bool | `true` | Whether to treat a timeout as failure |

```yaml
checks:
  custom_webhooks:
    - name: application-health-check
      url: "https://internal.example.com/kafka-validation-hook"
      timeout_seconds: 120
      expected_status_code: 200
      fail_on_timeout: true
```

The webhook receives a JSON POST body:

```json
{
  "event": "kafka_backup_validation",
  "backup_id": "production-daily-001",
  "pitr_timestamp": null,
  "restored_cluster": {
    "bootstrap_servers": ["restored-kafka:9092"]
  }
}
```

Expected response:

```json
{
  "result": "passed",
  "detail": "All health checks passed",
  "data": {}
}
```

Valid `result` values: `passed`, `failed`, `warning`, `skipped`.
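On the receiving side, a webhook endpoint only has to parse that POST body and answer with one of the four `result` values. A minimal handler sketch using Python's standard library; the function name is illustrative and the actual application checks are a placeholder:

```python
import json

VALID_RESULTS = {"passed", "failed", "warning", "skipped"}

def handle_validation_webhook(body: bytes) -> dict:
    """Build the JSON response for a kafka_backup_validation POST body."""
    payload = json.loads(body)
    if payload.get("event") != "kafka_backup_validation":
        return {"result": "skipped", "detail": "unexpected event type", "data": {}}
    backup_id = payload["backup_id"]
    # Placeholder: run real application-level checks against the restored cluster here.
    return {"result": "passed",
            "detail": f"All health checks passed for {backup_id}",
            "data": {}}
```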


## Evidence Configuration

### evidence.formats

List of output formats to generate.

| Value | Description |
|---|---|
| `json` | Machine-readable JSON evidence report (canonical format when signing is enabled) |
| `pdf` | Auditor-ready branded PDF report |

```yaml
evidence:
  formats:
    - json
    - pdf
```

### evidence.signing

Cryptographic signing configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable ECDSA-P256-SHA256 signing |
| `private_key_path` | string | `null` | Path to PEM-encoded PKCS#8 private key |
| `public_key_path` | string | `null` | Path to PEM-encoded public key (optional, for reference) |

```yaml
evidence:
  signing:
    enabled: true
    private_key_path: "/etc/kafka-backup/signing-key.pem"
```
> **Warning:** The private key must be in PKCS#8 format (header: `-----BEGIN PRIVATE KEY-----`). See the Evidence Signing Guide for key generation instructions.

### evidence.storage

Controls where evidence reports are uploaded in object storage.

| Option | Type | Default | Description |
|---|---|---|---|
| `prefix` | string | `evidence-reports/` | Storage key prefix for evidence files |
| `retention_days` | int | `2555` | Retention period in days (~7 years for SOX) |

```yaml
evidence:
  storage:
    prefix: "evidence-reports/"
    retention_days: 2555 # ~7 years
```

Evidence files are stored at: `{prefix}{run-id}/{YYYY}/{MM}/{run-id}.{json|pdf|sig}`
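Following that template, the object key for a given run can be derived from the prefix, the run ID, and the run's UTC date. A sketch (the helper name and the run ID format are illustrative):

```python
from datetime import datetime, timezone

def evidence_key(prefix: str, run_id: str, run_time: datetime, ext: str) -> str:
    """Build the storage key {prefix}{run-id}/{YYYY}/{MM}/{run-id}.{ext}."""
    return f"{prefix}{run_id}/{run_time:%Y}/{run_time:%m}/{run_id}.{ext}"

key = evidence_key("evidence-reports/", "run-42",
                   datetime(2026, 1, 15, tzinfo=timezone.utc), "json")
# -> "evidence-reports/run-42/2026/01/run-42.json"
```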


## Notifications Configuration

### notifications.slack

Sends Slack notifications via an incoming webhook.

| Option | Type | Required | Description |
|---|---|---|---|
| `webhook_url` | string | Yes | Slack incoming webhook URL |

```yaml
notifications:
  slack:
    webhook_url: "https://hooks.slack.com/services/T00/B00/xxxxx"
```

### notifications.pagerduty

Sends PagerDuty alerts via the Events API v2.

| Option | Type | Default | Description |
|---|---|---|---|
| `integration_key` | string | required | PagerDuty Events API v2 integration key |
| `severity` | string | `critical` | Alert severity: `critical`, `error`, `warning`, `info` |

```yaml
notifications:
  pagerduty:
    integration_key: "your-integration-key"
    severity: critical
```

On failure, a trigger event is sent. On the next success, a resolve event is sent (using the backup ID as dedup key).
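The trigger/resolve pairing above maps onto PagerDuty Events API v2 event bodies. A sketch of the two shapes, assuming the backup ID is used verbatim as `dedup_key` as described; the `source` value and function name are illustrative:

```python
def pagerduty_event(integration_key: str, backup_id: str, failed: bool,
                    severity: str = "critical") -> dict:
    """Build a PagerDuty Events API v2 body: trigger on failure, resolve on success.

    Reusing the backup ID as dedup_key lets the next success resolve the open alert.
    """
    event = {
        "routing_key": integration_key,
        "dedup_key": backup_id,
        "event_action": "trigger" if failed else "resolve",
    }
    if failed:
        # Events API v2 requires a payload (summary/source/severity) on trigger.
        event["payload"] = {
            "summary": f"Backup validation failed for {backup_id}",
            "source": "kafka-backup",
            "severity": severity,
        }
    return event
```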

## Next Steps