# Validation Config Reference

Complete reference for the `validation.yaml` configuration file used with `kafka-backup validation run`.
## Configuration Structure

```yaml
# Required: Backup to validate against
backup_id: "my-backup-001"

# Required: Where the backup is stored
storage:
  backend: s3
  bucket: my-bucket

# Required: The restored Kafka cluster
target:
  bootstrap_servers: []

# Optional: Which checks to run
checks: {}

# Optional: Evidence report settings
evidence: {}

# Optional: Notification settings
notifications: {}

# Optional: PITR timestamp and trigger metadata
pitr_timestamp: null
triggered_by: null
```
## Top-Level Fields

### backup_id

Required. The backup ID to load the manifest from.

```yaml
backup_id: "production-daily-001"
```
### storage

Required. Storage backend configuration. Uses the same format as backup/restore configs.

```yaml
storage:
  backend: s3
  bucket: my-kafka-backups
  region: us-west-2
  prefix: production/daily
```
See the Storage Configuration section for all backend options (S3, Azure, GCS, Filesystem).
### target

Required. The Kafka cluster where the backup was restored (the cluster to validate).

```yaml
target:
  bootstrap_servers:
    - restored-kafka:9092
  security:
    security_protocol: SASL_SSL
    sasl_mechanism: SCRAM-SHA-512
    sasl_username: validator
    sasl_password: "${KAFKA_PASSWORD}"
```
Supports the same authentication options as the backup/restore source/target config. See Kafka Cluster Configuration.
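The `${KAFKA_PASSWORD}` placeholder above suggests environment-variable interpolation of config values. A minimal sketch of such expansion (illustrative only; `expand_env` is a hypothetical helper, not kafka-backup's actual config loader):

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment-variable identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value: str) -> str:
    # Replace each ${NAME} with its environment value;
    # leave unknown placeholders untouched.
    return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["KAFKA_PASSWORD"] = "s3cret"
print(expand_env("${KAFKA_PASSWORD}"))  # s3cret
```

Keeping secrets in environment variables rather than in `validation.yaml` avoids committing credentials to version control.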
### pitr_timestamp

Optional. PITR timestamp (epoch milliseconds) used during the restore. Included in the evidence report metadata.

```yaml
pitr_timestamp: 1711929600000
```

Can also be set via `--pitr` on the CLI.
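The example value above is 2024-04-01T00:00:00Z expressed in epoch milliseconds; one way to compute such a value:

```python
from datetime import datetime, timezone

# pitr_timestamp is epoch *milliseconds*, so multiply epoch seconds by 1000.
dt = datetime(2024, 4, 1, 0, 0, 0, tzinfo=timezone.utc)
pitr_timestamp = int(dt.timestamp() * 1000)
print(pitr_timestamp)  # 1711929600000
```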
### triggered_by

Optional. Human-readable string recording who or what triggered this validation run. Appears in the evidence report for chain of custody.

```yaml
triggered_by: "weekly-cron-job"
triggered_by: "External auditor KPMG - Q1 2026 review"
```

Can also be set via `--triggered-by` on the CLI.
## Checks Configuration

### checks.message_count

Compares per-topic/partition record counts between the backup manifest and the restored cluster.
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable this check |
| `mode` | string | `exact` | `exact` (all partitions) or `sample` (random subset) |
| `sample_percentage` | int | `100` | Percentage of partitions to sample (1-100); only used in `sample` mode |
| `topics` | list | `[]` | Topic filter; empty = all topics in the backup |
| `fail_threshold` | int | `0` | Record-count difference allowed before failing; `0` = exact match |
```yaml
checks:
  message_count:
    enabled: true
    mode: exact
    topics:
      - orders
      - payments
    fail_threshold: 0
```
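Assuming `fail_threshold` bounds the absolute difference between the manifest count and the restored-cluster count, the pass/fail decision can be sketched as (hypothetical helper, not the tool's actual code):

```python
def message_count_passes(expected: int, actual: int, fail_threshold: int = 0) -> bool:
    # Pass when the absolute record-count difference is within the
    # threshold; the default of 0 requires an exact match.
    return abs(expected - actual) <= fail_threshold

print(message_count_passes(1000, 1000))      # True  (exact match)
print(message_count_passes(1000, 998, 1))    # False (off by 2, threshold 1)
```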
### checks.offset_range

Verifies that the high and low watermarks for each partition match the backup manifest.
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable this check |
| `verify_high_watermark` | bool | `true` | Verify the high watermark matches |
| `verify_low_watermark` | bool | `true` | Verify the low watermark matches |
```yaml
checks:
  offset_range:
    enabled: true
    verify_high_watermark: true
    verify_low_watermark: true
```
### checks.consumer_group_offsets

Verifies that consumer group offsets are present and valid in the restored cluster.
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable this check |
| `verify_all_groups` | bool | `true` | Verify all groups found on the cluster |
| `groups` | list | `[]` | Specific groups to check; empty with `verify_all_groups: true` = all |
```yaml
checks:
  consumer_group_offsets:
    enabled: true
    verify_all_groups: false
    groups:
      - order-processor
      - payment-service
```
### checks.custom_webhooks

Calls external HTTP endpoints for custom validation logic. Each webhook appears as a separate check in the report.
| Option | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Display name for this check in the report |
| `url` | string | required | URL to POST the validation payload to |
| `timeout_seconds` | int | `120` | Request timeout in seconds |
| `expected_status_code` | int | `200` | Expected HTTP status code for success |
| `fail_on_timeout` | bool | `true` | Whether to treat a timeout as failure |
```yaml
checks:
  custom_webhooks:
    - name: application-health-check
      url: "https://internal.example.com/kafka-validation-hook"
      timeout_seconds: 120
      expected_status_code: 200
      fail_on_timeout: true
```
The webhook receives a JSON POST body:

```json
{
  "event": "kafka_backup_validation",
  "backup_id": "production-daily-001",
  "pitr_timestamp": null,
  "restored_cluster": {
    "bootstrap_servers": ["restored-kafka:9092"]
  }
}
```
Expected response:

```json
{
  "result": "passed",
  "detail": "All health checks passed",
  "data": {}
}
```

Valid `result` values: `passed`, `failed`, `warning`, `skipped`.
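As an illustration of this contract, a server-side handler might look like the following sketch (hypothetical endpoint logic, not part of kafka-backup; the health probe is a stand-in):

```python
import json

def handle_validation_hook(body: bytes) -> dict:
    # Parse the documented POST body and answer in the documented shape.
    payload = json.loads(body)
    if payload.get("event") != "kafka_backup_validation":
        return {"result": "skipped", "detail": "unrecognized event", "data": {}}
    servers = payload["restored_cluster"]["bootstrap_servers"]
    healthy = len(servers) > 0  # stand-in for a real application health probe
    return {
        "result": "passed" if healthy else "failed",
        "detail": f"checked {len(servers)} bootstrap server(s)",
        "data": {"backup_id": payload["backup_id"]},
    }

request_body = json.dumps({
    "event": "kafka_backup_validation",
    "backup_id": "production-daily-001",
    "pitr_timestamp": None,
    "restored_cluster": {"bootstrap_servers": ["restored-kafka:9092"]},
}).encode()
resp = handle_validation_hook(request_body)
print(resp["result"])  # passed
```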
## Evidence Configuration

### evidence.formats

List of output formats to generate.
| Value | Description |
|---|---|
| `json` | Machine-readable JSON evidence report (canonical format when signing is enabled) |
| `pdf` | Auditor-ready branded PDF report |
```yaml
evidence:
  formats:
    - json
    - pdf
```
### evidence.signing

Cryptographic signing configuration.
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable ECDSA-P256-SHA256 signing |
| `private_key_path` | string | `null` | Path to PEM-encoded PKCS#8 private key |
| `public_key_path` | string | `null` | Path to PEM-encoded public key (optional, for reference) |
```yaml
evidence:
  signing:
    enabled: true
    private_key_path: "/etc/kafka-backup/signing-key.pem"
```

The private key must be in PKCS#8 format (header: `-----BEGIN PRIVATE KEY-----`). See the Evidence Signing Guide for key generation instructions.
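Without presuming the Evidence Signing Guide's exact steps, a P-256 key in the required PKCS#8 PEM format can be produced with, for example, the third-party Python `cryptography` package:

```python
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ec

key = ec.generate_private_key(ec.SECP256R1())  # NIST P-256 curve
pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,  # -----BEGIN PRIVATE KEY-----
    encryption_algorithm=serialization.NoEncryption(),
)
print(pem.decode().splitlines()[0])  # -----BEGIN PRIVATE KEY-----
```

A key serialized with `PrivateFormat.TraditionalOpenSSL` instead would carry a `-----BEGIN EC PRIVATE KEY-----` header, which does not match the PKCS#8 requirement above.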
### evidence.storage

Controls where evidence reports are uploaded in object storage.
| Option | Type | Default | Description |
|---|---|---|---|
| `prefix` | string | `evidence-reports/` | Storage key prefix for evidence files |
| `retention_days` | int | `2555` | Retention period in days (~7 years, for SOX) |
```yaml
evidence:
  storage:
    prefix: "evidence-reports/"
    retention_days: 2555  # ~7 years
```

Evidence files are stored at: `{prefix}{run-id}/{YYYY}/{MM}/{run-id}.{json|pdf|sig}`
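A hypothetical helper showing how that key layout composes (illustrative only, not the tool's code):

```python
from datetime import datetime, timezone

def evidence_key(prefix: str, run_id: str, ts: datetime, ext: str) -> str:
    # Mirrors the documented layout: {prefix}{run-id}/{YYYY}/{MM}/{run-id}.{ext}
    return f"{prefix}{run_id}/{ts.year:04d}/{ts.month:02d}/{run_id}.{ext}"

key = evidence_key("evidence-reports/", "run-42",
                   datetime(2026, 3, 14, tzinfo=timezone.utc), "json")
print(key)  # evidence-reports/run-42/2026/03/run-42.json
```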
## Notifications Configuration

### notifications.slack

Send Slack notifications via incoming webhook.
| Option | Type | Required | Description |
|---|---|---|---|
| `webhook_url` | string | Yes | Slack incoming webhook URL |
```yaml
notifications:
  slack:
    webhook_url: "https://hooks.slack.com/services/T00/B00/xxxxx"
```
### notifications.pagerduty

Send PagerDuty alerts via Events API v2.
| Option | Type | Default | Description |
|---|---|---|---|
| `integration_key` | string | required | PagerDuty Events API v2 integration key |
| `severity` | string | `critical` | Alert severity: `critical`, `error`, `warning`, `info` |
```yaml
notifications:
  pagerduty:
    integration_key: "your-integration-key"
    severity: critical
```
On failure, a trigger event is sent. On the next success, a resolve event is sent, using the backup ID as the dedup key.
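A sketch of the Events API v2 bodies this behavior implies (`pagerduty_event` is a hypothetical helper; the field names follow the public Events API v2 schema, where a resolve event needs only the routing key, dedup key, and action):

```python
def pagerduty_event(integration_key: str, backup_id: str,
                    failed: bool, severity: str = "critical") -> dict:
    # Trigger on failure, resolve on the next success,
    # deduplicated by backup ID.
    event = {
        "routing_key": integration_key,
        "dedup_key": backup_id,
        "event_action": "trigger" if failed else "resolve",
    }
    if failed:
        event["payload"] = {
            "summary": f"Backup validation failed for {backup_id}",
            "source": "kafka-backup",
            "severity": severity,
        }
    return event

alert = pagerduty_event("your-integration-key", "production-daily-001", failed=True)
clear = pagerduty_event("your-integration-key", "production-daily-001", failed=False)
```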
## Next Steps

- Backup Validation Guide — step-by-step walkthrough
- Evidence Report Schema — JSON report structure
- CLI Reference — `validation` command options