Schema Registry Backup & Restore
OSO Kafka Backup Enterprise backs up Confluent Schema Registry alongside your Kafka data, ensuring restored messages remain readable.
The Problem
Kafka messages serialized with Avro, JSON Schema, or Protobuf embed a 4-byte schema ID that references a schema stored in an external Schema Registry. If those schemas are not backed up as well, consumers cannot deserialize the restored messages:
Without Schema Backup:
Kafka data ──── Backed up ✓
Schema IDs ──── NOT backed up ✗
Result: "Schema ID 42 not found"
With Schema Backup:
Kafka data ──── Backed up ✓
Schema IDs ──── Backed up ✓
Result: Consumers work correctly
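To make the dependency concrete, here is a minimal Python sketch of how a consumer-side tool might read the schema ID from a message. It assumes the standard Confluent wire format (one magic byte `0`, then the 4-byte schema ID in big-endian order, then the serialized body); the function name `extract_schema_id` is illustrative and not part of kafka-backup.

```python
import struct

MAGIC_BYTE = 0  # first byte of a Confluent-framed payload


def extract_schema_id(payload: bytes) -> int:
    """Return the schema ID stored in bytes 1-4 of a Confluent-framed payload."""
    if len(payload) < 5:
        raise ValueError("payload too short to contain a schema ID header")
    # ">bI" = big-endian, 1 signed byte (magic) + 4-byte unsigned int (schema ID)
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id


framed = b"\x00" + (42).to_bytes(4, "big") + b"serialized-body"
print(extract_schema_id(framed))  # 42
```

If the registry that answers for ID 42 is gone, the bytes after the header cannot be decoded, which is the failure shown in the first diagram.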
Quick Start
Add the enterprise.schema_registry section to your existing backup config:
mode: backup
backup_id: daily-2026-04-06
source:
  bootstrap_servers: ["kafka:9092"]
storage:
  backend: s3
  bucket: kafka-backups
enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    auth:
      type: basic
      username: ${SR_USERNAME}
      password: ${SR_PASSWORD}
Run the backup — schemas are captured automatically alongside Kafka data:
kafka-backup backup --config backup-config.yaml
Or back up schemas only (no Kafka data):
kafka-backup backup --config backup-config.yaml --schema-only
What Gets Backed Up
| Item | Description |
|---|---|
| Subjects | All subject names matching filter patterns |
| Schema versions | Every version of every subject: schema definition, type, ID, references |
| Global compatibility | Global compatibility level (BACKWARD, FORWARD, FULL, NONE, etc.) |
| Per-subject compatibility | Subject-specific compatibility overrides |
| Global mode | Registry mode (READWRITE, READONLY, IMPORT) |
| Per-subject mode | Subject-specific mode overrides |
| Schema references | Cross-schema dependencies (Protobuf imports, Avro references) |
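Each item in the table corresponds to a resource in the Confluent Schema Registry REST API. The sketch below builds the read-only URLs for those resources. The endpoint paths are the registry's documented ones; whether kafka-backup issues exactly these calls is an assumption, and `backup_endpoints` is a hypothetical helper.

```python
from urllib.parse import quote


def backup_endpoints(base_url: str, subject: str, include_soft_deleted: bool = False) -> dict:
    """Map each backed-up item to the Schema Registry GET endpoint that exposes it."""
    base = base_url.rstrip("/")
    name = quote(subject, safe="")  # subjects may contain characters that need escaping
    deleted = "?deleted=true" if include_soft_deleted else ""
    return {
        "subjects": f"{base}/subjects{deleted}",
        "schema_versions": f"{base}/subjects/{name}/versions{deleted}",
        "global_compatibility": f"{base}/config",
        "subject_compatibility": f"{base}/config/{name}",
        "global_mode": f"{base}/mode",
        "subject_mode": f"{base}/mode/{name}",
    }


for item, url in backup_endpoints("https://schema-registry:8081", "orders-value").items():
    print(f"{item}: {url}")
```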
Configuration Reference
Authentication
Three authentication modes are supported: basic, mTLS, and none. Use exactly one per configuration:
# Option 1: HTTP basic authentication
enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    auth:
      type: basic
      username: ${SR_USERNAME}
      password: ${SR_PASSWORD}
# Option 2: mutual TLS (client certificate)
enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    auth:
      type: mtls
      ca_cert: /certs/ca.crt
      client_cert: /certs/client.crt
      client_key: /certs/client.key
# Option 3: no authentication
enterprise:
  schema_registry:
    url: "http://schema-registry:8081"
    # auth section omitted = no authentication
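For the basic option, a pre-flight check can catch missing credentials before a backup starts. This Python sketch reads `SR_USERNAME` and `SR_PASSWORD` from an environment mapping and builds the standard HTTP `Authorization` header; `basic_auth_header_from_env` is a hypothetical helper, and how kafka-backup itself resolves `${...}` placeholders is not shown here.

```python
import base64
import os


def basic_auth_header_from_env(env=os.environ) -> dict:
    """Build an HTTP Basic Authorization header from SR_USERNAME / SR_PASSWORD."""
    missing = [name for name in ("SR_USERNAME", "SR_PASSWORD") if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    credentials = f"{env['SR_USERNAME']}:{env['SR_PASSWORD']}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example with an explicit mapping instead of the real environment:
print(basic_auth_header_from_env({"SR_USERNAME": "user", "SR_PASSWORD": "pass"}))
```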
TLS Server Verification
For HTTPS endpoints with custom CA certificates:
enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    tls:
      ca_cert: /certs/custom-ca.crt
Subject Filtering
Control which subjects are backed up using glob patterns:
enterprise:
  schema_registry:
    backup:
      # Include patterns (default: ["*"] = all subjects)
      subjects:
        - "orders-*"
        - "payments-*"
      # Exclude patterns (applied after include)
      exclude:
        - "*-test"
        - "*-internal"
      # Include soft-deleted subjects (default: false)
      include_soft_deleted: false
      # Back up all versions or only latest (default: all)
      include_versions: all # or "latest"
      # Auto-include referenced subjects outside filter (default: true)
      include_references: true
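The include-then-exclude order can be previewed offline before running a backup. The sketch below is an approximation that assumes the patterns behave like shell-style globs as implemented by Python's `fnmatch`; the exact glob dialect used by kafka-backup is not specified here, `select_subjects` is a hypothetical name, and `include_references` (which may pull extra subjects back in) is not modelled.

```python
from fnmatch import fnmatchcase


def select_subjects(all_subjects, include=("*",), exclude=()):
    """Keep subjects matching any include pattern, then drop those matching any exclude pattern."""
    selected = []
    for subject in all_subjects:
        included = any(fnmatchcase(subject, pattern) for pattern in include)
        excluded = any(fnmatchcase(subject, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(subject)
    return selected


subjects = ["orders-value", "orders-test", "payments-key", "users-value", "payments-internal"]
print(select_subjects(subjects, ["orders-*", "payments-*"], ["*-test", "*-internal"]))
# ['orders-value', 'payments-key']
```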
Restore Options
enterprise:
  schema_registry:
    restore:
      # Restore strategy (default: preserve)
      strategy: preserve # preserve | overwrite | skip
      # Force original schema IDs using IMPORT mode (default: false)
      # Requires: empty target registry + mode.mutability=true
      force_ids: false
      # Rewrite schema IDs in Kafka message bytes (default: false)
      # Use when target registry assigns different IDs
      rewrite_ids: false
      # Subject name mapping for environment cloning
      subject_mapping:
        "orders-value": "staging-orders-value"
      # Dry-run mode (default: false)
      dry_run: false
Restore strategies:
| Strategy | Behavior | Use Case |
|---|---|---|
| preserve (default) | Keep existing schemas on target; add missing | Merge / partial recovery |
| overwrite | Replace existing schemas from backup | Full DR to clean target |
| skip | Only restore subjects that don't exist on target | Add new schemas only |
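One way to read the table is as a planning function that decides which backed-up versions get registered on the target. The Python sketch below is an interpretation, not the product's implementation: it assumes `preserve` works per version (only versions missing on the target are added) while `skip` works per subject. `plan_restore` and its dict-of-sets inputs are hypothetical.

```python
def plan_restore(strategy: str, backup: dict, target: dict) -> dict:
    """Return {subject: [versions to register]} for the given strategy.

    backup and target map subject name -> set of version numbers.
    """
    plan = {}
    for subject, versions in backup.items():
        existing = target.get(subject)
        if strategy == "overwrite":
            todo = set(versions)                      # replace everything from backup
        elif strategy == "skip":
            todo = set(versions) if existing is None else set()  # untouched if subject exists
        elif strategy == "preserve":
            todo = set(versions) - (existing or set())  # assumed: add only missing versions
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if todo:
            plan[subject] = sorted(todo)
    return plan


backup = {"orders-value": {1, 2, 3}, "payments-value": {1}}
target = {"orders-value": {1, 2}}
for strategy in ("preserve", "overwrite", "skip"):
    print(strategy, plan_restore(strategy, backup, target))
```

Running a plan like this with `dry_run: true` semantics in mind is a cheap way to reason about what a restore would change before touching the target registry.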
Connection Tuning
enterprise:
  schema_registry:
    connection:
      timeout_ms: 30000 # Request timeout (default: 30s)
      max_retries: 3 # Retry attempts on 429/5xx
      retry_backoff_ms: 1000 # Initial retry backoff
      rate_limit_rps: 25 # Max requests per second
      concurrent_requests: 4 # Max parallel API calls
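To get a feel for how these settings interact, the following Python sketch computes the retry waits and the request spacing implied by the values above. The doubling multiplier is an assumption (the configuration only names an initial backoff), and all three function names are hypothetical.

```python
def retry_delays_ms(max_retries: int = 3, retry_backoff_ms: int = 1000, multiplier: float = 2.0) -> list:
    """Wait before each retry, assuming exponential growth from the initial backoff."""
    return [int(retry_backoff_ms * multiplier ** attempt) for attempt in range(max_retries)]


def is_retryable(status: int) -> bool:
    """Retries apply to 429 (rate limited) and 5xx (server error) responses."""
    return status == 429 or 500 <= status <= 599


def min_interval_ms(rate_limit_rps: float) -> float:
    """Average spacing between requests needed to stay under the rate limit."""
    return 1000.0 / rate_limit_rps


print(retry_delays_ms())    # [1000, 2000, 4000]
print(min_interval_ms(25))  # 40.0
print(is_retryable(429), is_retryable(404))  # True False
```

Under these assumptions, lowering `rate_limit_rps` widens the spacing between calls, while raising `retry_backoff_ms` lengthens every retry wait proportionally.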
Storage Format
Schema backups are stored alongside Kafka data in the same storage backend:
{backup_id}/
  manifest.json                # Main backup manifest
  topics/...                   # Kafka data segments
  schema-registry/             # Schema Registry backup
    _manifest.json             # Schema manifest (subjects, counts, dependency order)
    _global_config.json        # Global compatibility and mode
    subjects/
      orders-value/
        _metadata.json         # Subject config, mode, version list
        v1.json                # Schema version 1 (full definition)
        v2.json                # Schema version 2
      payments-value/
        _metadata.json
        v1.json
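When inspecting a backup by hand (for example in an S3 console), it helps to know where a given schema version lives. This Python sketch derives object keys from the layout above. Both helpers are hypothetical, and how subject names containing `/` or other special characters are encoded in the key is not covered here.

```python
def schema_version_key(backup_id: str, subject: str, version: int) -> str:
    """Object key of one schema version inside a backup, following the layout above."""
    return f"{backup_id}/schema-registry/subjects/{subject}/v{version}.json"


def subject_metadata_key(backup_id: str, subject: str) -> str:
    """Object key of a subject's metadata file (config, mode, version list)."""
    return f"{backup_id}/schema-registry/subjects/{subject}/_metadata.json"


print(schema_version_key("daily-2026-04-06", "orders-value", 2))
print(subject_metadata_key("daily-2026-04-06", "payments-value"))
```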
Schema References & Dependencies
Schemas can reference other schemas (e.g., Protobuf imports). The backup engine automatically:
- Discovers all cross-subject references
- Builds a dependency graph (DAG)
- Performs a topological sort
- Records the correct restore order in the manifest
On restore, schemas are registered in dependency order — referenced schemas first, then schemas that reference them. Circular references are detected and reported as errors.
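The ordering step can be sketched with Python's standard `graphlib` module (Python 3.9+). This illustrates the technique only; it is not the backup engine's code, and `restore_order` along with the sample subject names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(references: dict) -> list:
    """Order subjects so that every referenced subject comes before its dependents.

    references maps subject -> iterable of subjects it references.
    """
    sorter = TopologicalSorter(references)  # values are treated as predecessors
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        cycle = exc.args[1]  # list of nodes forming the detected cycle
        raise ValueError("circular schema reference: " + " -> ".join(cycle)) from exc


refs = {
    "orders-value": ["customer-value", "address-value"],
    "customer-value": ["address-value"],
    "address-value": [],
}
print(restore_order(refs))
# ['address-value', 'customer-value', 'orders-value']
```

A cycle such as `{"a": ["b"], "b": ["a"]}` raises `ValueError`, mirroring the circular-reference error described above.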
Use Cases
Disaster Recovery
Back up everything, restore to a fresh environment:
enterprise:
  schema_registry:
    url: "https://dr-schema-registry:8081"
    restore:
      strategy: preserve
      force_ids: true # Empty target: preserve original IDs
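For background on what `force_ids` relies on: Schema Registry accepts a caller-supplied `id` and `version` when registering a schema only while in IMPORT mode. The Python sketch below lists the REST requests such a restore could issue for one schema. The endpoints and body fields are the registry's documented ones, but the exact sequence kafka-backup uses is an assumption, and `import_requests` is a hypothetical helper that builds requests without sending them.

```python
from urllib.parse import quote


def import_requests(subject: str, schema: str, schema_type: str,
                    schema_id: int, version: int, references=()) -> list:
    """Return (method, path, json_body) tuples for registering one schema with its original ID."""
    name = quote(subject, safe="")
    register_body = {
        "schema": schema,            # schema definition as a string
        "schemaType": schema_type,   # AVRO, JSON, or PROTOBUF
        "id": schema_id,             # only honoured in IMPORT mode
        "version": version,
        "references": list(references),
    }
    return [
        ("PUT", "/mode", {"mode": "IMPORT"}),        # needs mode.mutability=true
        ("POST", f"/subjects/{name}/versions", register_body),
        ("PUT", "/mode", {"mode": "READWRITE"}),     # return to normal operation
    ]


for method, path, body in import_requests("orders-value", '{"type": "string"}', "AVRO", 42, 1):
    print(method, path, body)
```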
Cross-Environment Migration (On-Prem to Cloud)
Schema IDs will differ on the target. Use ID rewriting:
enterprise:
  schema_registry:
    url: "https://psrc-xxxxx.confluent.cloud"
    auth:
      type: basic
      username: ${CCLOUD_SR_API_KEY}
      password: ${CCLOUD_SR_API_SECRET}
    restore:
      strategy: preserve
      rewrite_ids: true # Rewrite IDs in Kafka message bytes
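Conceptually, ID rewriting swaps the 4-byte schema ID in each message header for the ID the target registry assigned. A minimal Python sketch follows, assuming the standard Confluent framing (magic byte `0` followed by a big-endian 4-byte ID). The `id_map` argument and `rewrite_schema_id` are hypothetical, and leaving unframed payloads untouched is a design choice of this sketch rather than documented behaviour.

```python
def rewrite_schema_id(payload: bytes, id_map: dict) -> bytes:
    """Replace the schema ID in a Confluent-framed payload using an old -> new ID mapping."""
    if len(payload) < 5 or payload[0] != 0:
        return payload  # not Confluent-framed: leave the bytes unchanged
    old_id = int.from_bytes(payload[1:5], "big")
    new_id = id_map[old_id]  # KeyError signals a schema that was never restored
    return payload[:1] + new_id.to_bytes(4, "big") + payload[5:]


original = b"\x00" + (42).to_bytes(4, "big") + b"body"
rewritten = rewrite_schema_id(original, {42: 100007})
print(int.from_bytes(rewritten[1:5], "big"))  # 100007
```

Because only the header changes, the serialized body stays byte-for-byte identical, so message size and content are preserved.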
Compliance Archive (Schema-Only)
Periodic schema snapshots for regulatory requirements:
kafka-backup backup --config schema-archive.yaml --schema-only
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Schema ID not found after restore | Schemas not restored before data | Schema restore runs automatically before data restore |
| Subject not found during backup | Subject was deleted | Enable include_soft_deleted: true |
| Circular reference error | Schemas reference each other in a cycle | Fix the cycle in Schema Registry, then re-backup |
| 429 rate limit errors | Too many API calls | Reduce rate_limit_rps or increase retry_backoff_ms |
| Authentication failed | Invalid credentials | Check SR_USERNAME/SR_PASSWORD env vars |
Requirements
- Confluent Schema Registry (Platform or Cloud)
- HTTP(S) access to Schema Registry REST API
- Enterprise license with the schema_registry feature enabled