
Schema Registry Backup & Restore

OSO Kafka Backup Enterprise backs up Confluent Schema Registry alongside your Kafka data, ensuring restored messages remain readable.

The Problem

Kafka messages serialized with Avro, JSON Schema, or Protobuf through Confluent serializers carry a 4-byte schema ID (after a leading magic byte) that references an external Schema Registry. If the schemas are not backed up, consumers cannot deserialize the restored messages:

Without Schema Backup:
Kafka data ──── Backed up ✓
Schema IDs ──── NOT backed up ✗
Result: "Schema ID 42 not found"

With Schema Backup:
Kafka data ──── Backed up ✓
Schema IDs ──── Backed up ✓
Result: Consumers work correctly
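For illustration, the following Python sketch shows how a consumer locates that schema ID. Confluent's wire format prefixes each serialized value with a magic byte (0x00) and a 4-byte big-endian schema ID. The helper `read_schema_id` is hypothetical and not part of kafka-backup; it only demonstrates why the ID is meaningless without the registry that issued it.

```python
import struct

MAGIC_BYTE = 0x00  # Confluent wire format: the first byte is always 0


def read_schema_id(message: bytes) -> int:
    """Return the Schema Registry ID from a Confluent-framed message.

    Layout: magic byte (0x00) | 4-byte big-endian schema ID | payload.
    For Protobuf, message-index bytes follow the ID; they are ignored here.
    """
    if len(message) < 5:
        raise ValueError("message too short for Confluent wire format")
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id


framed = b"\x00" + (42).to_bytes(4, "big") + b"<avro-encoded payload>"
print(read_schema_id(framed))  # 42
```

A consumer that reads ID 42 must then ask the registry for schema 42; if the restored registry has no such ID, deserialization fails with the error shown above.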

Quick Start

Add the enterprise.schema_registry section to your existing backup config:

backup-config.yaml
mode: backup
backup_id: daily-2026-04-06
source:
  bootstrap_servers: ["kafka:9092"]
storage:
  backend: s3
  bucket: kafka-backups

enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    auth:
      type: basic
      username: ${SR_USERNAME}
      password: ${SR_PASSWORD}

Run the backup. Schemas are captured automatically alongside the Kafka data:

kafka-backup backup --config backup-config.yaml

Or back up schemas only (no Kafka data):

kafka-backup backup --config backup-config.yaml --schema-only

What Gets Backed Up

| Item | Description |
|---|---|
| Subjects | All subject names matching the filter patterns |
| Schema versions | Every version of every subject (by default; see include_versions): schema definition, type, ID, references |
| Global compatibility | Global compatibility level (BACKWARD, FORWARD, FULL, NONE, etc.) |
| Per-subject compatibility | Subject-specific compatibility overrides |
| Global mode | Registry mode (READWRITE, READONLY, IMPORT) |
| Per-subject mode | Subject-specific mode overrides |
| Schema references | Cross-schema dependencies (Protobuf imports, Avro references) |
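Each item above corresponds to a resource in the Confluent Schema Registry REST API. The sketch below lists the read-only paths a backup would need to visit; `endpoints_for` is a hypothetical helper, and the exact request sequence kafka-backup uses is not documented here. Individual versions (with their references) would then be fetched from `/subjects/{subject}/versions/{version}`.

```python
from urllib.parse import quote


def endpoints_for(subjects, include_soft_deleted=False):
    """List Confluent Schema Registry REST paths covering the items in the table."""
    paths = [
        "/subjects?deleted=true" if include_soft_deleted else "/subjects",  # subject names
        "/config",  # global compatibility level
        "/mode",    # global mode
    ]
    for subject in subjects:
        s = quote(subject, safe="")  # percent-encode characters such as '/' or ':'
        paths += [
            f"/subjects/{s}/versions",  # version numbers for the subject
            f"/config/{s}",             # per-subject compatibility override
            f"/mode/{s}",               # per-subject mode override
        ]
    return paths


for path in endpoints_for(["orders-value"]):
    print(path)
```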

Configuration Reference

Authentication

Three authentication methods are supported:

Basic Auth (most common)

enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    auth:
      type: basic
      username: ${SR_USERNAME}
      password: ${SR_PASSWORD}

Mutual TLS (mTLS)

enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    auth:
      type: mtls
      ca_cert: /certs/ca.crt
      client_cert: /certs/client.crt
      client_key: /certs/client.key

No Authentication

enterprise:
  schema_registry:
    url: "http://schema-registry:8081"
    # auth section omitted = no authentication
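The `${SR_USERNAME}` and `${SR_PASSWORD}` placeholders suggest that credentials are read from environment variables. The sketch below assumes that behavior and shows, in Python, what Basic authentication amounts to on the wire: an `Authorization` header carrying base64 of `username:password`. Both function names are hypothetical.

```python
import base64
import os
from string import Template


def resolve_placeholder(value: str, env=os.environ) -> str:
    # Assumption: ${VAR} placeholders are filled from environment variables.
    # Template.substitute raises KeyError for an unset variable, so a missing
    # credential fails fast instead of being sent as the literal "${SR_PASSWORD}".
    return Template(value).substitute(env)


def basic_auth_header(username: str, password: str) -> dict:
    """HTTP Basic authentication (RFC 7617): base64 of "username:password"."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


env = {"SR_USERNAME": "alice", "SR_PASSWORD": "s3cret"}  # example values only
headers = basic_auth_header(
    resolve_placeholder("${SR_USERNAME}", env),
    resolve_placeholder("${SR_PASSWORD}", env),
)
print(headers)
```

Base64 is an encoding, not encryption, which is one reason Basic Auth should only be used over an `https://` URL.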

TLS Server Verification

For HTTPS endpoints with custom CA certificates:

enterprise:
  schema_registry:
    url: "https://schema-registry:8081"
    tls:
      ca_cert: /certs/custom-ca.crt
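The TLS settings map naturally onto a standard TLS client context. This sketch uses Python's `ssl` module to show the assumed mapping: `tls.ca_cert` (or `auth.ca_cert` for mTLS) becomes the trust anchor, and `client_cert`/`client_key` become the certificate chain presented to the registry. `build_ssl_context` is hypothetical; kafka-backup's internals may differ.

```python
import ssl
from typing import Optional


def build_ssl_context(ca_cert: Optional[str] = None,
                      client_cert: Optional[str] = None,
                      client_key: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical mapping of the tls/mtls settings onto a client TLS context."""
    if bool(client_cert) != bool(client_key):
        raise ValueError("client_cert and client_key must be set together")
    # Verify the registry's certificate and hostname. With ca_cert=None the
    # system trust store is used; otherwise only the given CA file is trusted.
    ctx = ssl.create_default_context(cafile=ca_cert)
    if client_cert:
        # mTLS: present a client certificate to the registry.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = build_ssl_context()  # no custom CA: system trust store
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

With a file on disk, `build_ssl_context(ca_cert="/certs/custom-ca.crt")` would trust only that CA.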

Subject Filtering

Control which subjects are backed up using glob patterns:

enterprise:
  schema_registry:
    backup:
      # Include patterns (default: ["*"] = all subjects)
      subjects:
        - "orders-*"
        - "payments-*"
      # Exclude patterns (applied after include)
      exclude:
        - "*-test"
        - "*-internal"
      # Include soft-deleted subjects (default: false)
      include_soft_deleted: false
      # Back up all versions or only latest (default: all)
      include_versions: all  # or "latest"
      # Auto-include referenced subjects outside filter (default: true)
      include_references: true
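The filter order described in the comments (include, then exclude, then pull in referenced subjects) can be sketched with Python's `fnmatch` module. `select_subjects` and the `references` mapping are hypothetical; the sketch assumes that `include_references` adds referenced subjects even when they would otherwise be filtered out, following "outside filter" in the comment above.

```python
from fnmatch import fnmatchcase


def select_subjects(all_subjects, include=("*",), exclude=(),
                    references=None, include_references=True):
    """Include globs, then exclude globs, then add referenced subjects.

    references: hypothetical map of subject -> subjects its schemas reference.
    """
    selected = {
        s for s in all_subjects
        if any(fnmatchcase(s, p) for p in include)
        and not any(fnmatchcase(s, p) for p in exclude)
    }
    if include_references and references:
        # Follow references transitively; assumed to bypass the exclude list.
        pending = list(selected)
        while pending:
            for dep in references.get(pending.pop(), ()):
                if dep not in selected:
                    selected.add(dep)
                    pending.append(dep)
    return sorted(selected)


subjects = ["orders-value", "orders-test", "payments-value",
            "payments-internal", "common-address", "users-value"]
refs = {"orders-value": ["common-address"]}
print(select_subjects(subjects, include=["orders-*", "payments-*"],
                      exclude=["*-test", "*-internal"], references=refs))
# ['common-address', 'orders-value', 'payments-value']
```

`fnmatchcase` is used rather than `fnmatch` so matching stays case-sensitive on every platform.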

Restore Options

enterprise:
  schema_registry:
    restore:
      # Restore strategy (default: preserve)
      strategy: preserve  # preserve | overwrite | skip
      # Force original schema IDs using IMPORT mode (default: false)
      # Requires: empty target registry + mode.mutability=true
      force_ids: false
      # Rewrite schema IDs in Kafka message bytes (default: false)
      # Use when target registry assigns different IDs
      rewrite_ids: false
      # Subject name mapping for environment cloning
      subject_mapping:
        "orders-value": "staging-orders-value"
      # Dry-run mode (default: false)
      dry_run: false

Restore strategies:

| Strategy | Behavior | Use case |
|---|---|---|
| preserve (default) | Keep existing schemas on the target; add missing ones | Merge / partial recovery |
| overwrite | Replace existing schemas with those from the backup | Full DR to a clean target |
| skip | Restore only subjects that don't exist on the target | Add new schemas only |
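One way to read the three strategies, together with `subject_mapping`, is as a per-subject plan. The Python sketch below is an interpretation of the table, not kafka-backup's implementation: `plan_restore`, its action names, and modeling `overwrite` as delete-then-register are assumptions, and schemas are compared as plain strings rather than in the registry's canonical form.

```python
VALID_STRATEGIES = {"preserve", "overwrite", "skip"}


def plan_restore(backup, target, strategy="preserve", subject_mapping=None):
    """Sketch of a per-subject restore plan.

    backup, target: dict of subject -> list of schema strings, oldest first.
    Returns (action, target_subject, schema) tuples in execution order.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    mapping = subject_mapping or {}
    plan = []
    for subject, schemas in backup.items():
        dest = mapping.get(subject, subject)  # unmapped subjects keep their name
        existing = target.get(dest)
        if existing is None:  # subject absent on target: every strategy restores it
            plan += [("register", dest, s) for s in schemas]
        elif strategy == "skip":  # leave existing subjects untouched
            plan.append(("skip", dest, None))
        elif strategy == "preserve":  # keep target versions, add only missing ones
            plan += [("register", dest, s) for s in schemas if s not in existing]
        else:  # overwrite, assumed here to mean replacing the whole subject
            plan.append(("delete", dest, None))
            plan += [("register", dest, s) for s in schemas]
    return plan


backup = {"orders-value": ["v1-schema", "v2-schema"], "payments-value": ["p1-schema"]}
target = {"orders-value": ["v1-schema"]}
for step in plan_restore(backup, target, strategy="preserve"):
    print(step)
```

On an empty target all three strategies produce the same plan; they only diverge for subjects that already exist.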

Connection Tuning

enterprise:
  schema_registry:
    connection:
      timeout_ms: 30000        # Request timeout (default: 30s)
      max_retries: 3           # Retry attempts on 429/5xx
      retry_backoff_ms: 1000   # Initial retry backoff
      rate_limit_rps: 25       # Max requests per second
      concurrent_requests: 4   # Max parallel API calls
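The connection settings combine into a few simple numbers. The comment above calls `retry_backoff_ms` the "initial" backoff, so this sketch assumes the delay doubles after each retry, and it assumes every attempt can run for the full `timeout_ms`. Under those assumptions, with the defaults a request that keeps failing could take up to 4 × 30 s + (1 + 2 + 4) s = 127 s before giving up, and at `rate_limit_rps: 25` request starts are spaced at least 40 ms apart. All function names are hypothetical.

```python
def is_retryable(status: int) -> bool:
    """429 (rate limited) and any 5xx are retried; other errors are assumed final."""
    return status == 429 or 500 <= status <= 599


def backoff_schedule(max_retries: int = 3, retry_backoff_ms: int = 1000) -> list:
    """Delay before each retry, assuming the backoff doubles each time."""
    return [retry_backoff_ms * 2 ** attempt for attempt in range(max_retries)]


def worst_case_ms(timeout_ms: int = 30000, max_retries: int = 3,
                  retry_backoff_ms: int = 1000) -> int:
    """Upper bound for one request: every attempt times out, plus all backoffs."""
    attempts = 1 + max_retries
    return attempts * timeout_ms + sum(backoff_schedule(max_retries, retry_backoff_ms))


def min_spacing_ms(rate_limit_rps: float = 25) -> float:
    """Minimum gap between request starts to stay at or under rate_limit_rps."""
    return 1000 / rate_limit_rps


print(backoff_schedule())   # [1000, 2000, 4000]
print(worst_case_ms())      # 127000
print(min_spacing_ms())     # 40.0
```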

Storage Format

Schema backups are stored alongside Kafka data in the same storage backend:

{backup_id}/
  manifest.json             # Main backup manifest
  topics/...                # Kafka data segments
  schema-registry/          # Schema Registry backup
    _manifest.json          # Schema manifest (subjects, counts, dependency order)
    _global_config.json     # Global compatibility and mode
    subjects/
      orders-value/
        _metadata.json      # Subject config, mode, version list
        v1.json             # Schema version 1 (full definition)
        v2.json             # Schema version 2
      payments-value/
        _metadata.json
        v1.json
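Because the layout is a plain directory tree, individual schema versions can be located directly in the storage backend. The sketch below assumes subject directories use the subject name verbatim (names containing `/` may be encoded differently by the tool) and shows one pitfall when listing versions: lexical sorting places `v10.json` before `v2.json`. Both helpers are hypothetical.

```python
import re
from pathlib import PurePosixPath

_VERSION_FILE = re.compile(r"v(\d+)\.json")


def schema_version_key(backup_id: str, subject: str, version: int) -> str:
    """Storage key of one schema version, assuming the subject name is used verbatim."""
    return str(PurePosixPath(backup_id, "schema-registry", "subjects",
                             subject, f"v{version}.json"))


def version_files_in_order(filenames):
    """Return vN.json names sorted by N; _metadata.json and other files are skipped."""
    matches = [(int(m.group(1)), name) for name in filenames
               if (m := _VERSION_FILE.fullmatch(name))]
    return [name for _, name in sorted(matches)]


print(schema_version_key("daily-2026-04-06", "orders-value", 2))
# daily-2026-04-06/schema-registry/subjects/orders-value/v2.json
print(version_files_in_order(["v10.json", "_metadata.json", "v2.json", "v1.json"]))
# ['v1.json', 'v2.json', 'v10.json']
```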

Schema References & Dependencies

Schemas can reference other schemas (e.g., Protobuf imports). The backup engine automatically:

  1. Discovers all cross-subject references
  2. Builds a dependency graph (DAG)
  3. Performs a topological sort
  4. Records the correct restore order in the manifest

On restore, schemas are registered in dependency order — referenced schemas first, then schemas that reference them. Circular references are detected and reported as errors.
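Python's standard library provides the same technique: `graphlib.TopologicalSorter` (Python 3.9+) orders a dependency graph and raises `CycleError` on circular references. The sketch below illustrates the technique and is not kafka-backup's code; the `references` mapping (subject to the subjects it references) is a hypothetical input.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(references: dict) -> list:
    """Return subjects so that referenced subjects precede the subjects that use them.

    references maps each subject to the subjects its schema references.
    Raises graphlib.CycleError if the references form a cycle.
    """
    return list(TopologicalSorter(references).static_order())


refs = {
    "orders-value": ["common-address", "common-money"],
    "common-address": ["common-country"],
}
print(restore_order(refs))
# common-country precedes common-address; both common-* subjects precede orders-value

try:
    restore_order({"a-value": ["b-value"], "b-value": ["a-value"]})
except CycleError as err:
    print("circular reference:", err)
```

The graph maps each node to its predecessors, so dependencies are emitted before the schemas that import them, which is exactly the registration order a restore needs.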

Use Cases

Disaster Recovery

Back up everything, restore to a fresh environment:

enterprise:
  schema_registry:
    url: "https://dr-schema-registry:8081"
    restore:
      strategy: preserve
      force_ids: true  # Empty target: keep original IDs (requires mode.mutability=true)

Cross-Environment Migration (On-Prem to Cloud)

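The target registry assigns its own schema IDs, so they will generally differ from the source. Enable ID rewriting so restored messages reference the new IDs: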

enterprise:
  schema_registry:
    url: "https://psrc-xxxxx.confluent.cloud"
    auth:
      type: basic
      username: ${CCLOUD_SR_API_KEY}
      password: ${CCLOUD_SR_API_SECRET}
    restore:
      strategy: preserve
      rewrite_ids: true  # Rewrite IDs in Kafka message bytes
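Conceptually, `rewrite_ids` patches the 4-byte ID in each message's Confluent framing so it points at the ID the target registry assigned. A minimal Python sketch, assuming a precomputed `id_map` from source IDs to target IDs (how kafka-backup builds that map is not described here); `rewrite_schema_id` is hypothetical.

```python
import struct


def rewrite_schema_id(message: bytes, id_map: dict) -> bytes:
    """Replace the schema ID in a Confluent-framed message using id_map.

    id_map maps source-registry IDs to target-registry IDs (assumed to be
    known already). Everything after the ID, including Protobuf message
    indexes, is copied unchanged.
    """
    if len(message) < 5 or message[0] != 0x00:
        # Design choice of this sketch: leave non-framed bytes (e.g. plain
        # string keys) untouched. A raw value that happens to start with 0x00
        # would be misread, so a real implementation would want to know which
        # topics are schema-encoded.
        return message
    (old_id,) = struct.unpack(">I", message[1:5])
    new_id = id_map[old_id]  # KeyError: no mapping for this schema ID
    return message[:1] + struct.pack(">I", new_id) + message[5:]


src = b"\x00" + (42).to_bytes(4, "big") + b"payload"
dst = rewrite_schema_id(src, {42: 100017})
print(int.from_bytes(dst[1:5], "big"))  # 100017
```

Only the five-byte header changes; the serialized payload is byte-for-byte identical, which is why the rewrite is safe as long as the source and target schemas are the same.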

Compliance Archive (Schema-Only)

Periodic schema snapshots for regulatory requirements:

kafka-backup backup --config schema-archive.yaml --schema-only

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| "Schema ID not found" after restore | Schemas were not restored before the data | Configure enterprise.schema_registry for the restore; schemas are then restored automatically before data. If the target assigns different IDs, set rewrite_ids: true |
| Subject not found during backup | Subject was soft-deleted | Set include_soft_deleted: true |
| Circular reference error | Schemas reference each other in a cycle | Break the cycle in Schema Registry, then re-run the backup |
| 429 rate limit errors | Too many API calls | Lower rate_limit_rps or increase retry_backoff_ms |
| Authentication failed | Invalid credentials | Check the SR_USERNAME and SR_PASSWORD environment variables |

Requirements

  • Confluent Schema Registry (Platform or Cloud)
  • HTTP(S) access to Schema Registry REST API
  • Enterprise license with schema_registry feature enabled