MSK ZooKeeper to KRaft Migration
`plan` and `precheck` are completely free. Run them against your production clusters today to see exactly what a migration looks like — generated runbook, cost estimate, IAM policies, and infrastructure readiness report. No signup, no trial activation.
```bash
kafka-backup migrate msk-kraft plan --config migration.yaml --format all --out-dir ./migration-plan
kafka-backup migrate msk-kraft precheck --config migration.yaml
```
Migrate your AWS MSK clusters from ZooKeeper to KRaft mode with a short coordinated producer freeze, validated offset continuity, and a cryptographically signed evidence bundle that proves the migration succeeded. Consumers resume from translated target offsets so message continuity is preserved across the switch.
Why Migrate from ZooKeeper to KRaft?
Apache Kafka 4.0 removes ZooKeeper entirely. KRaft (Kafka Raft) replaces ZooKeeper as the metadata management layer, bringing:
- Faster controller failover — seconds instead of minutes
- Simplified operations — one system to manage instead of two
- Better scalability — millions of partitions per cluster
- Reduced infrastructure — no ZooKeeper ensemble to provision, monitor, or patch
AWS MSK supports KRaft from version 3.7.x onward. ZooKeeper-mode clusters on MSK will reach end of extended support as Kafka 4.x becomes the default. The migration window is now.
Is ZooKeeper Deprecated?
Yes. ZooKeeper was deprecated in Apache Kafka 3.5 (KIP-833) and removed in Kafka 4.0. AWS MSK's latest versions already support KRaft, and new clusters should be provisioned in KRaft mode.
The AWS MSK Migration Problem
AWS MSK does not support in-place ZooKeeper-to-KRaft conversion. You must create a new KRaft cluster and move everything over:
| What needs to migrate | What happens without tooling |
|---|---|
| Topic data (every partition, every record) | Manual MirrorMaker setup, ongoing maintenance |
| Topic configurations (retention, compaction, replication) | Manual recreation, error-prone |
| Consumer group offsets | Lost — consumers restart from earliest or latest |
| ACL bindings | Manual recreation, security gaps during transition |
| Proof that migration succeeded | Nothing — hope and prayer |
The gap between "data is on the new cluster" and "consumers resume from the right place" is where migrations fail. A single incorrect offset means lost messages or reprocessed duplicates — silent data corruption that surfaces days later in downstream systems.
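Offset continuity comes down to translating each consumer group's committed source offset into the equivalent target offset. The following sketch illustrates the idea with a hypothetical per-partition offset map (the names, data, and checkpoint format are illustrative, not the tool's internals):

```python
import bisect

# Hypothetical offset map: for each (topic, partition), sorted pairs of
# (source_offset, target_offset) checkpoints recorded during copy.
offset_map = {
    ("orders", 0): [(0, 0), (1000, 980), (2000, 1960)],
}

def translate(topic, partition, source_committed):
    """Translate a committed source offset to the target cluster.

    Finds the last checkpoint at or below the committed offset and carries
    the delta forward (assumes offsets are contiguous between checkpoints).
    """
    checkpoints = offset_map[(topic, partition)]
    sources = [s for s, _ in checkpoints]
    idx = bisect.bisect_right(sources, source_committed) - 1
    src, tgt = checkpoints[idx]
    return tgt + (source_committed - src)

print(translate("orders", 0, 1500))  # 980 + 500 = 1480
```

An off-by-one here is exactly the silent-corruption failure mode described above, which is why the real translation is validated by the consumer group reconciliation check before finalization.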
How kafka-backup Enterprise Solves It
| Capability | kafka-backup Enterprise | MirrorMaker 2 | Manual |
|---|---|---|---|
| Controlled cutover | Yes (coordinated producer freeze) | Partial | No |
| Offset continuity (exact message resume) | Yes (offset-map translation) | No | No |
| ACL migration with drift handling | Yes (merge/replace/refuse) | No | Manual |
| Topic config preservation | Yes (automatic) | Partial | Manual |
| Cryptographic evidence bundle | Yes (Ed25519-signed) | No | No |
| 5-check automated validation | Yes | No | No |
| Rollback capability | Yes (pre-cutover) | No | No |
| Resume after failure | Yes (journal-based) | Restart from scratch | Restart from scratch |
| Cross-auth support (SCRAM → IAM) | Yes | No | Manual |
Migration Lifecycle
The migration runs through a deterministic 11-state machine. Every state transition is journaled and included in the final evidence bundle.
```
PLANNED → PRECHECK → TOPOLOGY_COPY → SEED → TAIL → DRAIN_READY
                                                        ↓
FINALIZED ← VALIDATING ← AWAITING_CLIENT_SWITCH ← CUTOVER
```
| Phase | State | What happens |
|---|---|---|
| Plan & Precheck | planned → precheck | Read-only analysis of both clusters. Detects blockers (incompatible versions, unreachable brokers, S3 permission issues) and warnings (cross-region egress, compacted topics, static members). |
| Topology Copy | topology_copy | Creates missing topics on target with matching partition counts and configurations. Copies ACL bindings (filtering MSK internals like User:ANONYMOUS). |
| Seed | seed | Bulk-copies all existing data through S3 — source → backup → S3 → restore → target. Builds the offset map that enables consumer group translation. |
| Tail | tail | Continuously bridges the gap between seed and cutover. Replays new records as they arrive on source. Tracks per-partition lag. |
| Drain Ready | drain_ready | All partitions within lag tolerance. Execution halts. Operator decides when to proceed. |
| Cutover | cutover | Freezes producers (via webhook or manual), publishes sentinel records, drains final records, translates all consumer group offsets, commits translated offsets on target. |
| Client Switch | awaiting_client_switch | Operator updates application configs to point to new KRaft cluster bootstrap servers. |
| Validation | validating | Runs 5 automated checks: topic parity, record counts, spot-check record equality, sentinel presence, consumer group reconciliation. |
| Finalize | finalized | Signs the evidence bundle with Ed25519 and uploads to S3. Migration complete. |
At any point before cutover, you can roll back — the source cluster is never modified.
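The journaled lifecycle can be pictured as a small state machine. This sketch uses the state names from the table above; the transition rules and journal format are assumptions for illustration, not the tool's actual implementation:

```python
# Happy-path transitions, following the lifecycle table above.
TRANSITIONS = {
    "planned": {"precheck"},
    "precheck": {"topology_copy"},
    "topology_copy": {"seed"},
    "seed": {"tail"},
    "tail": {"drain_ready"},
    "drain_ready": {"cutover"},
    "cutover": {"awaiting_client_switch"},
    "awaiting_client_switch": {"validating"},
    "validating": {"finalized"},
    "finalized": set(),
}

class MigrationJournal:
    """Append-only journal of state transitions (illustrative sketch)."""

    def __init__(self):
        self.state = "planned"
        self.entries = [("planned", "initial")]

    def advance(self, next_state, note=""):
        if next_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {next_state}")
        self.state = next_state
        self.entries.append((next_state, note))

j = MigrationJournal()
j.advance("precheck", "no blockers found")
j.advance("topology_copy")
print(j.state)  # topology_copy
```

Because every transition is appended rather than overwritten, the journal doubles as the audit trail included in the evidence bundle.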
Authentication Matrix
kafka-backup supports every MSK authentication mode and cross-auth migration:
| Source Auth | Target Auth | Supported | Notes |
|---|---|---|---|
| IAM | IAM | Yes | Most common MSK configuration |
| SCRAM-SHA-512 | SCRAM-SHA-512 | Yes | Pre-provision SCRAM users on target |
| SCRAM-SHA-512 | IAM | Yes | Auth modernization — ACLs emitted as access-map.json |
| IAM | SCRAM-SHA-512 | Yes | |
| mTLS | IAM | Yes | |
| mTLS | mTLS | Yes | |
| PLAINTEXT | Any | Yes | Dev/test environments |
Cross-auth migration (e.g., SCRAM source → IAM target) is a first-class feature. When the target uses IAM, Kafka ACLs don't apply — instead, the tool generates an access-map.json that maps each principal's permissions to the IAM policies you need to create.
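The exact schema of access-map.json is not shown here; as a sketch of the idea, the snippet below groups Kafka ACL bindings by principal so each group can be turned into an IAM policy. The binding tuples and output structure are illustrative assumptions:

```python
from collections import defaultdict

# Illustrative ACL bindings: (principal, operation, resource_type, resource_name).
# The real access-map.json schema may differ.
bindings = [
    ("User:orders-svc", "READ", "topic", "orders"),
    ("User:orders-svc", "READ", "group", "orders-consumers"),
    ("User:billing-svc", "WRITE", "topic", "invoices"),
]

def build_access_map(acls):
    """Group each principal's permissions so they can be mapped to IAM policies."""
    grouped = defaultdict(list)
    for principal, op, rtype, rname in acls:
        grouped[principal].append({"operation": op, "resource": f"{rtype}:{rname}"})
    return dict(grouped)

access_map = build_access_map(bindings)
print(access_map["User:orders-svc"])
```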
5-Check Automated Validation
Before finalizing, the tool runs five independent validation checks:
| Check | What it verifies | Pass criteria |
|---|---|---|
| Topic Parity | Partition counts match between source and target | All topics match |
| Counts & Offsets | Record counts within tolerance (default ±1 for sentinel) | Per-partition span difference ≤ count_tolerance |
| Spot-Check Records | Sampled records byte-equal between source and target | All samples match (compacted topics allow warnings) |
| Sentinel Presence | Cutover marker records landed on target | All sentinels found |
| Consumer Group Reconciliation | Translated offsets committed correctly on target | All offsets match expected values |
The overall outcome is PASSED, WARNING (expected drift on compacted topics), or FAILED. Failed validation blocks finalization — you must investigate and remediate before the migration can complete.
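The Counts & Offsets criterion is simple to state: every partition's record span must differ by at most the tolerance (default ±1, to allow for the sentinel record). A minimal sketch of that pass/fail rule, with illustrative data:

```python
def counts_check(source_spans, target_spans, count_tolerance=1):
    """Pass if every partition's record span is within the tolerance.

    Spans are per-partition record counts keyed by (topic, partition).
    """
    failures = []
    for key, src in source_spans.items():
        tgt = target_spans.get(key, 0)
        if abs(tgt - src) > count_tolerance:
            failures.append((key, src, tgt))
    return ("PASSED" if not failures else "FAILED", failures)

src = {("orders", 0): 10_000, ("orders", 1): 9_500}
tgt = {("orders", 0): 10_001, ("orders", 1): 9_500}  # +1 sentinel on partition 0
print(counts_check(src, tgt)[0])  # PASSED
```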
Cryptographic Evidence Bundle
Every migration produces an Ed25519-signed JSON evidence bundle uploaded to S3. This is your auditable proof that the migration succeeded:
- Complete state transition journal (every phase, with timestamps)
- Source and target cluster metadata snapshots
- Topology diff (topics created, configs applied)
- ACL plan (bindings copied, internals filtered)
- Seed and tail statistics (records, bytes, partitions)
- Full validation report with per-partition detail
- Offset translation map
- Cutover report (sentinel positions, freeze timing)
The signature is verifiable offline with the Ed25519 public key. For regulated environments, the evidence bucket supports S3 Object Lock (COMPLIANCE mode) to prevent tampering.
The evidence bundle answers: "Prove that every record made it to the new cluster and every consumer will resume from the right place." It's the difference between "we think it worked" and "here's the cryptographic proof."
Quick Start
1. Create a minimal config
```yaml
enterprise:
  msk_kraft_migration:
    source:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/my-zk-cluster/abc-123
      auth:
        mode: iam
    target:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/my-kraft-cluster/def-456
      auth:
        mode: iam
    backup:
      s3_bucket: my-migration-segments
      s3_prefix: migrations/
    evidence:
      s3_bucket: my-migration-evidence
      s3_prefix: evidence/
```
2. Generate the migration plan (free)
```bash
kafka-backup migrate msk-kraft plan \
  --config migration.yaml \
  --format all \
  --out-dir ./migration-plan
```
This generates:
- `plan.json` — machine-readable migration plan
- `runbook.md` — step-by-step operator runbook
- `aws-cli.sh` — AWS CLI commands for infrastructure setup
- `iam-policy-templated.json` — IAM policy template
- `iam-policy-concrete.json` — IAM policy with your ARNs filled in
- `cost-estimate.json` — estimated S3 and data transfer costs
3. Run precheck (free)
```bash
kafka-backup migrate msk-kraft precheck --config migration.yaml
```
Precheck analyzes both clusters and reports blockers, warnings, and informational findings. See the Precheck Codes Reference for remediation guidance.
4. Execute the migration (license required)
```bash
kafka-backup migrate msk-kraft execute \
  --config migration.yaml \
  --journal-dir ./journal
```
See the Production Migration Runbook for the complete step-by-step process.
Pricing and Licensing
MSK KRaft migration requires the migrations:msk-kraft feature in your enterprise license. A 14-day free trial activates automatically on first run — no signup, no credit card.
- `plan` and `precheck` are always free, even without a license
- `execute`, `cutover`, `finalize`, and other mutation commands require an active license
- Licenses are Ed25519-signed files validated offline — no license server, no phone-home
Learn more about licensing | Get a license
Frequently Asked Questions
Does Kafka still need ZooKeeper?
No. Apache Kafka 3.3+ supports KRaft mode (ZooKeeper-free). Kafka 4.0 removes ZooKeeper entirely. AWS MSK supports KRaft from version 3.7.x.
Can Kafka run without ZooKeeper?
Yes. KRaft mode replaces ZooKeeper with an internal Raft-based metadata quorum. New clusters should be provisioned in KRaft mode.
Is ZooKeeper removed from Kafka?
ZooKeeper was deprecated in Kafka 3.5 and removed in Kafka 4.0. Existing ZooKeeper-mode clusters must migrate to KRaft before upgrading to Kafka 4.x.
Is Kafka KRaft production ready?
Yes. KRaft has been production-ready since Kafka 3.3 (KIP-833). AWS MSK supports KRaft in production from version 3.7.x. Major organizations have been running KRaft in production since 2024.
What is KRaft in Kafka?
KRaft (Kafka Raft) is the consensus protocol that replaces ZooKeeper for Kafka metadata management. It uses the Raft algorithm to elect a controller and replicate metadata across the cluster, eliminating the need for a separate ZooKeeper ensemble. See KRaft Architecture for a deep dive.
How long does a migration take?
Migration time depends on data volume and network bandwidth. Rough estimates for a 3-broker cluster:
| Data volume | Seed phase | Total (including tail + cutover) |
|---|---|---|
| 10 GB | ~5 minutes | ~10 minutes |
| 100 GB | ~30 minutes | ~45 minutes |
| 1 TB | ~4 hours | ~5 hours |
| 10 TB | ~36 hours | ~40 hours |
The producer freeze window during cutover is typically under 60 seconds regardless of data volume.
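As a rough back-of-envelope, seed time scales with data volume divided by effective copy throughput; the table above implies an effective rate on the order of 50–80 MB/s for a 3-broker cluster. A quick estimator (the throughput figure is an assumption; measure your own clusters):

```python
def estimate_seed_hours(data_gb, effective_mb_per_s=60):
    """Rough seed-phase estimate: data volume / effective copy throughput."""
    seconds = (data_gb * 1024) / effective_mb_per_s
    return seconds / 3600

# Compare against the table above (100 GB ≈ 0.5 h, 1 TB ≈ 5 h):
for gb in (100, 1024, 10 * 1024):
    print(f"{gb:>6} GB ≈ {estimate_seed_hours(gb):.1f} h")
```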
Next Steps
- Production Migration Runbook — step-by-step guide
- Configuration Reference — every YAML field explained
- Precheck Codes Reference — blocker and warning remediation
- CLI Reference — all 9 migration commands
- Architecture Deep Dive — how offset continuity works
- IAM-to-IAM Example — complete worked example
- Cross-Auth Example — SCRAM to IAM migration