Example: MSK KRaft Migration (IAM to IAM)
This example walks through a complete migration of a 3-broker MSK ZooKeeper cluster to a 3-broker MSK KRaft cluster, both using IAM authentication in us-east-1.
Scenario
| | Source | Target |
|---|---|---|
| Cluster type | MSK Provisioned | MSK Provisioned |
| Metadata mode | ZooKeeper | KRaft |
| Kafka version | 3.6.0 | 3.9.0 |
| Authentication | IAM | IAM |
| Brokers | 3 (kafka.m5.large) | 3 (kafka.m5.large) |
| Topics | 50 | 0 (empty target) |
| Data volume | ~500 GB | — |
| Consumer groups | 12 | — |
Infrastructure Setup
1. Create the target KRaft cluster
# Create MSK configuration
aws kafka create-configuration \
--name "prod-kraft-config" \
--kafka-versions "3.9.0" \
--server-properties "$(cat <<'EOF'
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.partitions=6
EOF
)"
# Create the KRaft cluster
aws kafka create-cluster-v2 \
--cluster-name "prod-kraft" \
--provisioned '{
"brokerNodeGroupInfo": {
"instanceType": "kafka.m5.large",
"clientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
"securityGroups": ["sg-migration"],
"storageInfo": {"ebsStorageInfo": {"volumeSize": 1000}}
},
"numberOfBrokerNodes": 3,
"clientAuthentication": {"sasl": {"iam": {"enabled": true}}},
"encryptionInfo": {
"encryptionInTransit": {"clientBroker": "TLS", "inCluster": true}
},
"kafkaVersion": "3.9.0"
}'
2. Create S3 buckets
aws s3 mb s3://prod-migration-segments --region us-east-1
aws s3 mb s3://prod-migration-evidence --region us-east-1
# Optional: enable Object Lock on the evidence bucket for compliance
aws s3api put-object-lock-configuration \
--bucket prod-migration-evidence \
--object-lock-configuration '{
"ObjectLockEnabled": "Enabled",
"Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}}
}'
3. Configure IAM permissions
The migration runner's IAM role needs access to both clusters and both buckets. Use the generated IAM policy from the plan command (Step 2 below) for the exact permissions.
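The generated `iam-policy-concrete.json` is the authoritative source of permissions. As a rough illustration of its shape only (the actions and ARNs below are illustrative and not exhaustive), it contains statements along these lines:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KafkaClusterAccess",
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeCluster",
        "kafka-cluster:ReadData",
        "kafka-cluster:WriteData"
      ],
      "Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/prod-*/*"
    },
    {
      "Sid": "MigrationBuckets",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::prod-migration-segments",
        "arn:aws:s3:::prod-migration-segments/*",
        "arn:aws:s3:::prod-migration-evidence",
        "arn:aws:s3:::prod-migration-evidence/*"
      ]
    }
  ]
}
```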
Migration Config
enterprise:
msk_kraft_migration:
source:
cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111
auth:
mode: iam
target:
cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222
auth:
mode: iam
backup:
s3_bucket: prod-migration-segments
s3_prefix: zk-to-kraft/
evidence:
s3_bucket: prod-migration-evidence
s3_prefix: migrations/
retention: 7y
cutover:
drain_timeout: 30m
drain_max_partition_lag: 100
max_producer_freeze: 60s
producer_freeze_webhook: https://internal-api.example.com/kafka/freeze
validation:
count_tolerance: 1
spot_check_records_per_partition: 5
seed:
max_concurrent_partitions: 8
acl:
on_drift: merge
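The `validation` settings above translate into a simple pass/fail rule at finalize time. A minimal sketch of the count-tolerance check (the tool's actual implementation may differ):

```python
def counts_within_tolerance(source_count: int, target_count: int,
                            tolerance: int = 1) -> bool:
    """Pass when per-partition record counts differ by at most `tolerance`.

    With count_tolerance: 1 in the config above, a one-record discrepancy
    (e.g. a transactional control marker) does not fail validation.
    """
    return abs(source_count - target_count) <= tolerance

print(counts_within_tolerance(1_000_000, 999_999))  # True
print(counts_within_tolerance(1_000_000, 999_997))  # False
```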
Run the Migration
Step 1: Generate the plan
kafka-backup migrate msk-kraft plan \
--config migration.yaml \
--format all \
--out-dir ./migration-plan
Review the generated artifacts:
- `migration-plan/runbook.md` — customized step-by-step runbook
- `migration-plan/iam-policy-concrete.json` — attach this to the runner's IAM role
- `migration-plan/cost-estimate.json` — estimated S3 costs (~$12 for 500 GB)
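The ~$12 figure is consistent with back-of-the-envelope S3 pricing (assuming S3 Standard at roughly $0.023/GB-month in us-east-1, one month of retention for the seed segments, and ignoring request costs):

```python
data_gb = 500
standard_rate = 0.023  # USD per GB-month, us-east-1 S3 Standard (assumed rate)
monthly_storage = data_gb * standard_rate
print(f"~${monthly_storage:.2f}/month")  # ~$11.50/month, in line with the ~$12 estimate
```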
Step 2: Attach IAM policy
aws iam put-role-policy \
--role-name migration-runner-role \
--policy-name kafka-migration \
--policy-document file://migration-plan/iam-policy-concrete.json
Step 3: Run precheck
kafka-backup migrate msk-kraft precheck --config migration.yaml
Expected output: no blockers. You may see:
- I01: Target is IAM-auth — ACLs emitted as `access-map.json` (expected for IAM targets)
- W04: Target message-size floor could not be verified from dynamic broker config (manually verify `message.max.bytes` and `replica.fetch.max.bytes`)
Example precheck output from an IAM migration:
W04 warn: could not verify target message-size floor (target broker DescribeConfigs returned no message.max.bytes or replica.fetch.max.bytes (dynamic-config only on this broker)) — ensure target `message.max.bytes` and `replica.fetch.max.bytes` ≥ largest source topic's effective max.message.bytes
W03 info: KMS key ARN set on backup channel — CMK access is not verified by this precheck phase; ensure the caller has kms:Encrypt/Decrypt/GenerateDataKey
I01 info: target is IAM-auth — ACLs will be emitted as access-map.json for customer IaC to translate to IAM policies (tool does not apply IAM)
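The manual check W04 asks for boils down to a comparison (sketch only; fetch the live broker values with `kafka-configs.sh --entity-type brokers --describe` or the Admin API's `DescribeConfigs`):

```python
def message_size_floor_ok(target_message_max: int,
                          target_replica_fetch_max: int,
                          largest_source_max_message: int) -> bool:
    """Both target broker limits must cover the largest effective
    max.message.bytes of any source topic, or oversized records will be
    rejected (or stall replication) after cutover."""
    floor = min(target_message_max, target_replica_fetch_max)
    return floor >= largest_source_max_message

# Kafka defaults: message.max.bytes=1048588, replica.fetch.max.bytes=1048576
print(message_size_floor_ok(1048588, 1048576, 1048576))  # True
print(message_size_floor_ok(1048588, 1048576, 5242880))  # False: raise target limits first
```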
Step 4: Execute
kafka-backup migrate msk-kraft execute \
--config migration.yaml \
--journal-dir ./journal
For 500 GB, expect:
- Seed phase: ~3-4 hours
- Tail convergence: ~10-15 minutes
- Total to drain-ready: ~4 hours
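Those estimates assume sustained seed throughput of roughly 35–45 MB/s. A quick sanity check (assumptions: 500 GB of data, the 8 concurrent partitions configured above, a 3.5-hour midpoint):

```python
data_bytes = 500 * 1024**3
seed_hours = 3.5
aggregate_mb_s = data_bytes / (seed_hours * 3600) / 1024**2
per_partition = aggregate_mb_s / 8  # max_concurrent_partitions: 8
print(f"aggregate ~{aggregate_mb_s:.0f} MB/s, ~{per_partition:.1f} MB/s per in-flight partition")
# aggregate ~41 MB/s, ~5.1 MB/s per in-flight partition
```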
Example drain-ready output:
2026-04-25T06:15:07.512121Z topology_copy -> seed
2026-04-25T06:25:30.777840Z seed -> tail
2026-04-25T06:26:19.079610Z tail -> drain_ready drain ready: max_partition_lag=0 records_replayed=0 bytes_replayed=0
Step 5: Cutover
Coordinate with application teams, then:
kafka-backup migrate msk-kraft cutover \
--config migration.yaml \
--migration-id <ID> \
--journal-dir ./journal
The webhook at https://internal-api.example.com/kafka/freeze receives a POST request signaling the freeze. Your application pauses producers while cutover completes, typically ~30 seconds and never longer than the configured `max_producer_freeze` of 60s.
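The freeze payload and endpoint semantics are specific to your internal API, so treat this as a sketch: a minimal in-process gate your producer path could consult, toggled by the webhook handler (the class and method names here are assumptions, not the tool's contract):

```python
import threading

class ProducerFreezeGate:
    """Producers call wait_if_frozen() before each send; the freeze
    webhook handler calls freeze()/thaw() around cutover."""
    def __init__(self, max_freeze_seconds: float = 60.0):  # mirrors max_producer_freeze: 60s
        self._allowed = threading.Event()
        self._allowed.set()  # set == producing allowed
        self._max = max_freeze_seconds

    def freeze(self) -> None:
        self._allowed.clear()

    def thaw(self) -> None:
        self._allowed.set()

    def wait_if_frozen(self) -> bool:
        # True: producing may proceed. False: the freeze outlasted the
        # configured bound (fail open, alert, or retry per your policy).
        return self._allowed.wait(timeout=self._max)

gate = ProducerFreezeGate(max_freeze_seconds=0.1)
gate.freeze()
threading.Timer(0.02, gate.thaw).start()  # cutover completes ~20 ms later
print(gate.wait_if_frozen())              # True: thawed within the bound
```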
Example cutover-ready output:
2026-04-25T06:40:46.854037Z cutover -> awaiting_client_switch READY_FOR_CLIENT_SWITCH: groups_translated=0 offsets_committed=0 warnings=0
Step 6: Switch clients
Update application configs to point to the KRaft cluster bootstrap servers:
# Get new bootstrap servers
aws kafka get-bootstrap-brokers \
--cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222
Roll your deployments. Consumers resume from the translated target offsets, preserving message continuity across the switch.
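Continuity holds because each target partition log was rebuilt from the source's records, so a source committed offset maps onto the target log by a per-partition delta. This is a sketch of the idea only (the tool derives the actual mapping during seed/tail; the function below is illustrative):

```python
def translate_offset(source_committed: int, source_log_start: int,
                     target_base: int = 0) -> int:
    """Map a source committed offset onto the rebuilt target log.
    Records before the source log-start offset were never copied
    (already deleted by retention), so the target log begins there."""
    return target_base + (source_committed - source_log_start)

# Source partition retained offsets 1_000..5_000; a group committed at 4_200.
# The rebuilt target log starts at 0, so the group resumes at 3_200 there.
print(translate_offset(4_200, 1_000))  # 3200
```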
Step 7: Acknowledge and finalize
# Acknowledge client switch
kafka-backup migrate msk-kraft cutover-ack \
--config migration.yaml \
--migration-id <ID> \
--journal-dir ./journal
# Finalize (runs validation + uploads evidence)
kafka-backup migrate msk-kraft finalize \
--config migration.yaml \
--migration-id <ID> \
--journal-dir ./journal