
Example: MSK KRaft Migration (IAM to IAM)

This example walks through a complete migration of a 3-broker MSK ZooKeeper cluster to a 3-broker MSK KRaft cluster, both using IAM authentication in us-east-1.

Release evidence

This page is a worked IAM configuration example. The May 8, 2026 full AWS release qualification proved the same migration flow on SCRAM-SHA-512 source and SCRAM-SHA-512 target across four KRaft targets. Run this IAM path in staging before using the result as production change evidence.

Scenario

Property         Source               Target
Cluster type     MSK Provisioned      MSK Provisioned
Metadata mode    ZooKeeper            KRaft
Kafka version    3.6.0                3.9.0
Authentication   IAM                  IAM
Brokers          3 (kafka.m5.large)   3 (kafka.m5.large)
Topics           50                   0 (empty target)
Data volume      ~500 GB
Consumer groups  12

Infrastructure Setup

1. Create the target KRaft cluster

# Create MSK configuration
aws kafka create-configuration \
  --name "prod-kraft-config" \
  --kafka-versions "3.9.0" \
  --server-properties "$(cat <<'EOF'
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.partitions=6
EOF
)"

# Create the KRaft cluster
# <CONFIG_ARN> is the Arn returned by create-configuration above
aws kafka create-cluster-v2 \
  --cluster-name "prod-kraft" \
  --provisioned '{
    "BrokerNodeGroupInfo": {
      "InstanceType": "kafka.m5.large",
      "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
      "SecurityGroups": ["sg-migration"],
      "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 1000}}
    },
    "NumberOfBrokerNodes": 3,
    "ConfigurationInfo": {"Arn": "<CONFIG_ARN>", "Revision": 1},
    "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": true}}},
    "EncryptionInfo": {
      "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": true}
    },
    "KafkaVersion": "3.9.0"
  }'
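
Cluster creation is asynchronous, so the target is not usable the moment create-cluster-v2 returns. The following is a minimal sketch of a wait loop, assuming the AWS CLI is configured for the account; the function name wait_for_cluster_active and the POLL_SECONDS and MAX_ATTEMPTS variables are illustrative and not part of the AWS CLI or kafka-backup.

```shell
#!/bin/sh
# Poll an MSK cluster until it reports ACTIVE.
# wait_for_cluster_active, POLL_SECONDS and MAX_ATTEMPTS are illustrative names.
POLL_SECONDS="${POLL_SECONDS:-60}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-60}"

wait_for_cluster_active() {
  cluster_arn="$1"
  attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    # describe-cluster-v2 reports ClusterInfo.State (CREATING, ACTIVE, FAILED, ...)
    state="$(aws kafka describe-cluster-v2 \
      --cluster-arn "$cluster_arn" \
      --query 'ClusterInfo.State' \
      --output text)"
    echo "attempt $attempt: state=$state" >&2
    case "$state" in
      ACTIVE) return 0 ;;
      FAILED) return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$POLL_SECONDS"
  done
  return 2  # gave up before the cluster became ACTIVE
}

# Usage, with TARGET_CLUSTER_ARN set to the ARN returned by create-cluster-v2:
#   wait_for_cluster_active "$TARGET_CLUSTER_ARN"
```

A return code of 0 means the cluster is ACTIVE, 1 means it reported FAILED, and 2 means the loop timed out.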

2. Create S3 buckets

aws s3 mb s3://prod-migration-segments --region us-east-1
aws s3 mb s3://prod-migration-evidence --region us-east-1

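# Optional: enable Object Lock on the evidence bucket for compliance.
# Object Lock requires versioning, so enable that first.
aws s3api put-bucket-versioning \
  --bucket prod-migration-evidence \
  --versioning-configuration Status=Enabled

aws s3api put-object-lock-configuration \
  --bucket prod-migration-evidence \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}}
  }'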

3. Configure IAM permissions

The migration runner's IAM role needs access to both clusters and both buckets. For the exact permissions, use the IAM policy generated by the plan command (Step 1 below) and attach it in Step 2.

Migration Config

migration.yaml
enterprise:
  msk_kraft_migration:
    source:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111
      auth:
        mode: iam
    target:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222
      auth:
        mode: iam
    backup:
      s3_bucket: prod-migration-segments
      s3_prefix: zk-to-kraft/
    evidence:
      s3_bucket: prod-migration-evidence
      s3_prefix: migrations/
      retention: 7y
    cutover:
      drain_timeout: 30m
      drain_max_partition_lag: 100
      max_producer_freeze: 60s
      producer_freeze_webhook: https://internal-api.example.com/kafka/freeze
    validation:
      count_tolerance: 1
      spot_check_records_per_partition: 5
    seed:
      max_concurrent_partitions: 8
    acl:
      on_drift: merge

Run the Migration

Step 1: Generate the plan

kafka-backup migrate msk-kraft plan \
  --config migration.yaml \
  --format all \
  --out-dir ./migration-plan

Review the generated artifacts:

  • migration-plan/runbook.md — customized step-by-step runbook
  • migration-plan/iam-policy-concrete.json — attach this to the runner's IAM role
  • migration-plan/cost-estimate.json — estimated S3 costs (~$12 for 500GB)
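
As a sanity check on the cost estimate, the storage part can be reproduced by hand. This is a rough sketch only: the per-GB price is an assumed S3 Standard list price for us-east-1, the function name is illustrative, and the tool's own cost model (which may include request or transfer charges) is not described here.

```shell
#!/bin/sh
# Rough S3 storage cost for the segment backup.
# PRICE_PER_GB_MONTH is an assumed S3 Standard list price for us-east-1;
# check current pricing before relying on it.
PRICE_PER_GB_MONTH="${PRICE_PER_GB_MONTH:-0.023}"

estimate_s3_storage_cost() {
  # $1 = data volume in GB, $2 = months the segments are kept
  awk -v gb="$1" -v months="$2" -v price="$PRICE_PER_GB_MONTH" \
    'BEGIN { printf "%.2f\n", gb * months * price }'
}

estimate_s3_storage_cost 500 1   # prints 11.50
```

Under that assumed price, one month of 500 GB comes to about $11.50, which is in line with the ~$12 figure in the generated estimate.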

Step 2: Attach IAM policy

aws iam put-role-policy \
  --role-name migration-runner-role \
  --policy-name kafka-migration \
  --policy-document file://migration-plan/iam-policy-concrete.json

Step 3: Run precheck

kafka-backup migrate msk-kraft precheck --config migration.yaml

Expected output: no blockers. You may see:

  • I01: Target is IAM-auth — ACLs emitted as access-map.json (expected for IAM targets)
  • W04: Target message-size floor could not be verified from dynamic broker config (manually verify message.max.bytes and replica.fetch.max.bytes)

Example precheck output from an IAM migration:

W04 warn: could not verify target message-size floor (target broker DescribeConfigs returned no message.max.bytes or replica.fetch.max.bytes (dynamic-config only on this broker)) — ensure target `message.max.bytes` and `replica.fetch.max.bytes` ≥ largest source topic's effective max.message.bytes
W03 info: KMS key ARN set on backup channel — CMK access is not verified by this precheck phase; ensure the caller has kms:Encrypt/Decrypt/GenerateDataKey
I01 info: target is IAM-auth — ACLs will be emitted as access-map.json for customer IaC to translate to IAM policies (tool does not apply IAM)
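
When the precheck runs in CI, it can help to tally findings by severity instead of reading the output by eye. The sketch below assumes each finding line has the form "<CODE> <severity>: <message>", as in the sample above, and that the output has been captured to a file; summarize_precheck is an illustrative helper, not a kafka-backup command.

```shell
#!/bin/sh
# Tally precheck findings by severity from captured output.
# Assumes finding lines look like "<CODE> <severity>: <message>";
# lines that do not match this shape are ignored.
summarize_precheck() {
  # $1 = file containing captured precheck output
  awk '
    $1 ~ /^[A-Z][0-9]+$/ && $2 ~ /:$/ {
      sev = $2
      sub(/:$/, "", sev)   # drop the trailing colon
      count[sev]++
    }
    END {
      for (sev in count) printf "%s=%d\n", sev, count[sev]
    }
  ' "$1" | sort
}

# Usage, assuming the findings were saved to precheck.log:
#   summarize_precheck precheck.log
```

For the three sample lines above this would report two info findings and one warn finding; any other severity in the summary deserves a closer look before executing.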

Step 4: Execute

kafka-backup migrate msk-kraft execute \
  --config migration.yaml \
  --journal-dir ./journal

For 500 GB, expect:

  • Seed phase: ~3-4 hours
  • Tail convergence: ~10-15 minutes
  • Total to drain-ready: ~4 hours
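
The seed estimate can be turned into an implied average throughput, which is useful for scaling the numbers to a different data volume. This is plain arithmetic in decimal units (1 GB = 1000 MB); the function name is illustrative.

```shell
#!/bin/sh
# Average throughput implied by a data volume and a duration.
# Uses decimal units (1 GB = 1000 MB); implied_mb_per_sec is an illustrative name.
implied_mb_per_sec() {
  # $1 = data volume in GB, $2 = duration in hours
  awk -v gb="$1" -v hours="$2" \
    'BEGIN { printf "%.1f\n", (gb * 1000) / (hours * 3600) }'
}

implied_mb_per_sec 500 3   # prints 46.3
implied_mb_per_sec 500 4   # prints 34.7
```

A 3-4 hour seed for 500 GB therefore corresponds to roughly 35-46 MB/s sustained. If the rate you observe in staging is lower, expect the seed phase to stretch proportionally.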

Example drain-ready output:

<timestamp> topology_copy -> seed
<timestamp> seed -> tail
<timestamp> tail -> drain_ready drain ready: max_partition_lag=<n> records_replayed=<n> bytes_replayed=<n>

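Before scheduling the cutover, the reported lag can be checked against drain_max_partition_lag from the config. The sketch below assumes the key=value layout shown in the sample drain-ready line; extract_field and lag_within_limit are illustrative helpers, not part of kafka-backup.

```shell
#!/bin/sh
# Read key=value fields from a journal status line.
# Assumes fields are whitespace-separated "key=value" pairs.
extract_field() {
  # $1 = field name, stdin = log line(s); prints the first value found
  sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p" | head -n 1
}

lag_within_limit() {
  # $1 = drain-ready log line, $2 = allowed lag (drain_max_partition_lag)
  lag="$(printf '%s\n' "$1" | extract_field max_partition_lag)"
  [ -n "$lag" ] && [ "$lag" -le "$2" ]
}

# Usage, with LINE holding the drain-ready line and 100 taken from the config:
#   lag_within_limit "$LINE" 100 && echo "ok to schedule cutover"
```

The same extract_field helper can read the warnings field from the READY_FOR_CLIENT_SWITCH line after cutover.
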
Step 5: Cutover

Coordinate with application teams, then:

kafka-backup migrate msk-kraft cutover \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

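During cutover, the webhook at https://internal-api.example.com/kafka/freeze receives a POST request, and your application is expected to pause producers in response. Plan for a pause of ~30 seconds while cutover completes; the config above sets max_producer_freeze to 60s.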

Example cutover-ready output:

<timestamp> cutover -> awaiting_client_switch READY_FOR_CLIENT_SWITCH: groups_translated=<n> offsets_committed=<n> warnings=0

Step 6: Switch clients

Update application configs to point to the KRaft cluster bootstrap servers:

# Get new bootstrap servers
aws kafka get-bootstrap-brokers \
  --cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222

Roll your deployments. Consumers resume from the translated target offsets, preserving message continuity across the switch.
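
get-bootstrap-brokers returns one broker string per listener, and IAM clients need the SASL/IAM one. As a sketch for Java clients that use the aws-msk-iam-auth library, the helper below selects that string and writes a client properties file; the function name and the output path are illustrative, and other client languages need their own equivalent settings.

```shell
#!/bin/sh
# Write a Kafka client properties file for the IAM listener of the target cluster.
# write_iam_client_properties is an illustrative helper; the property values are
# the ones documented for the aws-msk-iam-auth library (Java clients).
write_iam_client_properties() {
  cluster_arn="$1"
  out_file="$2"
  # BootstrapBrokerStringSaslIam is the broker string for the IAM listener
  brokers="$(aws kafka get-bootstrap-brokers \
    --cluster-arn "$cluster_arn" \
    --query 'BootstrapBrokerStringSaslIam' \
    --output text)" || return 1
  # The CLI prints "None" when the field is absent from the response
  [ -n "$brokers" ] && [ "$brokers" != "None" ] || return 1
  cat > "$out_file" <<EOF
bootstrap.servers=$brokers
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOF
}

# Usage, with TARGET_CLUSTER_ARN set to the KRaft cluster ARN:
#   write_iam_client_properties "$TARGET_CLUSTER_ARN" ./client.properties
```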

Step 7: Acknowledge and finalize

# Acknowledge client switch
kafka-backup migrate msk-kraft cutover-ack \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

# Finalize (runs validation + uploads evidence)
kafka-backup migrate msk-kraft finalize \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

Verify the Evidence

aws s3 cp \
  "s3://prod-migration-evidence/migrations/<MIGRATION_ID>/evidence.json" \
  ./evidence.json

# Check validation outcome
jq -r '.bundle_json' evidence.json | jq '.validation.overall'
# Expected: "PASSED", or "WARNING" when the report explains an accepted warning such as empty partitions with no spot-check sample.

# Check per-check results
jq -r '.bundle_json' evidence.json | jq '{
  topic_parity: .validation.topic_parity.outcome,
  counts_and_offsets: .validation.counts_and_offsets.outcome,
  offset_floor_violations: .validation.counts_and_offsets.data.offset_floor_violations,
  spot_check_records: .validation.spot_check_records.outcome,
  sentinel_presence: .validation.sentinel_presence.outcome,
  consumer_group_reconciliation: .validation.consumer_group_reconciliation.outcome
}'
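
In a pipeline, the overall outcome can serve as a gate before clean-up. The sketch below assumes, as the jq commands above imply, that evidence.json stores the bundle as a JSON string under .bundle_json; check_evidence is an illustrative name, and treating WARNING as acceptable is a policy choice you may want to tighten.

```shell
#!/bin/sh
# Return non-zero unless the evidence bundle reports an acceptable overall outcome.
# Assumes the bundle is a JSON string stored under .bundle_json.
check_evidence() {
  # $1 = path to evidence.json
  overall="$(jq -r '.bundle_json | fromjson | .validation.overall' "$1")" || return 2
  case "$overall" in
    PASSED)  echo "validation passed"; return 0 ;;
    WARNING) echo "validation passed with warnings; review the report"; return 0 ;;
    *)       echo "validation outcome: $overall" >&2; return 1 ;;
  esac
}

# Usage:
#   check_evidence ./evidence.json || exit 1
```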

Optional source/target comparison after finalize:

topic_parity=PASSED
counts_and_offsets=PASSED
offset_floor_violations=0
sentinel_presence=PASSED
consumer_group_reconciliation=PASSED

Clean Up

After verifying the migration is successful and all applications are stable on the KRaft cluster:

# Remove migration segments from S3
aws s3 rm s3://prod-migration-segments/zk-to-kraft/ --recursive

# Decommission the source ZK cluster (when confident)
aws kafka delete-cluster \
  --cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111

Keep the evidence bucket

Do not delete the evidence bucket. The signed evidence bundle is your compliance proof that the migration succeeded. With Object Lock enabled, it's retained for 7 years automatically.

Next Steps