Example: MSK KRaft Migration (IAM to IAM)

This example walks through a complete migration of a 3-broker MSK ZooKeeper cluster to a 3-broker MSK KRaft cluster, both using IAM authentication in us-east-1.

Scenario

|                 | Source             | Target             |
|-----------------|--------------------|--------------------|
| Cluster type    | MSK Provisioned    | MSK Provisioned    |
| Metadata mode   | ZooKeeper          | KRaft              |
| Kafka version   | 3.6.0              | 3.9.0              |
| Authentication  | IAM                | IAM                |
| Brokers         | 3 (kafka.m5.large) | 3 (kafka.m5.large) |
| Topics          | 500                | (empty)            |
| Data volume     | ~500 GB            |                    |
| Consumer groups | 12                 |                    |

Infrastructure Setup

1. Create the target KRaft cluster

# Create MSK configuration
aws kafka create-configuration \
  --name "prod-kraft-config" \
  --kafka-versions "3.9.0" \
  --server-properties "$(cat <<'EOF'
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.partitions=6
EOF
)"

# Create the KRaft cluster. Note the CLI expects PascalCase keys in the
# --provisioned JSON, and ConfigurationInfo attaches the configuration
# created above (use the Arn returned by create-configuration).
aws kafka create-cluster-v2 \
  --cluster-name "prod-kraft" \
  --provisioned '{
    "BrokerNodeGroupInfo": {
      "InstanceType": "kafka.m5.large",
      "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
      "SecurityGroups": ["sg-migration"],
      "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 1000}}
    },
    "NumberOfBrokerNodes": 3,
    "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": true}}},
    "EncryptionInfo": {
      "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": true}
    },
    "ConfigurationInfo": {"Arn": "<CONFIG_ARN>", "Revision": 1},
    "KafkaVersion": "3.9.0"
  }'
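
Provisioning typically takes 20-30 minutes or more. One way to poll for readiness before proceeding (<TARGET_CLUSTER_ARN> is the ARN returned by create-cluster-v2):

# Wait until the cluster reports ACTIVE
aws kafka describe-cluster-v2 \
  --cluster-arn <TARGET_CLUSTER_ARN> \
  --query 'ClusterInfo.State' \
  --output text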

2. Create S3 buckets

aws s3 mb s3://prod-migration-segments --region us-east-1
aws s3 mb s3://prod-migration-evidence --region us-east-1
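
Both buckets will hold cluster data or audit evidence, so blocking public access is a sensible default (shown for the segments bucket; repeat for the evidence bucket, or skip if this is already enforced account-wide):

aws s3api put-public-access-block \
  --bucket prod-migration-segments \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true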

# Optional: enable Object Lock on the evidence bucket for compliance.
# Object Lock requires bucket versioning, so enable that first.
aws s3api put-bucket-versioning \
  --bucket prod-migration-evidence \
  --versioning-configuration Status=Enabled

aws s3api put-object-lock-configuration \
  --bucket prod-migration-evidence \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}}
  }'

3. Configure IAM permissions

The migration runner's IAM role needs access to both clusters and both buckets. Use the IAM policy generated by the plan command (Step 1 below) for the exact permissions; it is attached to the role in Step 2.
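
For orientation, an illustrative excerpt of the kinds of statements the generated policy contains. The actions and resource ARNs below are assumptions for this scenario; treat iam-policy-concrete.json from the plan output as authoritative:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ClusterDataAccess",
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:CreateTopic",
        "kafka-cluster:ReadData",
        "kafka-cluster:WriteData",
        "kafka-cluster:DescribeGroup",
        "kafka-cluster:AlterGroup"
      ],
      "Resource": [
        "arn:aws:kafka:us-east-1:123456789012:*/prod-zk/*",
        "arn:aws:kafka:us-east-1:123456789012:*/prod-kraft/*"
      ]
    },
    {
      "Sid": "MigrationBuckets",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::prod-migration-segments",
        "arn:aws:s3:::prod-migration-segments/*",
        "arn:aws:s3:::prod-migration-evidence",
        "arn:aws:s3:::prod-migration-evidence/*"
      ]
    }
  ]
}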

Migration Config

migration.yaml
enterprise:
  msk_kraft_migration:
    source:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111
      auth:
        mode: iam
    target:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222
      auth:
        mode: iam
    backup:
      s3_bucket: prod-migration-segments
      s3_prefix: zk-to-kraft/
    evidence:
      s3_bucket: prod-migration-evidence
      s3_prefix: migrations/
      retention: 7y
    cutover:
      drain_timeout: 30m
      drain_max_partition_lag: 100
      max_producer_freeze: 60s
      producer_freeze_webhook: https://internal-api.example.com/kafka/freeze
    validation:
      count_tolerance: 1
      spot_check_records_per_partition: 5
    seed:
      max_concurrent_partitions: 8
    acl:
      on_drift: merge
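
Note that mode: iam means the tool authenticates with the runner's AWS credentials. The manual verification commands later in this walkthrough assume a standard MSK IAM client properties file, referred to as client-iam.properties (requires the aws-msk-iam-auth jar on the client classpath):

security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler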

Run the Migration

Step 1: Generate the plan

kafka-backup migrate msk-kraft plan \
  --config migration.yaml \
  --format all \
  --out-dir ./migration-plan

Review the generated artifacts:

  • migration-plan/runbook.md — customized step-by-step runbook
  • migration-plan/iam-policy-concrete.json — attach this to the runner's IAM role
  • migration-plan/cost-estimate.json — estimated S3 costs (~$12 for 500 GB)

Step 2: Attach IAM policy

aws iam put-role-policy \
  --role-name migration-runner-role \
  --policy-name kafka-migration \
  --policy-document file://migration-plan/iam-policy-concrete.json
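
A quick sanity check that the inline policy landed on the role:

aws iam get-role-policy \
  --role-name migration-runner-role \
  --policy-name kafka-migration \
  --query 'PolicyName'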

Step 3: Run precheck

kafka-backup migrate msk-kraft precheck --config migration.yaml

Expected output: no blockers. You may see:

  • I01: Target is IAM-auth — ACLs emitted as access-map.json (expected for IAM targets)
  • W04: Target message-size floor could not be verified from dynamic broker config (manually verify message.max.bytes and replica.fetch.max.bytes; see the verification sketch after the example output below)

Example precheck output from an IAM migration:

W04 warn: could not verify target message-size floor (target broker DescribeConfigs returned no message.max.bytes or replica.fetch.max.bytes (dynamic-config only on this broker)) — ensure target `message.max.bytes` and `replica.fetch.max.bytes` ≥ largest source topic's effective max.message.bytes
W03 info: KMS key ARN set on backup channel — CMK access is not verified by this precheck phase; ensure the caller has kms:Encrypt/Decrypt/GenerateDataKey
I01 info: target is IAM-auth — ACLs will be emitted as access-map.json for customer IaC to translate to IAM policies (tool does not apply IAM)
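
To clear W04 manually, compare the target's broker-level size limits against the largest effective max.message.bytes across source topics. A sketch using the stock Kafka CLI and the client-iam.properties file shown earlier; $KAFKA_HOME and $TARGET_BOOTSTRAP are placeholders:

# Broker defaults on the target (static and dynamic), filtered to the two limits
"$KAFKA_HOME"/bin/kafka-configs.sh \
  --bootstrap-server "$TARGET_BOOTSTRAP" \
  --command-config client-iam.properties \
  --entity-type brokers --entity-default \
  --describe --all | grep -E 'message\.max\.bytes|replica\.fetch\.max\.bytes'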

Step 4: Execute

kafka-backup migrate msk-kraft execute \
  --config migration.yaml \
  --journal-dir ./journal

For 500 GB of data, expect:

  • Seed phase: ~3-4 hours
  • Tail convergence: ~10-15 minutes
  • Total to drain-ready: ~4 hours
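
Seed progress can also be watched from the AWS side via MSK's per-broker CloudWatch metrics. A sketch for broker 1 on the target over the last hour (GNU date syntax; adjust for macOS):

aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name BytesInPerSec \
  --dimensions Name="Cluster Name",Value="prod-kraft" Name="Broker ID",Value="1" \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average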

Example drain-ready output:

2026-04-25T06:15:07.512121Z topology_copy -> seed
2026-04-25T06:25:30.777840Z seed -> tail
2026-04-25T06:26:19.079610Z tail -> drain_ready drain ready: max_partition_lag=0 records_replayed=0 bytes_replayed=0

Step 5: Cutover

Coordinate with application teams, then:

kafka-backup migrate msk-kraft cutover \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

The webhook at https://internal-api.example.com/kafka/freeze receives a POST request. Your application should pause producers while cutover completes; typically ~30 seconds, bounded by the configured max_producer_freeze of 60s.
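
It can be worth smoke-testing the freeze endpoint ahead of the real cutover window. A minimal sketch; the JSON payload here is a placeholder, not the tool's actual webhook schema, so check what your endpoint expects:

curl -sS -X POST https://internal-api.example.com/kafka/freeze \
  -H 'Content-Type: application/json' \
  -d '{"action": "freeze", "reason": "pre-cutover smoke test"}'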

Example cutover-ready output:

2026-04-25T06:40:46.854037Z cutover -> awaiting_client_switch READY_FOR_CLIENT_SWITCH: groups_translated=0 offsets_committed=0 warnings=0

Step 6: Switch clients

Update application configs to point to the KRaft cluster bootstrap servers:

# Get new bootstrap servers
aws kafka get-bootstrap-brokers \
  --cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222

Roll your deployments. Consumers resume from the translated target offsets, preserving message continuity across the switch.
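
After the rollout, the stock Kafka CLI can confirm that all 12 groups are active against the target and show their current offsets and lag. A sketch assuming the client-iam.properties file from earlier; $KAFKA_HOME and $KRAFT_BOOTSTRAP are placeholders:

"$KAFKA_HOME"/bin/kafka-consumer-groups.sh \
  --bootstrap-server "$KRAFT_BOOTSTRAP" \
  --command-config client-iam.properties \
  --describe --all-groups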

Step 7: Acknowledge and finalize

# Acknowledge client switch
kafka-backup migrate msk-kraft cutover-ack \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

# Finalize (runs validation + uploads evidence)
kafka-backup migrate msk-kraft finalize \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

Verify the Evidence

aws s3 cp \
  "s3://prod-migration-evidence/migrations/<MIGRATION_ID>/evidence.json" \
  ./evidence.json

# Check validation outcome
jq -r '.bundle_json' evidence.json | jq '.validation.overall'
# Expected: "PASSED"

# Check per-check results
jq -r '.bundle_json' evidence.json | jq '{
  topic_parity: .validation.topic_parity.outcome,
  counts_and_offsets: .validation.counts_and_offsets.outcome,
  spot_check_records: .validation.spot_check_records.outcome,
  sentinel_presence: .validation.sentinel_presence.outcome,
  consumer_group_reconciliation: .validation.consumer_group_reconciliation.outcome
}'

Example output of an optional source/target offset comparison run after finalize:

partitions_checked=306
target_behind_or_missing=0
earliest_partitions_checked=306
earliest_mismatches=0
latest_partitions_checked=306
latest_mismatches=0
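
For an independent spot check of a single topic, the latest offsets can be compared directly on both clusters. A sketch with kafka-get-offsets.sh (ships with Kafka 3.x); the topic name and bootstrap addresses are placeholders:

# --time -1 prints the latest offset per partition (-2 would print earliest)
"$KAFKA_HOME"/bin/kafka-get-offsets.sh \
  --bootstrap-server "$SOURCE_BOOTSTRAP" \
  --command-config client-iam.properties \
  --topic orders --time -1

"$KAFKA_HOME"/bin/kafka-get-offsets.sh \
  --bootstrap-server "$KRAFT_BOOTSTRAP" \
  --command-config client-iam.properties \
  --topic orders --time -1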

Clean Up

After verifying the migration is successful and all applications are stable on the KRaft cluster:

# Remove migration segments from S3
aws s3 rm s3://prod-migration-segments/zk-to-kraft/ --recursive
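
Before deleting the source, it is worth confirming nothing still connects to it. One rough signal is the per-broker ConnectionCount metric; repeat per broker ID, and expect a low flat baseline rather than zero, since broker-to-broker connections remain:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name ConnectionCount \
  --dimensions Name="Cluster Name",Value="prod-zk" Name="Broker ID",Value="1" \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Maximum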

# Decommission the source ZK cluster (when confident)
aws kafka delete-cluster \
  --cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111

Keep the evidence bucket

Do not delete the evidence bucket. The signed evidence bundle is your compliance proof that the migration succeeded. With Object Lock enabled, it's retained for 7 years automatically.

Next Steps