Example: MSK KRaft Migration (IAM to IAM)

This example walks through a complete migration of a 3-broker MSK ZooKeeper cluster to a 3-broker MSK KRaft cluster, both using IAM authentication in us-east-1.

Scenario

|                 | Source             | Target             |
|-----------------|--------------------|--------------------|
| Cluster type    | MSK Provisioned    | MSK Provisioned    |
| Metadata mode   | ZooKeeper          | KRaft              |
| Kafka version   | 3.6.0              | 3.9.0              |
| Authentication  | IAM                | IAM                |
| Brokers         | 3 (kafka.m5.large) | 3 (kafka.m5.large) |
| Topics          | 500                | (empty)            |
| Data volume     | ~500 GB            |                    |
| Consumer groups | 12                 |                    |

Infrastructure Setup

1. Create the target KRaft cluster

# Create MSK configuration
aws kafka create-configuration \
  --name "prod-kraft-config" \
  --kafka-versions "3.9.0" \
  --server-properties "$(cat <<'EOF'
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.partitions=6
EOF
)"

# Create the KRaft cluster. Note the CLI expects PascalCase keys in the
# --provisioned JSON, and ConfigurationInfo attaches the configuration
# created above (use the Arn returned by create-configuration).
aws kafka create-cluster-v2 \
  --cluster-name "prod-kraft" \
  --provisioned '{
    "BrokerNodeGroupInfo": {
      "InstanceType": "kafka.m5.large",
      "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
      "SecurityGroups": ["sg-migration"],
      "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 1000}}
    },
    "NumberOfBrokerNodes": 3,
    "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": true}}},
    "EncryptionInfo": {
      "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": true}
    },
    "ConfigurationInfo": {"Arn": "<CONFIG_ARN>", "Revision": 1},
    "KafkaVersion": "3.9.0"
  }'
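
Provisioning typically takes 20-30 minutes or more. One way to poll for readiness before proceeding (<TARGET_CLUSTER_ARN> is the ARN returned by create-cluster-v2):

# Wait until the cluster reports ACTIVE
aws kafka describe-cluster-v2 \
  --cluster-arn <TARGET_CLUSTER_ARN> \
  --query 'ClusterInfo.State' \
  --output text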

2. Create S3 buckets

aws s3 mb s3://prod-migration-segments --region us-east-1
aws s3 mb s3://prod-migration-evidence --region us-east-1
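
Both buckets will hold cluster data or audit evidence, so blocking public access is a sensible default (shown for the segments bucket; repeat for the evidence bucket, or skip if this is already enforced account-wide):

aws s3api put-public-access-block \
  --bucket prod-migration-segments \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true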

# Optional: enable Object Lock on the evidence bucket for compliance.
# Object Lock requires bucket versioning, so enable that first.
aws s3api put-bucket-versioning \
  --bucket prod-migration-evidence \
  --versioning-configuration Status=Enabled

aws s3api put-object-lock-configuration \
  --bucket prod-migration-evidence \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}}
  }'

3. Configure IAM permissions

The migration runner's IAM role needs access to both clusters and both buckets. Use the IAM policy generated by the plan command (Step 1 below) for the exact permissions; it is attached to the role in Step 2.
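
For orientation, an illustrative excerpt of the kinds of statements the generated policy contains. The actions and resource ARNs below are assumptions for this scenario; treat iam-policy-concrete.json from the plan output as authoritative:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ClusterDataAccess",
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:CreateTopic",
        "kafka-cluster:ReadData",
        "kafka-cluster:WriteData",
        "kafka-cluster:DescribeGroup",
        "kafka-cluster:AlterGroup"
      ],
      "Resource": [
        "arn:aws:kafka:us-east-1:123456789012:*/prod-zk/*",
        "arn:aws:kafka:us-east-1:123456789012:*/prod-kraft/*"
      ]
    },
    {
      "Sid": "MigrationBuckets",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::prod-migration-segments",
        "arn:aws:s3:::prod-migration-segments/*",
        "arn:aws:s3:::prod-migration-evidence",
        "arn:aws:s3:::prod-migration-evidence/*"
      ]
    }
  ]
}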

Migration Config

migration.yaml
enterprise:
  msk_kraft_migration:
    source:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111
      auth:
        mode: iam
    target:
      cluster_arn: arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222
      auth:
        mode: iam
    backup:
      s3_bucket: prod-migration-segments
      s3_prefix: zk-to-kraft/
    evidence:
      s3_bucket: prod-migration-evidence
      s3_prefix: migrations/
      retention: 7y
    cutover:
      drain_timeout: 30m
      drain_max_partition_lag: 100
      max_producer_freeze: 60s
      producer_freeze_webhook: https://internal-api.example.com/kafka/freeze
    validation:
      count_tolerance: 1
      spot_check_records_per_partition: 5
    seed:
      max_concurrent_partitions: 8
    acl:
      on_drift: merge
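
Note that mode: iam means the tool authenticates with the runner's AWS credentials. The manual verification commands later in this walkthrough assume a standard MSK IAM client properties file, referred to as client-iam.properties (requires the aws-msk-iam-auth jar on the client classpath):

security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler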

Run the Migration

Step 1: Generate the plan

kafka-backup migrate msk-kraft plan \
  --config migration.yaml \
  --format all \
  --out-dir ./migration-plan

Review the generated artifacts:

  • migration-plan/runbook.md — customized step-by-step runbook
  • migration-plan/iam-policy-concrete.json — attach this to the runner's IAM role
  • migration-plan/cost-estimate.json — estimated S3 costs (~$12 for 500 GB)

Step 2: Attach IAM policy

aws iam put-role-policy \
  --role-name migration-runner-role \
  --policy-name kafka-migration \
  --policy-document file://migration-plan/iam-policy-concrete.json
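
A quick sanity check that the inline policy landed on the role:

aws iam get-role-policy \
  --role-name migration-runner-role \
  --policy-name kafka-migration \
  --query 'PolicyName'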

Step 3: Run precheck

kafka-backup migrate msk-kraft precheck --config migration.yaml

Expected output: no blockers. You may see:

  • I01: Target is IAM-auth — ACLs emitted as access-map.json (expected for IAM targets)
  • W04: Target message-size floor could not be verified from dynamic broker config (manually verify message.max.bytes and replica.fetch.max.bytes; see the verification sketch after the example output below)

Example precheck output from an IAM migration:

W04 warn: could not verify target message-size floor (target broker DescribeConfigs returned no message.max.bytes or replica.fetch.max.bytes (dynamic-config only on this broker)) — ensure target `message.max.bytes` and `replica.fetch.max.bytes` ≥ largest source topic's effective max.message.bytes
W03 info: KMS key ARN set on backup channel — CMK access is not verified by this precheck phase; ensure the caller has kms:Encrypt/Decrypt/GenerateDataKey
I01 info: target is IAM-auth — ACLs will be emitted as access-map.json for customer IaC to translate to IAM policies (tool does not apply IAM)
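
To clear W04 manually, compare the target's broker-level size limits against the largest effective max.message.bytes across source topics. A sketch using the stock Kafka CLI and the client-iam.properties file shown earlier; $KAFKA_HOME and $TARGET_BOOTSTRAP are placeholders:

# Broker defaults on the target (static and dynamic), filtered to the two limits
"$KAFKA_HOME"/bin/kafka-configs.sh \
  --bootstrap-server "$TARGET_BOOTSTRAP" \
  --command-config client-iam.properties \
  --entity-type brokers --entity-default \
  --describe --all | grep -E 'message\.max\.bytes|replica\.fetch\.max\.bytes'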

Step 4: Execute

kafka-backup migrate msk-kraft execute \
  --config migration.yaml \
  --journal-dir ./journal

For 500 GB of data, expect:

  • Seed phase: ~3-4 hours
  • Tail convergence: ~10-15 minutes
  • Total to drain-ready: ~4 hours
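
Seed progress can also be watched from the AWS side via MSK's per-broker CloudWatch metrics. A sketch for broker 1 on the target over the last hour (GNU date syntax; adjust for macOS):

aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name BytesInPerSec \
  --dimensions Name="Cluster Name",Value="prod-kraft" Name="Broker ID",Value="1" \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average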

Example drain-ready output:

2026-04-25T06:15:07.512121Z topology_copy -> seed
2026-04-25T06:25:30.777840Z seed -> tail
2026-04-25T06:26:19.079610Z tail -> drain_ready drain ready: max_partition_lag=0 records_replayed=0 bytes_replayed=0

Step 5: Cutover

Coordinate with application teams, then:

kafka-backup migrate msk-kraft cutover \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

The webhook at https://internal-api.example.com/kafka/freeze receives a POST request. Your application should pause producers while cutover completes; typically ~30 seconds, bounded by the configured max_producer_freeze of 60s.
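
It can be worth smoke-testing the freeze endpoint ahead of the real cutover window. A minimal sketch; the JSON payload here is a placeholder, not the tool's actual webhook schema, so check what your endpoint expects:

curl -sS -X POST https://internal-api.example.com/kafka/freeze \
  -H 'Content-Type: application/json' \
  -d '{"action": "freeze", "reason": "pre-cutover smoke test"}'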

Example cutover-ready output:

2026-04-25T06:40:46.854037Z cutover -> awaiting_client_switch READY_FOR_CLIENT_SWITCH: groups_translated=0 offsets_committed=0 warnings=0

Step 6: Switch clients

Update application configs to point to the KRaft cluster bootstrap servers:

# Get new bootstrap servers
aws kafka get-bootstrap-brokers \
  --cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-kraft/a1b2c3d4-5678-90ab-cdef-222222222222

Roll your deployments. Consumers resume from the translated target offsets, preserving message continuity across the switch.
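
After the rollout, the stock Kafka CLI can confirm that all 12 groups are active against the target and show their current offsets and lag. A sketch assuming the client-iam.properties file from earlier; $KAFKA_HOME and $KRAFT_BOOTSTRAP are placeholders:

"$KAFKA_HOME"/bin/kafka-consumer-groups.sh \
  --bootstrap-server "$KRAFT_BOOTSTRAP" \
  --command-config client-iam.properties \
  --describe --all-groups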

Step 7: Acknowledge and finalize

# Acknowledge client switch
kafka-backup migrate msk-kraft cutover-ack \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

# Finalize (runs validation + uploads evidence)
kafka-backup migrate msk-kraft finalize \
  --config migration.yaml \
  --migration-id <ID> \
  --journal-dir ./journal

Verify the Evidence

aws s3 cp \
  "s3://prod-migration-evidence/migrations/<MIGRATION_ID>/evidence.json" \
  ./evidence.json

# Check validation outcome
jq -r '.bundle_json' evidence.json | jq '.validation.overall'
# Expected: "PASSED"

# Check per-check results
jq -r '.bundle_json' evidence.json | jq '{
  topic_parity: .validation.topic_parity.outcome,
  counts_and_offsets: .validation.counts_and_offsets.outcome,
  spot_check_records: .validation.spot_check_records.outcome,
  sentinel_presence: .validation.sentinel_presence.outcome,
  consumer_group_reconciliation: .validation.consumer_group_reconciliation.outcome
}'

Example output of an optional source/target offset comparison run after finalize:

partitions_checked=306
target_behind_or_missing=0
earliest_partitions_checked=306
earliest_mismatches=0
latest_partitions_checked=306
latest_mismatches=0
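
For an independent spot check of a single topic, the latest offsets can be compared directly on both clusters. A sketch with kafka-get-offsets.sh (ships with Kafka 3.x); the topic name and bootstrap addresses are placeholders:

# --time -1 prints the latest offset per partition (-2 would print earliest)
"$KAFKA_HOME"/bin/kafka-get-offsets.sh \
  --bootstrap-server "$SOURCE_BOOTSTRAP" \
  --command-config client-iam.properties \
  --topic orders --time -1

"$KAFKA_HOME"/bin/kafka-get-offsets.sh \
  --bootstrap-server "$KRAFT_BOOTSTRAP" \
  --command-config client-iam.properties \
  --topic orders --time -1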

Clean Up

After verifying the migration is successful and all applications are stable on the KRaft cluster:

# Remove migration segments from S3
aws s3 rm s3://prod-migration-segments/zk-to-kraft/ --recursive
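
Before deleting the source, it is worth confirming nothing still connects to it. One rough signal is the per-broker ConnectionCount metric; repeat per broker ID, and expect a low flat baseline rather than zero, since broker-to-broker connections remain:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name ConnectionCount \
  --dimensions Name="Cluster Name",Value="prod-zk" Name="Broker ID",Value="1" \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Maximum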

# Decommission the source ZK cluster (when confident)
aws kafka delete-cluster \
  --cluster-arn arn:aws:kafka:us-east-1:123456789012:cluster/prod-zk/a1b2c3d4-5678-90ab-cdef-111111111111

Keep the evidence bucket

Do not delete the evidence bucket. The signed evidence bundle is your compliance proof that the migration succeeded. With Object Lock enabled, it's retained for 7 years automatically.

Next Steps