MSK KRaft Migration Precheck Codes
The precheck command performs read-only analysis of both clusters and reports findings at three severity levels. Precheck is free — no license required.
```bash
kafka-backup migrate msk-kraft precheck --config migration.yaml
```
Severity Levels
| Severity | Code prefix | Effect |
|---|---|---|
| Blocker | B## | Migration cannot proceed. Must be resolved first. |
| Warning | W## | Migration can proceed, but review the finding. |
| Info | I## | Informational. No action required. |
Blockers
B02: MSK Serverless Cluster
Message: "{which} MSK Serverless; this migrator only operates on MSK Provisioned clusters"
Cause: One or both cluster ARNs point to MSK Serverless clusters. MSK Serverless is KRaft-only by construction — there is no ZK variant to migrate from.
Fix: Use MSK Provisioned cluster ARNs. The source must be ZK-mode Provisioned, the target must be KRaft-mode Provisioned.
B03: Source Not ZooKeeper Mode
Message: "source cluster metadata mode is {mode}, expected ZOOKEEPER"
Cause: The source cluster is already running in KRaft mode. There is nothing to migrate.
Fix: Verify the source ARN points to a ZooKeeper-mode cluster.
B04: Target Not KRaft Mode
Message: "target cluster metadata mode is {mode}, expected KRAFT"
Cause: The target cluster is not running in KRaft mode.
Fix: Provision the target MSK cluster with KRaft mode enabled (requires Kafka 3.7+).
B05: Source and Target Are the Same Cluster
Message: "source and target ARNs are identical"
Cause: Both source.cluster_arn and target.cluster_arn point to the same cluster. In-place migration is not supported.
Fix: Create a separate KRaft-mode MSK cluster for the target.
B06: Target Kafka Version Too Low
Message: "target Kafka version {version} is below minimum 3.7 required for KRaft on MSK"
Cause: The target cluster is running a Kafka version that does not support KRaft on MSK.
Fix: Upgrade the target MSK cluster to Kafka 3.7.x or later.
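One way to start the upgrade from the CLI, as a sketch; the ARN and versions are placeholders, and the exact target version string should be checked against aws kafka list-kafka-versions:

```bash
# Upgrade the target cluster to a KRaft-capable Kafka version.
aws kafka update-cluster-kafka-version \
  --cluster-arn <target-arn> \
  --current-version <cluster-version> \
  --target-kafka-version 3.7.x
```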
B07: Backup S3 Bucket Not Reachable
Message: "backup S3 bucket '{bucket}' not reachable: {error}"
Cause: The HeadBucket API call failed on the backup S3 bucket. Either the bucket does not exist or the caller lacks permissions.
Fix:
- Create the bucket: aws s3 mb s3://<bucket> --region <region>
- Ensure the migration runner's IAM role has s3:HeadBucket, s3:GetObject, s3:PutObject, and s3:ListBucket on this bucket (see the policy sketch below)
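As a sketch, a policy granting these permissions could look like the following; the bucket name is a placeholder. Note that the HeadBucket API call is authorized by s3:ListBucket, so no separate action is needed for it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketLevel",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket>"
    },
    {
      "Sid": "ObjectLevel",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::<bucket>/*"
    }
  ]
}
```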
B08: Evidence S3 Bucket Not Reachable
Message: "evidence S3 bucket '{bucket}' not reachable: {error}"
Cause: Same as B07, but for the evidence bucket.
Fix: Same as B07. If using S3 Object Lock, ensure the bucket was created with Object Lock enabled (cannot be added retroactively).
B09: Source Kafka Not Reachable
Message: "source Kafka protocol not reachable: {error}"
Cause: Cannot connect to the source cluster's bootstrap servers or fetch metadata.
Fix:
- Verify bootstrap servers are correct (check the MSK console or aws kafka get-bootstrap-brokers; see the snippet after this list)
- Check security group ingress rules — the migration runner must reach the broker ports
- Verify auth mode matches the cluster's authentication configuration
- For SCRAM: verify the username/password are correct and the SCRAM secret exists in AWS Secrets Manager
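A quick reachability check from the migration runner host, assuming nc is available; port numbers are the MSK defaults per auth mode:

```bash
# Fetch the bootstrap broker string for the source cluster.
aws kafka get-bootstrap-brokers --cluster-arn <source-arn>

# Test TCP reachability to one broker host from the bootstrap string.
# MSK default ports: 9094 = TLS, 9096 = SASL/SCRAM, 9098 = IAM.
nc -zv <broker-host> 9096
```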
B10: Target Kafka Not Reachable
Message: "target Kafka protocol not reachable: {error}"
Cause: Same as B09, but for the target cluster.
Fix: Same as B09.
B11: Target message.max.bytes Too Small
Message: "target broker message.max.bytes={value} is below the largest source topic's effective max.message.bytes={max} (topic '{topic}') — replay would fail with RecordTooLargeException"
Cause: The target cluster's message.max.bytes broker setting is smaller than the largest message size allowed by any source topic. During restore, oversized records would be rejected.
Fix: Raise the target broker's message.max.bytes to at least match the source floor. Update the MSK cluster configuration:
```bash
aws kafka update-cluster-configuration \
  --cluster-arn <target-arn> \
  --configuration-info '{"Arn":"<config-arn>","Revision":<N>}' \
  --current-version <cluster-version>
```
B12: Target replica.fetch.max.bytes Too Small
Message: "target broker replica.fetch.max.bytes={value} is below the largest source topic's effective max.message.bytes={max} (topic '{topic}') — replication would stall on oversized batches"
Cause: The target's inter-broker replication cannot handle the largest messages from source.
Fix: Raise the target broker's replica.fetch.max.bytes alongside message.max.bytes.
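For reference, a configuration revision covering both B11 and B12 might contain the following (10 MiB here is illustrative; substitute your actual source maximum):

```properties
# Both values must be >= the largest effective max.message.bytes on source.
message.max.bytes=10485760
replica.fetch.max.bytes=10485760
```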
B13: Reverse Replication Not Implemented
Message: "cutover.reverse_replication_enabled=true, but reverse replication is not implemented"
Cause: The config enables a feature that is not yet available.
Fix: Set cutover.reverse_replication_enabled: false in your config. Post-cutover rollback to the source cluster is a manual procedure.
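In migration.yaml this looks like the following, assuming the YAML nesting mirrors the dotted key path:

```yaml
cutover:
  reverse_replication_enabled: false
```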
Warnings
W01: Target Has Fewer Brokers
Message: "target has {target} brokers but source has {source} — consider scaling up before seed"
Cause: The target cluster has fewer brokers than the source. Any topic whose replication factor exceeds the target's broker count cannot be placed there.
Action: Consider scaling up the target cluster before migration.
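Broker count can be raised from the CLI; values are placeholders, and note that MSK requires the new broker count to be a multiple of the number of Availability Zones the cluster spans:

```bash
# Scale the target cluster up before the seed phase.
aws kafka update-broker-count \
  --cluster-arn <target-arn> \
  --current-version <cluster-version> \
  --target-number-of-broker-nodes <count>
```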
W02: Cross-Region Migration
Message: "source region {source} ≠ target region {target} — seed + tail will incur egress bandwidth cost"
Cause: Source and target are in different AWS regions. Data transfer between regions incurs egress charges.
Action: Review the cost estimate from plan --format cost. Consider whether the data transfer cost is acceptable.
W03: KMS Key Configured
Message: "KMS key ARN set on backup channel — CMK access is not verified by this precheck phase; ensure the caller has kms:Encrypt/Decrypt/GenerateDataKey"
Cause: A custom KMS key is configured for S3 encryption. Precheck does not verify KMS permissions.
Action: Ensure the migration runner's IAM role has kms:Encrypt, kms:Decrypt, and kms:GenerateDataKey on the specified KMS key ARN.
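A policy statement granting this might look like the following; the key ARN is a placeholder:

```json
{
  "Effect": "Allow",
  "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
  "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
}
```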
W04: Message Size Check Skipped
Message: "could not verify target message-size floor ({reason}) — ensure target message.max.bytes and replica.fetch.max.bytes ≥ largest source topic's effective max.message.bytes"
Cause: The DescribeConfigs API call failed for source or target brokers, but the brokers are reachable. This is a fail-open scenario.
Action: Manually verify that the target's message.max.bytes and replica.fetch.max.bytes are sufficient.
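One way to check manually with the stock Kafka CLI, as a sketch; client.properties must match the target's auth mode:

```bash
# Describe effective broker configs on the target and filter the two floors.
kafka-configs.sh --bootstrap-server <target-bootstrap> \
  --command-config client.properties \
  --entity-type brokers --entity-default --describe --all \
  | grep -E 'message\.max\.bytes|replica\.fetch\.max\.bytes'
```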
W05: Static Consumer Group Members
Message: "source cluster has static consumer-group members ({summary}). Post-cutover, these consumers MUST restart against the target with the same group.instance.id values..."
Cause: Some consumer groups use static membership (group.instance.id). These consumers must reconnect to the target with identical instance IDs to avoid a full group rebalance.
Action: Ensure application deployments preserve group.instance.id values when switching to the target cluster.
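For example, a consumer instance that ran against the source with a static ID must rejoin the target with the same value (group and instance names here are hypothetical):

```properties
# Consumer configuration; group.instance.id must match what was used on source.
bootstrap.servers=<target-bootstrap-servers>
group.id=orders-processor
group.instance.id=orders-processor-0
```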
W06: Transactional Producers Detected
Message: "source cluster has {total} transactional producer(s)..."
Cause: Transactional state (producer ID + epoch) does not migrate. Exactly-once guarantees do not span the cutover boundary.
Action: Applications using transactions must call initTransactions() after reconnecting to the target (a sketch follows). Drain active transactions on the source before initiating cutover.
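A minimal Java sketch of the reconnect sequence; the topic, record, and transactional.id are hypothetical:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TargetReconnect {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<target-bootstrap-servers>");
        // Keep the same transactional.id the application used on source.
        props.put("transactional.id", "orders-writer-1");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Acquires a fresh producer ID/epoch on the target and fences any
            // zombie instance still holding the old one.
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "key", "value"));
            producer.commitTransaction();
        }
    }
}
```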
W07: Log-Compacted Topics
Message: "{count} source topic(s) use cleanup.policy=compact..."
Cause: Compacted topics may have records deleted between seed and tail phases. The validation suite treats empty fetches and drift on compacted topics as warnings instead of failures.
Action: If bit-for-bit parity is required for compacted topics, run your own diff after finalize.
W08: SCRAM Target Needs Pre-Provisioned Users
Message: "target uses SCRAM-SHA-512 — SCRAM user credentials cannot be read via the Kafka protocol..."
Cause: The target cluster uses SCRAM authentication. SCRAM user credentials (stored in AWS Secrets Manager) cannot be read or copied programmatically. If the same users don't exist on the target, copied ACLs will reference unauthenticated principals.
Action: Pre-provision all SCRAM users on the target cluster before cutover. Use aws kafka batch-associate-scram-secret to associate the same Secrets Manager secrets.
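For example (the secret ARN is a placeholder; MSK requires SCRAM secrets to be named with the AmazonMSK_ prefix and encrypted with a customer-managed KMS key):

```bash
# Associate the same Secrets Manager secrets with the target cluster.
aws kafka batch-associate-scram-secret \
  --cluster-arn <target-arn> \
  --secret-arn-list arn:aws:secretsmanager:<region>:<account-id>:secret:AmazonMSK_<user>
```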
W09: MSK Internal ACLs Will Be Filtered
Message: "{count} source ACL binding(s) reference MSK/Kafka internal principals or resources and will be filtered during ACL copy"
Cause: Some ACL bindings on the source reference internal principals (e.g., User:ANONYMOUS) or internal resources (__consumer_offsets). These are managed by MSK automatically and should not be copied.
Action: No action needed. The filtered bindings are logged for transparency.
W10: Finite Delete-Retention Topics
Message: "{count} source topic(s) use finite delete retention ({topic}({retention_ms}ms)...). SEED restores original CreateTime timestamps, so the target broker may advance log-start before cutover if old restored records become retention-eligible. Temporarily extend topic retention for the migration window, or rely on the cutover offset-floor guard to block the client switch if truncation occurs."
Cause: One or more source topics use cleanup.policy=delete with finite retention.ms. Kafka retention uses the record timestamp when message.timestamp.type=CreateTime, so restored historical records may become immediately retention-eligible on the target.
Action: Temporarily extend retention for affected topics during the migration window, or set retention to -1 until finalize completes. Keep the target offset-floor guard enabled; it verifies target log-start offsets before READY_FOR_CLIENT_SWITCH and blocks the switch if the target has already truncated copied data.
Example topic configuration that triggers this warning:
```properties
cleanup.policy=delete
message.timestamp.type=CreateTime
retention.ms=604800000
segment.ms=604800000
```
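One way to extend retention for the migration window with the stock Kafka CLI, applied on the target where truncation is the risk; revert after finalize:

```bash
# Disable time-based retention on the affected topic until finalize completes.
kafka-configs.sh --bootstrap-server <target-bootstrap> \
  --command-config client.properties \
  --entity-type topics --entity-name <topic> \
  --alter --add-config retention.ms=-1
```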
Info
I01: IAM Target — ACLs Emitted as Access Map
Message: "target is IAM-auth — ACLs will be emitted as access-map.json for customer IaC to translate to IAM policies (tool does not apply IAM)"
Cause: The target uses IAM authentication. Kafka ACLs don't apply on IAM-auth clusters. Instead, the tool generates an access-map.json that maps source ACL principals and permissions to the IAM policies you need to create.
Action: After migration, apply the generated IAM policies using your infrastructure-as-code tooling (Terraform, CloudFormation, CDK).
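As an illustration only (the access-map.json schema is defined by the tool, and the principal and topic names here are hypothetical), a source ACL granting User:app-reader READ on topic orders might translate to an IAM policy like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:ReadData"
      ],
      "Resource": [
        "arn:aws:kafka:<region>:<account-id>:cluster/<cluster-name>/<cluster-uuid>",
        "arn:aws:kafka:<region>:<account-id>:topic/<cluster-name>/<cluster-uuid>/orders"
      ]
    }
  ]
}
```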
Next Steps
- Production Migration Runbook — step-by-step guide
- Configuration Reference — tune precheck-related settings
- Troubleshooting — post-precheck error resolution