Kafka Replication Across Data Centers: Architecture Patterns for Multi-DC
Kafka replication across data centers comes down to one architectural decision: run a single stretched cluster spanning your datacenters, or run independent clusters connected by asynchronous replication. Stretched clusters give you zero data loss but demand low-latency links. Separate clusters tolerate WAN conditions but accept a non-zero recovery point. This guide covers both designs, when each wins, and how to build the second one with MirrorMaker 2.
Stretch a single cluster only when your datacenters are close enough for single-digit-millisecond round trips and you can place brokers in three locations. Everywhere else, run one cluster per datacenter and replicate asynchronously with MirrorMaker 2 — then size the link for peak produce throughput, not the average.
Why replicate Kafka across data centers?
A single-datacenter Kafka cluster has a single failure domain. replication.factor=3
protects you from losing a broker, not from losing the building. Teams add a
second datacenter for four reasons:
- Disaster recovery. Survive a datacenter failure with a standby cluster that already holds your data.
- Latency and locality. Serve consumers from the datacenter nearest to them instead of hauling every read across a WAN.
- Data residency. Keep regulated data in-country while replicating only the topics that are allowed to leave.
- Migration. Move workloads between datacenters or into the cloud gradually, with both sides live during the transition (see the migration use cases).
Each reason pushes you toward one of two designs, and they behave very differently under failure.
Stretched cluster vs. cross-DC replication
A stretched cluster is one Kafka cluster whose brokers live in multiple
datacenters. Partition replicas are spread across sites, and with acks=all
plus a correctly set min.insync.replicas, every write is confirmed in more
than one datacenter before the producer moves on. Replication is synchronous,
so the recovery point objective (RPO) is zero.
Cross-DC replication runs an independent cluster in each datacenter and copies topics between them with a replication tool — MirrorMaker 2, Confluent Replicator, or Cluster Linking. Replication is asynchronous: the source cluster acknowledges writes locally, and the copy follows milliseconds to seconds later. Whatever has not yet crossed the wire when the datacenter fails is your RPO.
| Stretched cluster | Cross-DC replication | |
|---|---|---|
| Clusters | One, spanning DCs | One per DC |
| Replication | Synchronous (in-cluster) | Asynchronous (MM2 or similar) |
| RPO on DC loss | Zero | Replication lag at failure time |
| Produce latency | Includes inter-DC round trip | Local only |
| Network partition | Can halt writes cluster-wide | Each cluster keeps working |
| Offset handling | Native — one cluster | Requires offset translation |
| Latency requirement | Single-digit ms round trips | Tolerates WAN latency |
| Typical fit | Metro-area DCs, three sites | Everything else |
The latency requirement is the deciding factor in practice. Every produce with
acks=all in a stretched cluster waits on the slowest in-sync replica, so
inter-DC round-trip time lands directly on your producers. Metro-area
datacenters connected by dark fiber can absorb that. Datacenters in different
regions cannot.
Quorum placement is the second constraint. With brokers in only two sites, a network partition leaves you choosing between availability and consistency — neither half can safely claim the majority. Production stretched clusters need three locations, even if the third only hosts controller quorum members. If you cannot get three sites with low-latency links, that alone rules the stretched design out.
One consolation for stretched-cluster read traffic: since KIP-392, consumers
can fetch from the closest replica by matching client.rack to broker.rack,
which keeps read traffic inside a datacenter even though writes cross it.
Cross-DC replication patterns
Once you have chosen separate clusters, the topology question is which direction data flows:
- Active-passive. One datacenter serves all traffic; the second receives a continuous copy and waits. Simple to reason about, and failover is a client-repointing exercise. This is the right default.
- Active-active. Both datacenters produce and consume, with bidirectional replication between them. You gain locality and put the standby to work, and pay for it with loop prevention, topic-naming discipline, and conflict handling.
- Aggregate (fan-in). Several source datacenters replicate into a central cluster for analytics or archival. Common in retail and IoT estates where edge sites produce locally.
Our geo replication guide compares these patterns in depth, including hub-and-spoke and mesh variants and their RTO and cost profiles. The short version: start active-passive, and let a proven requirement — not idle-standby guilt — move you to active-active.
Implementing cross-DC replication with MirrorMaker 2
MirrorMaker 2 (MM2) ships with Apache Kafka and runs on the Connect framework.
A working active-passive flow from a dc1 cluster to a dc2 standby:
clusters = dc1, dc2
dc1.bootstrap.servers = kafka-dc1.internal:9092
dc2.bootstrap.servers = kafka-dc2.internal:9092
# One-way replication: dc1 -> dc2
dc1->dc2.enabled = true
dc1->dc2.topics = orders.*, payments.*
dc1->dc2.topics.exclude = .*\.internal, __.*
# Translate consumer offsets so groups can fail over
emit.checkpoints.enabled = true
sync.group.offsets.enabled = true
sync.group.offsets.interval.seconds = 60
# Copies on the target cluster
dc2.replication.factor = 3
Four things deserve attention before you trust this in production:
- Topic naming. MM2's default replication policy prefixes remote topics —
ordersfromdc1becomesdc1.ordersondc2. That prevents loops in bidirectional setups but means failover consumers must subscribe to the prefixed name. Kafka 3.0+ includesIdentityReplicationPolicy, which keeps names unchanged for one-way, active-passive flows. - Offset translation. Offsets are not comparable between clusters. MM2
checkpoints map each group's committed position from source to target, and
sync.group.offsets.enabledapplies the mapping automatically when the group is idle on the target. Verify translated offsets in a failover drill before an incident forces the question. - Where MM2 runs. Deploy the Connect workers in the target datacenter. Consuming over the WAN and producing locally degrades more gracefully when the link gets slow, and the standby side keeps operating if the primary datacenter is degraded.
- Partition behavior. MM2 preserves partition counts and routes records to the same partition number on the target, so key-based ordering survives replication. It does not replicate every cluster detail — ACLs and quotas need their own process.
During a network partition, MM2 simply falls behind and catches up when the link returns. That is the asynchronous design working as intended: each cluster keeps serving its local clients, and your monitoring — not your producers — absorbs the disruption.
Network architecture for cross-DC Kafka
The replication link is a production dependency and deserves the same design attention as the clusters:
- Bandwidth: size for peak, not average. The link must sustain your peak
produce rate on replicated topics, with headroom for catch-up after an
outage. A link sized to the average will fall behind at the daily peak and
never recover before the next one. Enable
compression.type=lz4orzstdon the MM2 producer to cut wire bytes substantially for text-heavy payloads. - Encryption in transit. Cross-DC traffic leaves your building. Terminate
TLS on the brokers (
security.protocol: SASL_SSLorSSLon the cluster listeners), not just on a VPN wrapper, so the requirement survives network re-architecture. - Dedicated vs. shared links. Replication competing with general WAN traffic produces lag spikes that look like Kafka problems. If a dedicated link is not available, apply QoS so bulk replication cannot starve — or be starved by — other traffic.
- Failover DNS. Clients should bootstrap through DNS names you can repoint
(
kafka-primary.example.com), with TTLs low enough to matter mid-incident. Rehearse the repointing; a failover plan that has never run is a hypothesis.
Monitor end-to-end lag as the timestamp delta between a record's append at the source and its availability at the target — connector task lag alone understates what consumers actually experience after failover.
Replication is not backup
Cross-datacenter replication protects against losing a datacenter. It does not protect against bad data, because it copies everything faithfully: a producer bug, a malformed schema, or an accidental topic deletion reaches the second datacenter in milliseconds. Both copies are now wrong, and neither can answer "restore this topic to yesterday 14:00."
That job needs an immutable, point-in-time copy outside both clusters. OSO Kafka Backup writes records, consumer group offsets, and topic configuration to object storage — S3, Azure Blob, GCS, or filesystem — and restores to millisecond precision on any cluster, including the DR cluster you just failed over to. A minimal backup configuration from the configuration reference:
mode: backup
backup_id: "dc1-orders-daily"
source:
bootstrap_servers:
- kafka-dc1.internal:9092
storage:
backend: s3
Mature multi-DC estates run both layers: replication for availability when a site fails, backups for recoverability when the data itself is the problem.
Choosing your architecture
Work through the decision in this order:
- Can you meet stretched-cluster physics? Three sites, single-digit
millisecond round trips, and producers that can tolerate inter-DC latency
on every
acks=allwrite. If yes and RPO must be zero, stretch. - Otherwise, replicate asynchronously. One cluster per datacenter, MM2 running in the target site, active-passive until a workload proves it needs active-active.
- Design the network deliberately. Peak-rate bandwidth, compression, TLS, and DNS failover are architecture, not afterthoughts.
- Add the backup layer. Replication answers "what if the datacenter fails?" Backups answer "what if the data is wrong?" Production estates need both answers.
Network failures between datacenters are a certainty on a long enough timeline. The asynchronous design treats them as routine; the stretched design treats them as an event. Choose based on which failure conversation you would rather have.
Frequently asked questions
How do you replicate Kafka across data centers?
Either stretch one cluster across datacenters with rack-aware replica placement and synchronous replication, or run a separate cluster per datacenter and copy topics asynchronously with MirrorMaker 2 or a similar tool. Stretched clusters need low-latency links and three sites; asynchronous replication works over ordinary WAN links.
Can a single Kafka cluster span multiple data centers?
Yes, if the datacenters are close enough. A stretched cluster needs round-trip latency in the low single-digit milliseconds — typically metro-area distance — and brokers or quorum members in at least three locations so a network partition leaves a clear majority.
What is the difference between a stretched Kafka cluster and cross-DC replication?
A stretched cluster is one Kafka cluster whose replicas span datacenters, replicating synchronously with zero RPO but inter-DC latency on every acknowledged write. Cross-DC replication runs independent clusters connected by an asynchronous copier such as MirrorMaker 2, keeping produce latency local but accepting a recovery point equal to replication lag.
What bandwidth is needed for Kafka cross-datacenter replication?
Size the link for the peak produce throughput of the replicated topics, plus headroom to catch up after link outages. A link sized to average throughput falls behind at every daily peak. Compression on the replication producer (lz4 or zstd) reduces wire bytes significantly.
How does MirrorMaker 2 handle network partitions between data centers?
MM2 replication is asynchronous, so a partition causes it to fall behind rather than fail. Each cluster continues serving its local producers and consumers, and MM2 resumes from its committed source offsets and catches up when connectivity returns. Monitor end-to-end lag so you know the real recovery point during the gap.
Related reading: Kafka geo replication patterns, OSO Kafka Backup vs MirrorMaker 2, and disaster recovery use cases.