<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>OSO Kafka Backup Blog</title>
        <link>https://kafkabackup.com/blog</link>
        <description>Engineering deep-dives on Apache Kafka backup, disaster recovery, and replication from the OSO team.</description>
        <lastBuildDate>Fri, 03 Jul 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright © 2026 OSO.</copyright>
        <item>
            <title><![CDATA[How to Back Up and Restore Kafka Topics: A Step-by-Step Guide]]></title>
            <link>https://kafkabackup.com/blog/backup-restore-kafka-topics</link>
            <guid>https://kafkabackup.com/blog/backup-restore-kafka-topics</guid>
            <pubDate>Fri, 03 Jul 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Three ways to back up and restore Kafka topics: a Kafka Connect S3 sink, the kafka-backup CLI with offset preservation, and consumer scripts, plus a verification checklist.]]></description>
            <content:encoded><![CDATA[<p>You cannot undo a deleted Kafka topic unless you have a backup. To back up a
Kafka topic, you capture its records, partition layout, offsets, and topic
configuration to durable storage outside the cluster; to restore, you produce
that data back into the same or a different cluster. This guide walks through
three ways to do it, from a purpose-built CLI to a bare consumer script, and
shows how to verify that the backup actually restores.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>Key takeaway</div><div class="admonitionContent_BuS1"><p>Quick decision guide: a handful of small topics for a one-off → consumer script.
Production topics that need offset preservation or point-in-time restore → the
<code>kafka-backup</code> CLI. An existing Kafka Connect estate that only needs raw record
archiving → an S3 sink connector.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-backing-up-a-kafka-topic-actually-means">What "backing up a Kafka topic" actually means<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#what-backing-up-a-kafka-topic-actually-means" class="hash-link" aria-label="Direct link to What &quot;backing up a Kafka topic&quot; actually means" title="Direct link to What &quot;backing up a Kafka topic&quot; actually means" translate="no">​</a></h2>
<p>A real topic backup captures four things:</p>
<ol>
<li class=""><strong>Records</strong> — keys, values, headers, and timestamps</li>
<li class=""><strong>Partition layout</strong> — which records lived on which partition, in what order</li>
<li class=""><strong>Offsets</strong> — both record offsets and consumer group positions</li>
<li class=""><strong>Topic configuration</strong> — partition count, retention, cleanup policy</li>
</ol>
<p>Two things that are <em>not</em> backups, despite being treated as such:</p>
<ul>
<li class=""><strong>Retention</strong> is scheduled deletion. When <code>retention.ms</code> expires, the data is
gone regardless of whether anyone still needs it.</li>
<li class=""><strong>Replication</strong> (in-cluster RF=3, or <a class="" href="https://kafkabackup.com/compare/mirrormaker">MirrorMaker 2</a>
across clusters) copies every write — including the accidental delete and the
poisoned deploy — within milliseconds.</li>
</ul>
<p>Backups exist so you can go <em>backwards</em> in time. Replication only goes
forwards.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="method-1--kafka-backup-cli-production-topics">Method 1 — kafka-backup CLI (production topics)<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#method-1--kafka-backup-cli-production-topics" class="hash-link" aria-label="Direct link to Method 1 — kafka-backup CLI (production topics)" title="Direct link to Method 1 — kafka-backup CLI (production topics)" translate="no">​</a></h2>
<p>The <a class="" href="https://kafkabackup.com/getting-started">OSO Kafka Backup CLI</a> backs up topics with offset
preservation and restores them with millisecond-precision time windows. It is
the right default for production.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-1-install-and-configure">Step 1: Install and configure<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#step-1-install-and-configure" class="hash-link" aria-label="Direct link to Step 1: Install and configure" title="Direct link to Step 1: Install and configure" translate="no">​</a></h3>
<p>Follow the <a class="" href="https://kafkabackup.com/getting-started">installation guide</a> for your platform, then
write a backup config:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">backup.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">mode</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> backup</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">source</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">bootstrap_servers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> broker</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">1</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9092</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">topics</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">include</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> orders</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> payments</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">backend</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> s3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">bucket</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> my</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">kafka</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">backups</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">region</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">west</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">prefix</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> backups/production</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">backup</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">compression</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> zstd</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">start_offset</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> earliest</span><br></span></code></pre></div></div>
<p>Filesystem, Azure Blob, and GCS backends use the same shape — see the
<a class="" href="https://kafkabackup.com/reference/config-yaml">configuration reference</a> and the
<a class="" href="https://kafkabackup.com/integrations/s3">S3 integration guide</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-2-run-the-backup">Step 2: Run the backup<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#step-2-run-the-backup" class="hash-link" aria-label="Direct link to Step 2: Run the backup" title="Direct link to Step 2: Run the backup" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kafka-backup backup </span><span class="token parameter variable" style="color:#36acaa">--config</span><span class="token plain"> backup.yaml</span><br></span></code></pre></div></div>
<p>Progress is checkpointed as it runs, so an interrupted backup resumes rather
than restarting. For continuously changing topics, set <code>continuous: true</code> to
stream changes instead of taking discrete snapshots.</p>
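<p>To make the checkpoint-and-resume idea concrete, here is a minimal sketch of the general pattern. It is not the <code>kafka-backup</code> implementation: the file layout, the <code>topic-partition</code> key format, and every function name are assumptions made for illustration.</p>

```python
import json
import os
import tempfile
from typing import Dict


def load_checkpoint(path: str) -> Dict[str, int]:
    """Return {"topic-partition": next_offset}; empty when no checkpoint exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path: str, progress: Dict[str, int]) -> None:
    """Write to a temp file, then rename, so a crash never leaves a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(progress, fh)
    os.replace(tmp_path, path)


def resume_offset(progress: Dict[str, int], topic: str, partition: int) -> int:
    """Next offset to read; 0 stands in for 'earliest' in this sketch."""
    return progress.get(f"{topic}-{partition}", 0)
```

<p>An interrupted run calls <code>load_checkpoint</code> on startup and continues from <code>resume_offset</code> for each partition, so only the records after the last saved position are read again.</p>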
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-3-restore--to-anywhere-at-any-point-in-time">Step 3: Restore — to anywhere, at any point in time<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#step-3-restore--to-anywhere-at-any-point-in-time" class="hash-link" aria-label="Direct link to Step 3: Restore — to anywhere, at any point in time" title="Direct link to Step 3: Restore — to anywhere, at any point in time" translate="no">​</a></h3>
<p>Restore to the original cluster, a new cluster, or a renamed topic:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">restore.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">mode</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> restore</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">backup_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"prod-backup-20260701"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">target</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">bootstrap_servers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> dr</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">broker</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">1</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9092</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">backend</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> s3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">bucket</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> my</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">kafka</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">backups</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">region</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">west</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">prefix</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 
backups/production</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">restore</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">create_topics</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">topic_mapping</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">orders</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> orders</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">restored</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Point-in-time: only records up to the moment before the incident</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">time_window_end</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1751500800000</span><span 
class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># Unix ms</span><br></span></code></pre></div></div>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kafka-backup restore </span><span class="token parameter variable" style="color:#36acaa">--config</span><span class="token plain"> restore.yaml</span><br></span></code></pre></div></div>
<p>The <code>time_window_end</code> option is what makes this a genuine undo button: restore
the topic to 14:03:27.451, the millisecond before the bad deploy started
writing. Set <code>dry_run: true</code> first to validate the whole plan without producing
a record.</p>
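<p>Since <code>time_window_end</code> takes Unix milliseconds, the wall-clock moment has to be converted first. The sketch below shows one way to do that in Python; the incident date is hypothetical, and whether the boundary is inclusive is something to confirm in the configuration reference.</p>

```python
from datetime import datetime, timezone


def to_unix_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    # Integer arithmetic avoids float rounding in the millisecond digit.
    delta = moment - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


# Hypothetical incident: the last good millisecond was 14:03:27.451 UTC on 2026-07-02.
last_good = datetime(2026, 7, 2, 14, 3, 27, 451_000, tzinfo=timezone.utc)
print(to_unix_ms(last_good))  # 1783001007451
```

<p>For reference, the sample value <code>1751500800000</code> in <code>restore.yaml</code> corresponds to 2025-07-03 00:00:00 UTC, so replace it with a timestamp that falls inside the period your backup covers.</p>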
<p><strong>Use this method when:</strong> you need point-in-time recovery, consumer offsets
must survive the restore, or the backup must live outside the cluster's
failure domain.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="method-2--kafka-connect-s3-sink-existing-connect-estates">Method 2 — Kafka Connect S3 sink (existing Connect estates)<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#method-2--kafka-connect-s3-sink-existing-connect-estates" class="hash-link" aria-label="Direct link to Method 2 — Kafka Connect S3 sink (existing Connect estates)" title="Direct link to Method 2 — Kafka Connect S3 sink (existing Connect estates)" translate="no">​</a></h2>
<p>If you already operate Kafka Connect, an S3 sink connector can archive topic
records to a bucket:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">s3-sink.json</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"orders-s3-backup"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"config"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"connector.class"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"io.confluent.connect.s3.S3SinkConnector"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"topics"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> 
</span><span class="token string" style="color:#e3116c">"orders"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"s3.bucket.name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"my-kafka-archive"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"s3.region"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"us-west-2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"format.class"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"io.confluent.connect.s3.format.json.JsonFormat"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"flush.size"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"10000"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>Restoring means running the matching S3 <em>source</em> connector to replay the
records into a topic.</p>
<p>The trade-offs are real, though: consumer group offsets are not captured,
restores replay records with new offsets (breaking offset-based consumers),
and there is no point-in-time selection beyond whatever partitioning your sink
wrote. It is archiving, not recovery tooling.</p>
<p><strong>Use this method when:</strong> you need raw record archives for analytics or
compliance, offsets do not matter, and Connect is already running.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="method-3--consumer-script-small-topics-one-offs">Method 3 — Consumer script (small topics, one-offs)<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#method-3--consumer-script-small-topics-one-offs" class="hash-link" aria-label="Direct link to Method 3 — Consumer script (small topics, one-offs)" title="Direct link to Method 3 — Consumer script (small topics, one-offs)" translate="no">​</a></h2>
<p>For a small topic in a dev environment, a script can be enough:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Backup: dump records with key, timestamp, and partition</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kafka-console-consumer --bootstrap-server broker-1:9092 </span><span class="token punctuation" style="color:#393A34">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token parameter variable" style="color:#36acaa">--topic</span><span class="token plain"> orders --from-beginning </span><span class="token punctuation" style="color:#393A34">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token parameter variable" style="color:#36acaa">--property</span><span class="token plain"> </span><span class="token assign-left variable" style="color:#36acaa">print.key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">true </span><span class="token punctuation" style="color:#393A34">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token parameter variable" style="color:#36acaa">--property</span><span class="token plain"> </span><span class="token assign-left variable" style="color:#36acaa">print.timestamp</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">true </span><span class="token punctuation" 
style="color:#393A34">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token parameter variable" style="color:#36acaa">--property</span><span class="token plain"> </span><span class="token assign-left variable" style="color:#36acaa">print.partition</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">true </span><span class="token punctuation" style="color:#393A34">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --timeout-ms </span><span class="token number" style="color:#36acaa">10000</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> orders-backup.txt</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Restore: strip the CreateTime and Partition columns, then replay key and value</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cut -f3- orders-backup.txt </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> kafka-console-producer --bootstrap-server broker-1:9092 </span><span class="token punctuation" style="color:#393A34">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token parameter variable" style="color:#36acaa">--topic</span><span class="token plain"> orders-restored </span><span class="token parameter variable" style="color:#36acaa">--property</span><span class="token plain"> </span><span class="token assign-left variable" style="color:#36acaa">parse.key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">true</span><br></span></code></pre></div></div>
<p>Be honest about the limits: no offset preservation, no header capture in older
tooling, timestamps become produce-time on restore, and nothing about this is
incremental. It is a photocopy, not a backup system.</p>
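<p>The dump file is not a plain list of values: with the three <code>print.*</code> properties set, each line carries tab-separated <code>CreateTime:</code> and <code>Partition:</code> prefixes ahead of the key and value. A minimal Python sketch of splitting such a line follows. It assumes the default tab separator, keys that contain no tabs, and a console consumer recent enough to support <code>print.partition</code>; the <code>DumpedRecord</code> type and <code>parse_dump_line</code> function are hypothetical names.</p>

```python
from typing import NamedTuple, Optional


class DumpedRecord(NamedTuple):
    timestamp_ms: int
    partition: int
    key: Optional[str]
    value: str


def parse_dump_line(line: str) -> DumpedRecord:
    """Split one 'CreateTime:..\tPartition:..\tkey\tvalue' line from the dump."""
    # Split on the first three tabs only, so tabs inside the value survive.
    ts_field, part_field, key, value = line.rstrip("\n").split("\t", 3)
    timestamp_ms = int(ts_field.split(":", 1)[1])
    partition = int(part_field.split(":", 1)[1])
    # The console consumer prints a missing key as the text "null"; a real key
    # that is literally "null" cannot be told apart in this format.
    return DumpedRecord(timestamp_ms, partition, None if key == "null" else key, value)


record = parse_dump_line('CreateTime:1751500800000\tPartition:2\torder-17\t{"total": 42}\n')
print(record.partition, record.key, record.value)
```

<p>Piping the dump file into the producer unmodified sends each whole line, prefixes included, as the record value, so the prefixes have to be stripped before any replay.</p>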
<p><strong>Use this method when:</strong> the topic is small, the moment is now, and the
stakes are low.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="verifying-the-backup-whichever-method-you-chose">Verifying the backup (whichever method you chose)<a href="https://kafkabackup.com/blog/backup-restore-kafka-topics#verifying-the-backup-whichever-method-you-chose" class="hash-link" aria-label="Direct link to Verifying the backup (whichever method you chose)" title="Direct link to Verifying the backup (whichever method you chose)" translate="no">​</a></h2>
<p>An unverified backup is a guess. After every backup — and on a weekly schedule —
check:</p>
<ul>
<li class=""><strong>Record counts</strong> match between source and restored topic
(<code>kafka-run-class kafka.tools.GetOffsetShell</code> on both sides; on compacted or
transactional topics the offset range only approximates the record count)</li>
<li class=""><strong>Offset continuity</strong> — no gaps at segment boundaries</li>
<li class=""><strong>Schema compatibility</strong> — restored records deserialize with the current
schema</li>
<li class=""><strong>Consumer resume</strong> — a consumer group restored with the data picks up where
it left off instead of reprocessing from zero</li>
</ul>
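<p>The record-count check can be scripted from <code>GetOffsetShell</code> output, which prints one <code>topic:partition:offset</code> line per partition. The sketch below compares per-partition offset spans between a source and a restored topic. The sample output strings are made up, and the helper names are hypothetical; the span is an upper bound on the record count, since compaction and transaction markers leave gaps.</p>

```python
from typing import Dict


def parse_offsets(shell_output: str) -> Dict[int, int]:
    """Map partition -> offset from lines shaped like 'orders:0:1500'."""
    offsets: Dict[int, int] = {}
    for line in shell_output.splitlines():
        line = line.strip()
        if not line:
            continue
        _topic, partition, offset = line.rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets


def offset_span(earliest: Dict[int, int], latest: Dict[int, int]) -> Dict[int, int]:
    """Per-partition latest minus earliest offset."""
    return {p: latest[p] - earliest[p] for p in sorted(latest)}


# Hypothetical output captured with --time -2 (earliest) and --time -1 (latest).
source_span = offset_span(
    parse_offsets("orders:0:1000\norders:1:980\n"),
    parse_offsets("orders:0:1500\norders:1:1480\n"),
)
restored_span = offset_span(
    parse_offsets("orders-restored:0:0\norders-restored:1:0\n"),
    parse_offsets("orders-restored:0:500\norders-restored:1:500\n"),
)
mismatched = [p for p in source_span if source_span[p] != restored_span.get(p)]
print("partitions with differing spans:", mismatched)
```

<p>An empty list means every partition spans the same number of offsets on both sides; any partition listed is worth inspecting before the backup is trusted.</p>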
<p>The <a class="" href="https://kafkabackup.com/blog/kafka-backup-best-practices">backup best practices guide</a> covers
turning this checklist into automated, alerting-backed verification.</p>
<section class="pseo-faq"><h2>Frequently asked questions</h2><details class="pseo-faq__item"><summary>How do you back up a Kafka topic?</summary><p>Capture the topic's records, partition layout, offsets, and configuration to storage outside the cluster. In practice: run a backup tool such as the kafka-backup CLI with a config naming the topics and a storage backend (S3, Azure Blob, GCS, or filesystem), or archive records with a Kafka Connect S3 sink if offsets do not matter.</p></details><details class="pseo-faq__item"><summary>Can you restore a deleted Kafka topic?</summary><p>Only from a backup taken before the deletion. Replication does not help: the delete propagates to replicas and mirrored clusters. From a backup, recreate the topic (create_topics: true) and restore records and consumer offsets, optionally to a new topic name.</p></details><details class="pseo-faq__item"><summary>How do you back up Kafka topics to S3?</summary><p>Point a backup config at an S3 bucket (backend: s3, bucket, region, prefix) and run kafka-backup backup --config backup.yaml. Records are compressed with Zstandard or LZ4 before upload. A Kafka Connect S3 sink connector is an alternative when you only need record archiving.</p></details><details class="pseo-faq__item"><summary>How do you take a backup of a Kafka topic without downtime?</summary><p>Backups read topics through standard consumer protocols, so producers and consumers keep running during the backup. For topics with constant writes, continuous mode streams changes instead of taking point snapshots.</p></details><details class="pseo-faq__item"><summary>How do you verify a Kafka backup?</summary><p>Restore it (to a scratch cluster or with dry_run: true) and compare record counts, check offset continuity, confirm schemas deserialize, and verify a consumer group resumes from its restored offsets. Schedule this weekly; a backup that has never been restored is unproven.</p></details></section>
<hr>
<p><em>Next steps: the <a class="" href="https://kafkabackup.com/getting-started/first-backup">first backup tutorial</a> walks
this end to end, and the <a class="" href="https://kafkabackup.com/reference/cli-reference">CLI reference</a> documents
every flag used above.</em></p>]]></content:encoded>
            <category>Kafka Backup</category>
        </item>
        <item>
            <title><![CDATA[Kafka Backup Best Practices: 10 Rules for Production Data Protection]]></title>
            <link>https://kafkabackup.com/blog/kafka-backup-best-practices</link>
            <guid>https://kafkabackup.com/blog/kafka-backup-best-practices</guid>
            <pubDate>Fri, 03 Jul 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[10 Kafka backup best practices for production: automated restore verification, lag monitoring, offset and metadata capture, encryption, cost control, and DR testing.]]></description>
            <content:encoded><![CDATA[<p>Kafka backup best practices come down to one principle: <strong>a backup you have not
restored is a hope, not a backup.</strong> Retention deletes your data on schedule,
replication copies your mistakes in real time, and neither can return a topic to
the state it was in before an incident. These 10 rules turn Kafka backups from a
checkbox into something you can bet an on-call shift on.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>Key takeaway</div><div class="admonitionContent_BuS1"><p>Prioritize rules 2 and 3 — automated restore verification and backup lag
monitoring. They catch the two failure modes that actually burn teams: backups
that silently stopped working, and backups that cannot be restored.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-treat-backup-as-code">1. Treat backup as code<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#1-treat-backup-as-code" class="hash-link" aria-label="Direct link to 1. Treat backup as code" title="Direct link to 1. Treat backup as code" translate="no">​</a></h2>
<p>Backup configuration belongs in version control next to the applications it
protects. A <code>backup.yaml</code> reviewed in a pull request is auditable; a config
hand-edited on a VM is a mystery six months later.</p>
<ul>
<li class="">Store <a class="" href="https://kafkabackup.com/reference/config-yaml">backup configurations</a> in Git</li>
<li class="">Provision backup infrastructure with Terraform or Helm, not the console</li>
<li class="">Ship config changes through CI so a typo cannot silently disable a nightly job</li>
</ul>
<p>If you run Kafka on Kubernetes, the <a class="" href="https://kafkabackup.com/operator">backup operator</a> takes this
further: backup schedules become custom resources that GitOps tools reconcile
like any other manifest.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-verify-backups-automatically--trust-but-verify">2. Verify backups automatically — trust, but verify<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#2-verify-backups-automatically--trust-but-verify" class="hash-link" aria-label="Direct link to 2. Verify backups automatically — trust, but verify" title="Direct link to 2. Verify backups automatically — trust, but verify" translate="no">​</a></h2>
<p>The most common backup failure is discovered during an outage: the backup job ran
for months, but nobody ever restored from it. Schedule automated restore tests,
weekly at minimum, into a scratch cluster or isolated namespace.</p>
<p>Validate three things on every test:</p>
<ol>
<li class=""><strong>Record counts</strong> match between the source topic and the restored topic</li>
<li class=""><strong>Offsets are continuous</strong>: no unexpected gaps or overlaps at segment boundaries (compacted and transactional topics have legitimate gaps)</li>
<li class=""><strong>Consumers can resume</strong> from restored consumer group offsets</li>
</ol>
<p>The <code>restore</code> mode supports <code>dry_run: true</code>, which validates a backup against
the target cluster without producing a single record — cheap enough to run
daily:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">mode</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> restore</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">backup_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"prod-backup-latest"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">restore</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">dry_run</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><br></span></code></pre></div></div>
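<p>The first two checks can be sketched in a few lines of Python. This is a minimal illustration, not part of the kafka-backup tooling: the function names are hypothetical, and it assumes you have already collected the source record count and the list of restored offsets for one partition. On compacted or transactional topics some gaps are expected, so treat the reported breaks as something to review rather than an automatic failure.</p>

```python
def find_offset_breaks(offsets):
    """Return (previous, current) pairs where consecutive offsets are not exactly +1 apart."""
    breaks = []
    for previous, current in zip(offsets, offsets[1:]):
        if current != previous + 1:
            breaks.append((previous, current))
    return breaks


def verify_partition(source_count, restored_offsets):
    """Summarize the record-count and continuity checks for a single partition."""
    return {
        "count_matches": source_count == len(restored_offsets),
        "offset_breaks": find_offset_breaks(restored_offsets),
    }


# Hypothetical sample: one gap (102 to 104) and one overlap (104 repeated).
report = verify_partition(source_count=5, restored_offsets=[100, 101, 102, 104, 104])
print(report)
```

<p>Note that the record count matches in this sample even though the offsets are broken, which is why the checks are run together rather than relying on counts alone.</p>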
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-monitor-backup-lag-and-health-continuously">3. Monitor backup lag and health continuously<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#3-monitor-backup-lag-and-health-continuously" class="hash-link" aria-label="Direct link to 3. Monitor backup lag and health continuously" title="Direct link to 3. Monitor backup lag and health continuously" translate="no">​</a></h2>
<p>A backup job that dies on Friday night should page someone before Monday. OSO
Kafka Backup exposes Prometheus metrics for exactly this:</p>
<table><thead><tr><th>Metric</th><th>What to alert on</th></tr></thead><tbody><tr><td><code>kafka_backup_lag_records</code></td><td>Lag exceeding your RPO budget</td></tr><tr><td><code>kafka_backup_errors_total</code></td><td>Any sustained non-zero rate</td></tr><tr><td><code>kafka_backup_records_total</code></td><td>Rate dropping to zero mid-window</td></tr><tr><td><code>kafka_backup_compression_ratio</code></td><td>Sudden shifts (often a payload change upstream)</td></tr></tbody></table>
<p>The full list is in the <a class="" href="https://kafkabackup.com/reference/metrics">metrics reference</a>. Wire the lag
metric to your alerting with a threshold derived from your RPO — not a number
picked in a hurry.</p>
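<p>One way to derive that threshold is to convert the RPO from seconds into records. The sketch below is an assumption about how you might do the arithmetic, not a documented formula: <code>lag_threshold_records</code> and <code>safety_factor</code> are hypothetical names, and the produce rate is a number you measure yourself. Using a low sustained produce rate is the conservative choice, because the same record lag represents more elapsed time when traffic is slow.</p>

```python
def lag_threshold_records(rpo_seconds, records_per_second, safety_factor=0.5):
    """Return the record lag that corresponds to a fraction of the RPO.

    safety_factor leaves room to react before the RPO itself is breached.
    """
    if safety_factor > 1 or 0 >= safety_factor:
        raise ValueError("safety_factor must be in (0, 1]")
    return int(rpo_seconds * records_per_second * safety_factor)


# Hypothetical inputs: a 5-minute RPO on a topic sustaining 2,000 records/s.
threshold = lag_threshold_records(rpo_seconds=300, records_per_second=2000)
print(threshold)
```

<p>With these sample inputs the alert would fire at 300,000 records of lag, which is half of the RPO budget at that produce rate.</p>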
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-define-recovery-objectives-before-you-need-them">4. Define recovery objectives before you need them<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#4-define-recovery-objectives-before-you-need-them" class="hash-link" aria-label="Direct link to 4. Define recovery objectives before you need them" title="Direct link to 4. Define recovery objectives before you need them" translate="no">​</a></h2>
<p>Two numbers drive every backup decision:</p>
<ul>
<li class=""><strong>RPO (recovery point objective)</strong> — how much data you can afford to lose.
This sets backup frequency, or pushes you to continuous mode.</li>
<li class=""><strong>RTO (recovery time objective)</strong> — how long you can be down. This sets your
restore method, parallelism, and where backups physically live.</li>
</ul>
<p>Map every critical topic to an RPO/RTO tier, write the mapping down, and review
it quarterly. A payments topic and a clickstream topic should not share a
policy. For the architecture side of this decision, see the
<a class="" href="https://kafkabackup.com/use-cases/disaster-recovery">disaster recovery use cases</a>.</p>
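<p>Writing the mapping down can be as simple as a small lookup table kept in the same repository as the backup config. The sketch below is illustrative only: the tier names, the numbers, and the <code>backup_mode</code> labels are assumptions, not values from any tool.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTier:
    rpo_seconds: int   # maximum tolerable data loss
    rto_seconds: int   # maximum tolerable downtime
    backup_mode: str   # illustrative label: "continuous" or "scheduled"


# Hypothetical tiers; replace the numbers with your own objectives.
TIERS = {
    "critical": RecoveryTier(60, 900, "continuous"),
    "standard": RecoveryTier(3600, 14400, "scheduled"),
    "best_effort": RecoveryTier(86400, 86400, "scheduled"),
}

# Hypothetical topic-to-tier mapping.
TOPIC_TIERS = {"payments": "critical", "clickstream": "best_effort"}


def tier_for(topic, default="standard"):
    """Look up a topic's tier, falling back to a default for unmapped topics."""
    return TIERS[TOPIC_TIERS.get(topic, default)]


print(tier_for("payments"))
print(tier_for("inventory"))
```

<p>An explicit default keeps unmapped topics visible in review instead of silently inheriting the strictest or loosest policy.</p>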
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-back-up-offsets-and-metadata-not-just-messages">5. Back up offsets and metadata, not just messages<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#5-back-up-offsets-and-metadata-not-just-messages" class="hash-link" aria-label="Direct link to 5. Back up offsets and metadata, not just messages" title="Direct link to 5. Back up offsets and metadata, not just messages" translate="no">​</a></h2>
<p>A topic restore that loses consumer group offsets forces every consumer to
choose between reprocessing everything and skipping to latest — both are
incidents of their own. Message data alone is roughly half a backup. Capture:</p>
<ul>
<li class=""><strong>Consumer group offsets</strong>, so processing resumes where it stopped</li>
<li class=""><strong>Topic configurations</strong> — partitions, retention, cleanup policy</li>
<li class=""><strong>Schemas</strong>, so downstream consumers can still deserialize what you restored</li>
<li class=""><strong>ACLs</strong>, so security posture survives the restore</li>
</ul>
<p>OSO Kafka Backup captures offsets and topic configuration as part of every
backup, and keeps offsets consistent with the restored data during
<a class="" href="https://kafkabackup.com/use-cases/disaster-recovery">point-in-time recovery</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-encrypt-backup-data-at-rest-and-in-transit">6. Encrypt backup data at rest and in transit<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#6-encrypt-backup-data-at-rest-and-in-transit" class="hash-link" aria-label="Direct link to 6. Encrypt backup data at rest and in transit" title="Direct link to 6. Encrypt backup data at rest and in transit" translate="no">​</a></h2>
<p>Backups concentrate months of your most valuable data into one bucket — treat
them with at least the rigor of the cluster itself.</p>
<ul>
<li class="">Enable server-side encryption on the storage target (SSE-S3 or SSE-KMS on
Amazon S3, service-managed keys on Azure and GCS)</li>
<li class="">Use TLS between the backup process and both the brokers and the object store</li>
<li class="">Keep encryption keys in a KMS with its own access policy, so a Kafka
credential leak does not also expose the archive</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-control-costs-without-weakening-the-safety-net">7. Control costs without weakening the safety net<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#7-control-costs-without-weakening-the-safety-net" class="hash-link" aria-label="Direct link to 7. Control costs without weakening the safety net" title="Direct link to 7. Control costs without weakening the safety net" translate="no">​</a></h2>
<p>Backup storage costs are predictable and controllable — unlike the cost of
losing the data.</p>
<ul>
<li class=""><strong>Compress.</strong> Backups are compressed with Zstandard or LZ4 before upload
(<code>compression: zstd</code> in the <a class="" href="https://kafkabackup.com/reference/config-yaml">backup config</a>),
independent of producer-side compression.</li>
<li class=""><strong>Tier.</strong> Keep recent backups hot for fast restore; move older ones to
infrequent-access or archive classes with bucket lifecycle rules.</li>
<li class=""><strong>Expire.</strong> Retention policies should delete what compliance no longer
requires — storage you forgot about is pure waste.</li>
</ul>
<p>Storage layout details are in the <a class="" href="https://kafkabackup.com/reference/storage-format">storage format reference</a>,
which is what lifecycle rules operate against.</p>
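<p>The tier-and-expire policy reduces to a decision on backup age. The following sketch only illustrates that logic. In practice the policy lives in the bucket's lifecycle configuration, and the function name, thresholds, and class labels here are all assumptions.</p>

```python
def storage_action(age_days, hot_days=30, archive_days=90, expire_days=365):
    """Return the storage class (or expiry) a backup of this age should be in."""
    if age_days >= expire_days:
        return "expire"
    if age_days >= archive_days:
        return "archive"
    if age_days >= hot_days:
        return "infrequent-access"
    return "hot"


# Hypothetical ages, in days, for a handful of backups.
for age in (7, 45, 120, 400):
    print(age, storage_action(age))
```

<p>The expiry threshold should come from the retention requirements in rule 8 rather than from storage cost alone.</p>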
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-meet-data-retention-regulations-deliberately">8. Meet data retention regulations deliberately<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#8-meet-data-retention-regulations-deliberately" class="hash-link" aria-label="Direct link to 8. Meet data retention regulations deliberately" title="Direct link to 8. Meet data retention regulations deliberately" translate="no">​</a></h2>
<p>If your topics carry regulated data, backups are in scope too:</p>
<ul>
<li class="">Map topics to their regimes (GDPR, HIPAA, SOX, PCI-DSS) and set backup
retention to match — both minimums and maximums</li>
<li class="">Use object-lock or immutable storage for audit-relevant backups</li>
<li class="">Automate deletion when retention windows close; manual cleanup does not
survive staff turnover</li>
<li class="">Keep restore procedures documented — auditors ask for evidence that recovery
works, not just that backups exist</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="9-test-disaster-recovery-quarterly">9. Test disaster recovery quarterly<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#9-test-disaster-recovery-quarterly" class="hash-link" aria-label="Direct link to 9. Test disaster recovery quarterly" title="Direct link to 9. Test disaster recovery quarterly" translate="no">​</a></h2>
<p>Backup verification (rule 2) proves the data is restorable. A DR test proves
your <em>organization</em> can restore it: the runbook is current, the credentials
work, the on-call engineer knows which cluster to target, and the application
teams can validate their services afterward.</p>
<p>Run a full failover drill quarterly. Measure the RTO and RPO you actually
achieved against the targets from rule 4, and fix the gap — in tooling or in
targets — after every drill.</p>
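<p>Recording the drill result as data makes the gap hard to ignore. A minimal sketch, assuming you express both targets and achieved values in seconds; the function and key names are hypothetical.</p>

```python
def drill_gaps(targets, achieved):
    """Return the objectives the drill missed, as name -> seconds over target.

    Both arguments are dicts with the same keys, in seconds.
    """
    return {
        name: achieved[name] - target
        for name, target in targets.items()
        if achieved[name] > target
    }


# Hypothetical drill: RPO was met, RTO overshot by 15 minutes.
targets = {"rpo_seconds": 300, "rto_seconds": 1800}
achieved = {"rpo_seconds": 240, "rto_seconds": 2700}
print(drill_gaps(targets, achieved))
```

<p>An empty result means the drill met every objective; anything else is the list of follow-up work, in tooling or in targets.</p>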
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="10-write-runbooks-for-the-engineer-at-3-am">10. Write runbooks for the engineer at 3 a.m.<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#10-write-runbooks-for-the-engineer-at-3-am" class="hash-link" aria-label="Direct link to 10. Write runbooks for the engineer at 3 a.m." title="Direct link to 10. Write runbooks for the engineer at 3 a.m." translate="no">​</a></h2>
<p>The person running a restore under pressure should never compose a config from
memory. Good runbooks contain:</p>
<ul>
<li class="">Pre-validated, copy-paste commands: <code>kafka-backup restore --config restore-payments.yaml</code>, with the config already in Git</li>
<li class="">A decision tree: partial topic restore vs. full recovery vs.
point-in-time rollback</li>
<li class="">Escalation paths and the list of application owners to notify</li>
<li class="">Links to the dashboards from rule 3, so progress is observable</li>
</ul>
<p>Start with the <a class="" href="https://kafkabackup.com/getting-started/first-backup">first backup tutorial</a> as a
template and extend it with your environment's specifics.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-to-start">Where to start<a href="https://kafkabackup.com/blog/kafka-backup-best-practices#where-to-start" class="hash-link" aria-label="Direct link to Where to start" title="Direct link to Where to start" translate="no">​</a></h2>
<p>Do not attempt all ten at once. Add backup lag alerting today (rule 3),
schedule a weekly automated restore test this week (rule 2), and write the
RPO/RTO map next sprint (rule 4). The rest layer on from there.</p>
<p>A backup you test is a backup you can trust — everything else on this list
exists to make that testing routine instead of heroic.</p>
<section class="pseo-faq"><h2>Frequently asked questions</h2><details class="pseo-faq__item"><summary>What are the best practices for backing up Kafka?</summary><p>Version backup configuration in Git, verify restores automatically on a schedule, monitor backup lag with alerting, define RPO and RTO per topic, capture consumer offsets and metadata alongside messages, encrypt backups, control storage costs with compression and lifecycle tiers, and run quarterly disaster recovery drills.</p></details><details class="pseo-faq__item"><summary>How often should you test Kafka backups?</summary><p>Run automated restore verification at least weekly, and a dry-run validation daily if your tooling supports it. Full disaster recovery drills involving failover and application teams should run quarterly.</p></details><details class="pseo-faq__item"><summary>How do you monitor Kafka backup health?</summary><p>Track backup lag in records, error counts, and throughput via Prometheus metrics such as kafka_backup_lag_records and kafka_backup_errors_total. Alert when lag exceeds your RPO budget or when the error rate is sustained above zero.</p></details><details class="pseo-faq__item"><summary>What metadata should be included in Kafka backups?</summary><p>Consumer group offsets, topic configurations (partition counts, retention, cleanup policy), schemas, and ACLs. Without offsets, consumers must reprocess or skip data after a restore; without configs and schemas, the restored topic may not behave like the original.</p></details><details class="pseo-faq__item"><summary>How do you reduce Kafka backup storage costs?</summary><p>Compress backup data with Zstandard or LZ4 before upload, move older backups to infrequent-access or archive storage classes with lifecycle policies, and expire backups automatically once retention requirements lapse.</p></details></section>
<hr>
<p><em>Ready to put these into practice? <a class="" href="https://kafkabackup.com/getting-started">Take your first backup in minutes</a>,
or see how backup fits alongside replication in our
<a class="" href="https://kafkabackup.com/compare/mirrormaker">MirrorMaker 2 comparison</a>.</em></p>]]></content:encoded>
            <category>Kafka Backup</category>
            <category>Disaster Recovery</category>
        </item>
        <item>
            <title><![CDATA[Kafka Geo Replication: Multi-Region and Cross-Datacenter Patterns]]></title>
            <link>https://kafkabackup.com/blog/kafka-geo-replication</link>
            <guid>https://kafkabackup.com/blog/kafka-geo-replication</guid>
            <pubDate>Fri, 03 Jul 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[How Kafka geo replication works: active-passive, active-active, hub-and-spoke, and mesh patterns compared, MirrorMaker 2 setup for WAN links, and where backups fit.]]></description>
            <content:encoded><![CDATA[<p>Kafka geo replication copies topics between clusters in different regions or
datacenters, so a regional outage does not take your streaming platform with
it. In-cluster replication (RF=3) protects against broker loss inside one
failure domain; geo replication protects against losing the domain itself.
This guide compares the four patterns, shows what MirrorMaker 2 setup looks
like over a WAN, and covers the failure mode replication cannot solve.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>Key takeaway</div><div class="admonitionContent_BuS1"><p>Start with active-passive between two regions. Measure real replication lag
before promising an RPO, budget for cross-region transfer costs, and pair
replication with point-in-time backups — replication propagates mistakes as
faithfully as it propagates good data.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-geo-replication-is-and-is-not">What geo replication is (and is not)<a href="https://kafkabackup.com/blog/kafka-geo-replication#what-geo-replication-is-and-is-not" class="hash-link" aria-label="Direct link to What geo replication is (and is not)" title="Direct link to What geo replication is (and is not)" translate="no">​</a></h2>
<p>Setting <code>replication.factor=3</code> puts three copies of each partition on three
brokers — in the <em>same</em> cluster. Rack awareness can spread those replicas
across availability zones, but the cluster is still one blast radius: one
control plane, one region, one set of humans with admin rights.</p>
<p>Geo replication runs a second (or third) Kafka cluster elsewhere and copies
topics between them, cluster to cluster. The use cases:</p>
<ul>
<li class=""><strong>Disaster recovery</strong> — survive a region failure with a warm standby</li>
<li class=""><strong>Data locality</strong> — serve consumers from the nearest region</li>
<li class=""><strong>Compliance</strong> — keep regional data in-region while sharing what is allowed</li>
<li class=""><strong>Migration</strong> — move workloads between datacenters or clouds without a big
bang (see the <a class="" href="https://kafkabackup.com/use-cases/migration">migration use cases</a>)</li>
</ul>
<p>The metrics that govern every design below: end-to-end replication lag,
cross-region latency, and network transfer cost.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-four-replication-patterns">The four replication patterns<a href="https://kafkabackup.com/blog/kafka-geo-replication#the-four-replication-patterns" class="hash-link" aria-label="Direct link to The four replication patterns" title="Direct link to The four replication patterns" translate="no">​</a></h2>
<table><thead><tr><th>Pattern</th><th>RTO</th><th>Cost</th><th>Complexity</th><th>Best for</th></tr></thead><tbody><tr><td><strong>Active-passive</strong></td><td>Minutes</td><td>2× infra</td><td>Low</td><td>DR for a single primary region</td></tr><tr><td><strong>Active-active</strong></td><td>Near-zero</td><td>2× infra + conflict handling</td><td>High</td><td>Regional serving with failover both ways</td></tr><tr><td><strong>Hub-and-spoke</strong></td><td>Varies by spoke</td><td>Hub + N spokes</td><td>Medium</td><td>Central aggregation, regional distribution</td></tr><tr><td><strong>Mesh</strong></td><td>Near-zero</td><td>N× everything</td><td>Very high</td><td>Few orgs genuinely need this</td></tr></tbody></table>
<p><strong>Active-passive</strong> is the honest default. One cluster serves traffic; a
standby in another region receives a continuous copy. Failover means
repointing clients — the hard part is offset translation, not data movement.</p>
<p><strong>Active-active</strong> lets both regions produce and consume. It puts the
standby capacity to work but introduces bidirectional flows, loop prevention, and
topic naming discipline. Choose it when both regions must serve writes, not
because idle standby feels wasteful.</p>
<p><strong>Hub-and-spoke</strong> fits aggregation topologies: regional clusters replicate
into a central hub for analytics, or a hub fans reference data out to the
edges.</p>
<p><strong>Mesh</strong> — everyone replicates to everyone — multiplies links, monitoring, and
failure modes quadratically. It is listed here mostly so you can decline it
deliberately.</p>
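<p>The quadratic growth is easy to see by counting one-way replication flows. This sketch assumes every pair of clusters in a mesh replicates in both directions and that a hub-and-spoke topology has one one-way flow per spoke. The function names are illustrative.</p>

```python
def mesh_flows(cluster_count):
    """One-way flows when every cluster replicates to every other cluster."""
    return cluster_count * (cluster_count - 1)


def hub_and_spoke_flows(spoke_count):
    """One-way flows when each spoke replicates into a single hub."""
    return spoke_count


for clusters in (2, 3, 5, 8):
    print(clusters, mesh_flows(clusters), hub_and_spoke_flows(clusters - 1))
```

<p>Going from 3 to 8 clusters takes a mesh from 6 flows to 56, while a hub with the same number of clusters grows from 2 flows to 7. Each flow is something to monitor, tune, and debug.</p>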
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="implementing-geo-replication-with-mirrormaker-2">Implementing geo replication with MirrorMaker 2<a href="https://kafkabackup.com/blog/kafka-geo-replication#implementing-geo-replication-with-mirrormaker-2" class="hash-link" aria-label="Direct link to Implementing geo replication with MirrorMaker 2" title="Direct link to Implementing geo replication with MirrorMaker 2" translate="no">​</a></h2>
<p>MirrorMaker 2 (MM2) ships with Apache Kafka and runs on the Connect framework.
A minimal active-passive setup:</p>
<div class="language-properties codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">mm2.properties</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-properties codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">clusters = primary, dr</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">primary.bootstrap.servers = kafka-us-east.example.com:9092</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">dr.bootstrap.servers = kafka-us-west.example.com:9092</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Replicate everything except internals from primary to DR</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">primary-&gt;dr.enabled = true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">primary-&gt;dr.topics = .*</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Consumer group offset sync, so consumers can fail over</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">emit.checkpoints.enabled = true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sync.group.offsets.enabled = true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sync.group.offsets.interval.seconds = 60</span><br></span></code></pre></div></div>
<p>Three things bite teams on real WAN links:</p>
<ol>
<li class=""><strong>Remote topic prefixes.</strong> MM2 replicates <code>orders</code> as <code>primary.orders</code> on
the DR cluster by default. Consumers failing over must subscribe
accordingly, or you must override the replication policy.</li>
<li class=""><strong>Offset translation.</strong> Offsets differ between source and target. MM2's
checkpoints translate consumer group positions — verify translated offsets
in a drill before an outage forces the issue.</li>
<li class=""><strong>WAN tuning.</strong> Raise producer <code>batch.size</code> and <code>linger.ms</code> on the
connectors, enable compression, and monitor end-to-end lag (record
timestamp delta at the consumer), not just Connect task lag.</li>
</ol>
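<p>Two of these points reduce to small calculations that are worth putting in a failover runbook. The sketch below assumes MM2's default replication policy, which names a remote topic as the source cluster alias, a separator (<code>.</code> by default), and the original topic name. The helper names are hypothetical, and the lag helper assumes source and consumer clocks are reasonably synchronized.</p>

```python
def remote_topic(source_alias, topic, separator="."):
    """Name a replicated topic the way MM2's default replication policy does."""
    return f"{source_alias}{separator}{topic}"


def end_to_end_lag_ms(record_timestamp_ms, consumed_at_ms):
    """Delta between a record's timestamp and when it was consumed on the target."""
    return max(0, consumed_at_ms - record_timestamp_ms)


# Hypothetical values: the "orders" topic mirrored from the "primary" cluster.
print(remote_topic("primary", "orders"))
print(end_to_end_lag_ms(record_timestamp_ms=1_700_000_000_000,
                        consumed_at_ms=1_700_000_004_250))
```

<p>The first call prints <code>primary.orders</code>, the name a failed-over consumer would subscribe to. The second prints 4250, the lag in milliseconds for that one record.</p>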
<p>Our <a class="" href="https://kafkabackup.com/compare/mirrormaker">MirrorMaker 2 comparison</a> covers where MM2 shines
and where it stops.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="a-reference-architecture-that-holds-up">A reference architecture that holds up<a href="https://kafkabackup.com/blog/kafka-geo-replication#a-reference-architecture-that-holds-up" class="hash-link" aria-label="Direct link to A reference architecture that holds up" title="Direct link to A reference architecture that holds up" translate="no">​</a></h2>
<p>The pattern we see work repeatedly for cross-datacenter DR:</p>
<ul>
<li class=""><strong>Two regions, active-passive</strong>, MM2 running in the <em>target</em> region (pull
model — the DR site keeps working if the primary degrades)</li>
<li class=""><strong>Dedicated replication bandwidth</strong> sized at peak produce throughput plus
headroom, compressed on the wire</li>
<li class=""><strong>Health-checked DNS failover</strong> for client bootstrap servers, with TTLs low
enough to matter during an incident</li>
<li class=""><strong>Quarterly failover drills</strong> that measure achieved RTO/RPO against targets —
untested DR is a diagram, not a capability</li>
<li class=""><strong>Cost line items</strong> reviewed explicitly: standby compute, cross-region
transfer (usually the surprise), and duplicated storage</li>
</ul>
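<p>The bandwidth bullet can be turned into a back-of-the-envelope number. This is a rough sizing sketch under stated assumptions: the function name is hypothetical, the headroom and compression ratio are values you choose or measure, and protocol overhead is ignored.</p>

```python
def replication_bandwidth_mbps(peak_mb_per_second, headroom=0.3, compression_ratio=1.0):
    """Estimate the link size in megabits per second for a replication flow.

    peak_mb_per_second: peak produce throughput in megabytes per second.
    headroom: fractional margin for catch-up after an interruption.
    compression_ratio: uncompressed size divided by on-the-wire size.
    """
    if 0 >= compression_ratio:
        raise ValueError("compression_ratio must be positive")
    return peak_mb_per_second * 8 * (1 + headroom) / compression_ratio


# Hypothetical inputs: 50 MB/s peak, 30% headroom, 4:1 compression on the wire.
estimate = replication_bandwidth_mbps(50, headroom=0.3, compression_ratio=4.0)
print(round(estimate, 1))
```

<p>With these sample inputs the estimate is about 130 Mbit/s. Measure your real compression ratio before relying on it, since it varies widely with payload format.</p>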
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-failure-mode-replication-cannot-solve">The failure mode replication cannot solve<a href="https://kafkabackup.com/blog/kafka-geo-replication#the-failure-mode-replication-cannot-solve" class="hash-link" aria-label="Direct link to The failure mode replication cannot solve" title="Direct link to The failure mode replication cannot solve" translate="no">​</a></h2>
<p>Geo replication is built to copy everything, quickly. That is precisely why it
cannot protect you from:</p>
<ul>
<li class="">A producer bug writing poisoned records, which are replicated as quickly as valid ones</li>
<li class="">An accidental topic deletion, which in-cluster replicas apply immediately and some cross-cluster tools mirror to the standby</li>
<li class="">A compliance request to reconstruct data as of last quarter</li>
</ul>
<p>For those you need an immutable copy that lives <em>outside</em> both clusters and
can be restored to a moment in time. That is what
<a class="" href="https://kafkabackup.com/use-cases/disaster-recovery">point-in-time backup</a> provides: records,
consumer offsets, and topic configuration in object storage, restorable to any
cluster — including the DR cluster you just failed over to.</p>
<p>Mature Kafka estates run both layers: replication for availability, backups
for recoverability. The <a class="" href="https://kafkabackup.com/blog/kafka-backup-best-practices">best practices guide</a>
covers operating that second layer well.</p>
<section class="pseo-faq"><h2>Frequently asked questions</h2><details class="pseo-faq__item"><summary>How does Kafka geo replication work?</summary><p>A replication tool, most commonly MirrorMaker 2, consumes topics from a source cluster and produces them to a target cluster in another region, continuously. Checkpoints translate consumer group offsets between clusters so consumers can fail over and resume near where they left off.</p></details><details class="pseo-faq__item"><summary>What is the difference between Kafka replication and geo replication?</summary><p>Kafka replication (replication.factor) keeps copies of each partition on multiple brokers within one cluster. Geo replication copies topics between separate clusters in different regions or datacenters, protecting against the loss of an entire site rather than a single broker.</p></details><details class="pseo-faq__item"><summary>Can Kafka replicate across data centers?</summary><p>Yes. MirrorMaker 2, Confluent Replicator, and Confluent Cluster Linking all replicate topics between clusters in different datacenters. Cross-datacenter links need WAN tuning: compression, larger batches, and monitoring of end-to-end lag rather than connector lag alone.</p></details><details class="pseo-faq__item"><summary>What is the best pattern for Kafka cross-region replication?</summary><p>Active-passive is the right starting point for most teams: one primary cluster and a warm standby receiving a continuous copy. Active-active adds bidirectional replication and conflict handling, and is only worth the complexity when both regions must serve writes.</p></details><details class="pseo-faq__item"><summary>Does geo replication replace Kafka backups?</summary><p>No. Replication copies every write to the standby as fast as the link allows, including corrupted data and, with some tools, accidental deletions. Backups provide immutable, point-in-time copies outside both clusters, which is what you restore from after a logical failure rather than an infrastructure one.</p></details></section>
<hr>
<p><em>Related reading: <a class="" href="https://kafkabackup.com/compare/mirrormaker">OSO Kafka Backup vs MirrorMaker 2</a>,
<a class="" href="https://kafkabackup.com/use-cases/disaster-recovery">disaster recovery use cases</a>, and
<a class="" href="https://kafkabackup.com/blog/backup-restore-kafka-topics">how to backup and restore Kafka topics</a>.</em></p>]]></content:encoded>
            <category>Replication</category>
            <category>Disaster Recovery</category>
        </item>
    </channel>
</rss>