Sustainability
"Minimising the environmental impact of Kafka backup operations through efficient resource utilisation, data lifecycle management, and considered infrastructure choices."
Every compute cycle, stored byte, and network transfer has an environmental cost. While individual backup workloads are modest in isolation, the cumulative impact across environments, regions, and retention periods adds up. The Sustainability pillar helps you reduce this footprint by making deliberate choices about how, where, and how long you store and process backup data.
Design Principles
- Understand your impact — Measure the carbon footprint of your backup infrastructure using cloud-provider tools. You cannot reduce what you do not measure.
- Maximise utilisation — Right-size compute resources, use efficient compression algorithms, and eliminate idle capacity. Higher utilisation means less wasted energy per unit of useful work.
- Adopt more efficient technology — `kafka-backup` is written in Rust, which delivers significantly lower CPU and memory consumption than JVM-based alternatives. Choosing inherently efficient tooling reduces energy consumption at the source.
- Reduce downstream impact — Efficient storage tiering moves data to lower-energy cold storage over time, reducing the ongoing energy required to maintain backups.
Best Practices
SUS-01: Efficient Resource Utilisation
What
Minimise the compute, memory, and energy consumed per unit of backed-up data by right-sizing resources, using efficient algorithms, and eliminating waste.
Why
Over-provisioned infrastructure consumes energy whether it is doing useful work or not. A right-sized, efficiently compressed backup pipeline can deliver the same data protection with a fraction of the environmental footprint.
Implementation Guidance
- Rust is inherently efficient — `kafka-backup` is written in Rust, which compiles to native machine code with no garbage-collection overhead. This means significantly lower CPU and memory consumption than JVM-based backup tools, translating directly into smaller instances and lower energy usage.
- Right-size compute resources — Follow the guidance in PE-04 to size instances based on actual workload. Monitor utilisation and downsize when headroom exceeds 30%.

  ```yaml
  # Right-sized resource allocation
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2000m"
      memory: "2Gi"
  ```
- Enable autoscaling — Scale compute to match backup windows rather than running at peak capacity 24/7. Use the Kubernetes Horizontal Pod Autoscaler or cluster autoscaler to add and remove capacity dynamically.
- Use efficient compression — `zstd` compression reduces stored data by 3–5x with minimal CPU overhead, reducing both storage energy and transfer energy.

  ```yaml
  backup:
    compression:
      enabled: true
      algorithm: zstd
      level: 3
  ```
- Choose regions with lower carbon intensity — Where latency and data residency requirements allow, prefer regions powered by renewable energy.
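Scheduling compute to match backup windows, as the autoscaling guidance above suggests, can be sketched as a Kubernetes CronJob so the backup pod only exists while work is running. The image name, schedule, and resource figures below are illustrative assumptions, not shipped defaults:

```yaml
# Illustrative sketch: run the backup as a scheduled Job instead of a
# 24/7 Deployment, so compute is only allocated during the backup window.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-backup-nightly
spec:
  schedule: "0 2 * * *"       # 02:00 daily backup window (assumed)
  concurrencyPolicy: Forbid   # never let runs overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: kafka-backup
              image: example.org/kafka-backup:latest  # hypothetical image
              resources:
                requests: { cpu: "500m", memory: "512Mi" }
                limits:   { cpu: "2000m", memory: "2Gi" }
```

Compared with a long-running Deployment, the Job releases its CPU and memory back to the cluster as soon as the backup completes.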
The combination of Rust's efficiency, right-sized compute, and zstd compression can reduce the energy footprint of your backup pipeline by 5–10x compared to a naively provisioned JVM-based alternative.
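The 5–10x figure arises because the individual savings multiply rather than add. A back-of-the-envelope calculation, using assumed per-factor ratios for illustration, shows how they compound:

```python
# Back-of-the-envelope estimate of compounded energy savings.
# The per-factor ratios below are assumptions for illustration, not measurements.

def combined_reduction(factors: dict[str, float]) -> float:
    """Multiply independent reduction factors into one overall ratio."""
    total = 1.0
    for ratio in factors.values():
        total *= ratio
    return total

factors = {
    "rust_vs_jvm_compute": 2.0,  # assumed: half the CPU/memory of a JVM tool
    "right_sizing": 1.5,         # assumed: a third less idle capacity
    "zstd_compression": 3.0,     # low end of the 3-5x ratio cited above
}

print(combined_reduction(factors))  # 2.0 * 1.5 * 3.0 = 9.0
```

With these assumed ratios the pipeline lands at 9x, inside the 5–10x range cited above; plug in your own measured factors to get a defensible estimate.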
Anti-patterns
- Large instances running 24/7 — Running oversized compute around the clock for a workload that runs for two hours per day wastes 90% of the energy consumed.
- No compression — Storing and transferring uncompressed data when efficient compression is available at negligible CPU cost.
- No region consideration — Deploying backup infrastructure based solely on latency without considering the carbon intensity of the region's energy grid.
SUS-02: Data Lifecycle & Minimisation
What
Back up only the data you need, retain it only as long as required, and match recovery granularity to actual business requirements.
Why
Every byte stored consumes energy — for storage media, cooling, and redundancy. Reducing the volume of stored data through selective backup, appropriate retention, and right-sized granularity directly reduces the ongoing energy cost of your backup infrastructure.
Implementation Guidance
- Implement retention policies — Follow the tiered retention guidance in CO-04 to ensure data is deleted when it is no longer needed. Every deleted backup is energy that no longer needs to be spent on storage.
- Use topic filtering — Back up only the topics that require protection. Exclude ephemeral topics, internal Kafka topics, and any topic whose data can be trivially regenerated.

  ```yaml
  backup:
    topics:
      include:
        - "orders.*"
        - "payments.*"
        - "customer.*"
      exclude:
        - ".*\\.internal"
        - "logs\\.debug.*"
        - "__consumer_offsets"
  ```
- Match PITR granularity to actual needs — Point-in-time recovery with minute-level granularity generates far more backup data than hourly granularity. Choose the granularity your RTO and RPO actually require, not the finest granularity available.
- Archive to cold storage tiers — Move older backups to cold storage (Glacier, Archive, Coldline). Cold storage tiers consume less energy per byte than hot storage because they use denser, less frequently accessed media.
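The retention and tiering guidance above can be sketched as a single age-based policy function. The 30/365-day boundaries and tier names here are assumptions you would align with your own CO-04 retention schedule:

```python
from datetime import date, timedelta

# Illustrative age-based lifecycle policy: hot -> cold -> delete.
# The 30/365-day boundaries are assumptions, not product defaults.
HOT_DAYS = 30    # keep recent backups in hot/standard storage
COLD_DAYS = 365  # archive to cold storage until this age, then delete

def lifecycle_action(backup_date: date, today: date) -> str:
    """Return the storage action for a backup of the given age."""
    age = (today - backup_date).days
    if age > COLD_DAYS:
        return "delete"        # past retention: stored energy with no value
    if age > HOT_DAYS:
        return "cold-storage"  # rarely accessed: denser, lower-energy media
    return "hot-storage"       # recent: keep recoverable at low latency

today = date(2025, 6, 1)
print(lifecycle_action(today - timedelta(days=10), today))   # hot-storage
print(lifecycle_action(today - timedelta(days=120), today))  # cold-storage
print(lifecycle_action(today - timedelta(days=400), today))  # delete
```

In practice you would express the same boundaries as object-storage lifecycle rules so the provider enforces them without any compute on your side.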
Before excluding topics from backup, confirm with application owners that the data is genuinely ephemeral or regenerable. Excluding a topic that later turns out to be critical is a mistake you cannot undo — data that was never captured cannot be restored.
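The include/exclude patterns in the topic-filter configuration above read as regular expressions. The sketch below assumes two semantics — patterns must match the full topic name, and exclude rules take precedence over include rules — to show which topics would be captured:

```python
import re

# Sketch of include/exclude topic filtering. Two assumed semantics:
# patterns match the full topic name, and exclude takes precedence.
INCLUDE = [r"orders.*", r"payments.*", r"customer.*"]
EXCLUDE = [r".*\.internal", r"logs\.debug.*", r"__consumer_offsets"]

def is_backed_up(topic: str) -> bool:
    """Decide whether a topic falls inside the backup scope."""
    if any(re.fullmatch(p, topic) for p in EXCLUDE):
        return False  # exclude wins even when an include pattern matches
    return any(re.fullmatch(p, topic) for p in INCLUDE)

print(is_backed_up("orders.created"))           # True
print(is_backed_up("orders.created.internal"))  # False: exclude wins
print(is_backed_up("logs.debug.http"))          # False
print(is_backed_up("metrics.cpu"))              # False: not included
```

Running candidate topic names through a filter like this before go-live is a cheap way to catch an exclude pattern that is broader than intended.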
Anti-patterns
- Backing up all topics indiscriminately — Including internal Kafka topics, debug logs, and ephemeral streams that have no recovery value.
- No retention policies — Storing backups indefinitely, consuming energy for data that will never be accessed again.
- Over-specifying granularity — Configuring minute-level PITR for a workload where hourly recovery is perfectly acceptable, generating 60x more backup data than necessary.
SUS-03: Region & Storage Tier Selection
What
Choose cloud regions and storage tiers that minimise the carbon intensity of your backup infrastructure.
Why
Cloud regions differ significantly in carbon intensity depending on the local energy grid. A backup stored in a region powered primarily by renewables has a materially lower carbon footprint than the same backup in a coal-heavy region — with identical durability and availability.
Implementation Guidance
Prefer regions with renewable energy:
| Provider | Lower-Carbon Regions |
|---|---|
| AWS | eu-west-1 (Ireland), eu-north-1 (Stockholm) |
| Azure | North Europe (Ireland), Sweden Central |
| GCP | europe-north1 (Finland), us-central1 (Iowa) |
AWS, Azure, and GCP all publish sustainability commitments and region-level carbon data. Use this information when choosing where to deploy backup infrastructure, especially for DR copies that are latency-insensitive.
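Where latency and residency allow, the region decision can be made data-driven: rank the latency-acceptable candidates by grid carbon intensity. The gCO2e/kWh figures below are placeholder assumptions — replace them with current numbers from your provider or grid operator:

```python
# Rank candidate regions by grid carbon intensity (gCO2e/kWh).
# These figures are illustrative placeholders, not real measurements:
# source current values from your cloud provider or grid operator.
CARBON_INTENSITY = {
    "eu-north-1": 30,   # assumed: hydro-heavy Nordic grid
    "eu-west-1": 300,   # assumed
    "us-east-1": 400,   # assumed
}

def pick_region(candidates: list[str]) -> str:
    """Choose the lowest-carbon region among latency-acceptable candidates."""
    return min(candidates, key=CARBON_INTENSITY.__getitem__)

print(pick_region(["eu-west-1", "eu-north-1"]))  # eu-north-1
```

This works especially well for DR copies, where the candidate list can span continents because nothing latency-sensitive reads from them.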
- Use cold storage for long-term backups — Cold and archive storage tiers use denser storage media that consumes less energy per byte. For backups older than 90 days that are rarely accessed, cold storage is both cheaper and more sustainable.
- Track your carbon footprint — Use cloud-provider tools to measure the emissions attributable to your backup infrastructure:

  ```sh
  # AWS: View Carbon Footprint in the Billing Console
  # Navigate to: AWS Billing > Carbon Footprint

  # Azure: View emissions via the Emissions Dashboard
  # Navigate to: Azure Portal > Carbon Optimization

  # GCP: View Carbon Footprint in the Console
  # Navigate to: GCP Console > Carbon Footprint
  ```
- Set sustainability targets — Track carbon emissions per GB of backed-up data as a sustainability KPI. Review quarterly and set reduction targets aligned with your organisation's sustainability commitments.
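Once provider-reported emissions are available, the per-unit KPI from the bullet above is simple arithmetic. This sketch uses invented quarterly figures to show the metric and a quarter-over-quarter reduction:

```python
# Sustainability KPI: kgCO2e per TB of backed-up data, quarter over quarter.
# The quarterly figures are invented for illustration; real numbers come
# from your cloud provider's carbon footprint reporting.

def kpi(kg_co2e: float, stored_tb: float) -> float:
    """Emissions intensity of the backup estate, in kgCO2e per TB stored."""
    return kg_co2e / stored_tb

q1 = kpi(kg_co2e=120.0, stored_tb=40.0)  # 3.0 kgCO2e/TB
q2 = kpi(kg_co2e=105.0, stored_tb=50.0)  # 2.1 kgCO2e/TB
reduction = (q1 - q2) / q1               # fraction improved this quarter
print(round(reduction, 2))               # 0.3
```

Note the KPI is normalised per terabyte: absolute emissions can rise as data grows while intensity still falls, which is the trend a reduction target should track.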
Anti-patterns
- No carbon consideration in region selection — Choosing regions based solely on cost or latency without evaluating carbon intensity.
- All hot storage, all the time — Keeping every backup in hot/standard storage tiers when the vast majority will never be accessed after 30 days.
- No emissions tracking — Operating backup infrastructure without any visibility into its environmental impact.
Review Questions
Use these questions to evaluate the sustainability of your Kafka backup architecture:
- Do you know the carbon intensity of the regions where your backup infrastructure runs?
- Are compute resources right-sized to actual workload utilisation, avoiding persistent over-provisioning?
- Is compression enabled to reduce both storage volume and transfer energy?
- Do you have retention policies that automatically delete backups when they are no longer needed?
- Are you using cloud-provider carbon footprint tools to track and report on the emissions of your backup infrastructure?