Glossary

Definitions of key terms used throughout the OSO Kafka Backup Well-Architected Framework documentation.

Term	Definition
ACL	Access Control List. A set of rules in Apache Kafka that define which users or service accounts are permitted to perform specific operations (read, write, describe) on topics, consumer groups, and other resources.
Air-Gapped Backup	A backup stored in a location that is physically or logically isolated from the primary environment, preventing compromise of both primary data and backups in a single incident.
Audit Log	A chronological record of operations performed by OSO Kafka Backup, including who initiated each action, what was affected, and the outcome. Available in the Enterprise edition.
Backup	A durable copy of Kafka topic data and metadata stored in external object storage, created by OSO Kafka Backup for disaster recovery and compliance purposes.
Backup ID	A unique identifier assigned to each backup run, used to reference and manage specific backup snapshots.
Backup Window	The time period during which a backup operation runs. Shorter backup windows reduce the risk of data loss but may require more resources.
Bootstrap Servers	A comma-separated list of Kafka broker addresses (host:port) used by clients to establish an initial connection to the Kafka cluster and discover the full cluster topology.
Broker	A Kafka server that stores topic partitions and serves client requests. A Kafka cluster consists of one or more brokers.
Chaos Engineering	The discipline of experimenting on a system to build confidence in its ability to withstand turbulent conditions in production, such as simulating broker failures or network partitions.
Checkpoint	A record of the last successfully committed offset for each topic-partition, stored in a local SQLite database. Checkpoints enable incremental backups and crash-resilient resume.
Consumer Group	A named group of Kafka consumers that coordinate to consume messages from one or more topics, with each partition assigned to exactly one consumer in the group.
CRD	Custom Resource Definition. A Kubernetes extension mechanism used by the OSO Kafka Backup Operator to define custom resources such as `KafkaBackup`, `KafkaRestore`, and `KafkaBackupSchedule`.
Customer-Managed Key (CMK)	An encryption key owned and managed by the customer (rather than the cloud provider) used for encrypting backup data at rest, providing full control over key lifecycle and access.
Data Masking	The process of obfuscating or redacting sensitive fields within Kafka messages during backup, ensuring that personally identifiable information (PII) is not stored in plain text. Available in the Enterprise edition.
DR Drill	Disaster Recovery Drill. A planned exercise that tests the end-to-end restore process, validating that backups are viable and that the team can meet RTO and RPO targets.
Encryption at Rest	Protection of stored data using encryption algorithms (e.g., AES-256) so that data on disk or in object storage is unreadable without the decryption key.
Encryption in Transit	Protection of data as it moves between systems using TLS, ensuring that data exchanged between Kafka brokers, backup tools, and storage backends cannot be read or modified if it is intercepted.
Full Backup	A backup that captures all messages in the configured topics from the earliest available offset through to the current offset. Contrast with incremental backup.
GCS	Google Cloud Storage. An object storage service from Google Cloud Platform, supported as a backup storage backend by OSO Kafka Backup.
GitOps	An operational model where the desired state of infrastructure and applications is declared in Git repositories, with automated tooling (e.g., ArgoCD, Flux) reconciling the live state to match.
Grafana	An open-source observability platform used to visualise Prometheus metrics from OSO Kafka Backup through pre-built dashboards.
IAM	Identity and Access Management. Cloud provider services (AWS IAM, Azure RBAC, GCP IAM) that control which identities can access storage buckets, encryption keys, and other resources.
Incremental Backup	A backup that captures only messages produced since the last checkpoint, reducing backup duration and storage consumption compared to a full backup.
ISR (In-Sync Replicas)	The set of partition replicas that are fully caught up with the leader replica. A message is considered committed only when all ISR members have acknowledged it.
KRaft	Kafka Raft. The consensus protocol that replaces ZooKeeper for Kafka cluster metadata management. Production-ready since Kafka 3.3 and the only supported mode from Kafka 4.0, which removed ZooKeeper.
Kubernetes Operator	A software extension to Kubernetes that uses CRDs and custom controllers to manage the lifecycle of OSO Kafka Backup resources, including scheduling, monitoring, and reconciliation.
Lifecycle Policy	A storage backend rule that automatically transitions or deletes objects based on age. Used to manage backup retention by moving older backups to cheaper storage tiers or expiring them.
Manifest	A JSON file (`manifest.json`) stored at the root of a backup that contains metadata about the backup, including topics, partitions, offset ranges, and timestamps.
MinIO	An open-source, S3-compatible object storage system that can serve as a self-hosted backup storage backend for OSO Kafka Backup.
mTLS	Mutual TLS. A TLS configuration where both the client and server present certificates and verify each other's identity, providing stronger authentication than one-way TLS.
Object Storage	A storage architecture that manages data as objects (with metadata and a unique identifier) rather than as files in a hierarchy. Examples include S3, GCS, and Azure Blob Storage.
Offset	A sequential integer assigned to each message within a Kafka partition, uniquely identifying the message's position. Offsets are used to track consumer progress and enable point-in-time recovery.
Partition	A subdivision of a Kafka topic that provides parallelism. Each partition is an ordered, immutable sequence of messages, and each message within a partition has a unique offset.
Point-in-Time Recovery (PITR)	The ability to restore Kafka topic data as of a chosen timestamp by filtering backed-up messages on their timestamps.
Prometheus	An open-source monitoring and alerting toolkit used to collect and query metrics exposed by OSO Kafka Backup on its metrics endpoint (default port 8080).
RBAC	Role-Based Access Control. A security model that restricts operations based on the roles assigned to users or service accounts. Available in the Enterprise edition.
Recovery Point Objective (RPO)	The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means that up to 1 hour of data may be lost in a disaster.
Recovery Time Objective (RTO)	The maximum acceptable time to restore service after a disaster. An RTO of 4 hours means the system must be operational within 4 hours of an incident.
Replication Factor	The number of copies of each partition maintained across Kafka brokers. A replication factor of 3 means each partition has three replicas, providing fault tolerance.
Restore	The process of reading backed-up data from object storage and producing it to a target Kafka cluster, optionally filtered by time window, topic, or partition.
Runbook	A documented procedure for performing a specific operational task, such as restoring a Kafka topic from backup or responding to a backup failure alert.
S3	Amazon Simple Storage Service. An object storage service from AWS, and the most commonly used backup storage backend for OSO Kafka Backup.
SASL	Simple Authentication and Security Layer. A framework for authentication used by Kafka clients, supporting mechanisms such as PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512.
Segment	A compressed file within a backup that contains a range of Kafka messages for a specific topic-partition. Segments are named by their starting offset (e.g., `segment-000000001000.zst`).
Server-Side Encryption (SSE)	Encryption at rest performed by the storage provider (e.g., S3, GCS, Azure Blob Storage), which transparently encrypts objects on write and decrypts them on read without changes to the client.
SLA	Service Level Agreement. A formal commitment defining the expected availability, performance, and support response times for a service.
SLI	Service Level Indicator. A quantitative metric used to measure system behaviour, such as backup success rate, restore latency, or storage write throughput.
SLO	Service Level Objective. A target value or range for an SLI, such as "99.9% backup success rate" or "restore completes within 4 hours."
Storage Tier	A class of storage with specific cost and performance characteristics. For example, S3 Standard for active backups and S3 Glacier for long-term archival.
TLS	Transport Layer Security. A cryptographic protocol that provides encrypted communication between Kafka clients and brokers, and between the backup tool and storage backends.
Topic	A named category or feed in Apache Kafka to which messages are published. Topics are divided into partitions for scalability and parallelism.
ZooKeeper	A centralised coordination service historically used by Apache Kafka for cluster metadata management. KRaft replaces it, and Kafka 4.0 removes ZooKeeper support entirely.
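
Several of the terms above (Offset, Segment, Point-in-Time Recovery, and RPO) describe mechanics that may be easier to follow in code. The Python sketch below is illustrative only and is not OSO Kafka Backup's implementation. The `Message` type and the helper names `segment_name`, `restore_until`, and `rpo_met` are hypothetical, and the 12-digit zero padding is an assumption inferred from the single example file name in the Segment entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a backed-up Kafka record."""
    offset: int        # position of the record within its partition
    timestamp_ms: int  # record timestamp, milliseconds since the Unix epoch
    value: bytes


def segment_name(start_offset: int, width: int = 12) -> str:
    """Name a segment by its starting offset (padding width is an assumption)."""
    return f"segment-{start_offset:0{width}d}.zst"


def restore_until(messages, cutoff_ms):
    """Point-in-time filter: keep records stamped at or before the cutoff."""
    return [m for m in messages if m.timestamp_ms <= cutoff_ms]


def rpo_met(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the time since the last backup (the data at risk) is within the RPO."""
    return now - last_backup <= rpo


# Two records one minute apart in the same partition.
msgs = [
    Message(1000, 1_700_000_000_000, b"a"),
    Message(1001, 1_700_000_060_000, b"b"),
]

print(segment_name(msgs[0].offset))
# -> segment-000000001000.zst

# Restore to a point 30 seconds after the first record: only offset 1000 survives.
print([m.offset for m in restore_until(msgs, 1_700_000_030_000)])
# -> [1000]

# A backup taken 45 minutes ago satisfies a 1-hour RPO.
last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(rpo_met(last, last + timedelta(minutes=45), timedelta(hours=1)))
# -> True
```

Filtering on record timestamps rather than offsets mirrors the PITR definition above. Offsets locate a record within a partition, whereas timestamps are what allow one cutoff to be applied across many partitions.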