Glossary

Definitions of key terms used throughout the OSO Kafka Backup Well-Architected Framework documentation.

| Term | Definition |
| --- | --- |
| ACL | Access Control List. A set of rules in Apache Kafka that define which users or service accounts are permitted to perform specific operations (read, write, describe) on topics, consumer groups, and other resources. |
| Air-Gapped Backup | A backup stored in a location that is physically or logically isolated from the primary environment, preventing compromise of both primary data and backups in a single incident. |
| Audit Log | A chronological record of operations performed by OSO Kafka Backup, including who initiated each action, what was affected, and the outcome. Available in the Enterprise edition. |
| Backup | A durable copy of Kafka topic data and metadata stored in external object storage, created by OSO Kafka Backup for disaster recovery and compliance purposes. |
| Backup ID | A unique identifier assigned to each backup run, used to reference and manage specific backup snapshots. |
| Backup Window | The time period during which a backup operation runs. Shorter backup windows reduce the risk of data loss but may require more resources. |
| Bootstrap Servers | A comma-separated list of Kafka broker addresses (host:port) used by clients to establish an initial connection to the Kafka cluster and discover the full cluster topology. |
| Broker | A Kafka server that stores topic partitions and serves client requests. A Kafka cluster consists of one or more brokers. |
| Chaos Engineering | The discipline of experimenting on a system to build confidence in its ability to withstand turbulent conditions in production, such as simulating broker failures or network partitions. |
| Checkpoint | A record of the last successfully committed offset for each topic-partition, stored in a local SQLite database. Checkpoints enable incremental backups and crash-resilient resume. |
| Consumer Group | A named group of Kafka consumers that coordinate to consume messages from one or more topics, with each partition assigned to exactly one consumer in the group. |
| CRD | Custom Resource Definition. A Kubernetes extension mechanism used by the OSO Kafka Backup Operator to define custom resources such as KafkaBackup, KafkaRestore, and KafkaBackupSchedule. |
| Customer-Managed Key (CMK) | An encryption key owned and managed by the customer (rather than the cloud provider) used for encrypting backup data at rest, providing full control over key lifecycle and access. |
| Data Masking | The process of obfuscating or redacting sensitive fields within Kafka messages during backup, ensuring that personally identifiable information (PII) is not stored in plain text. Available in the Enterprise edition. |
| DR Drill | Disaster Recovery Drill. A planned exercise that tests the end-to-end restore process, validating that backups are viable and that the team can meet RTO and RPO targets. |
| Encryption at Rest | Protection of stored data using encryption algorithms (e.g., AES-256) so that data on disk or in object storage is unreadable without the decryption key. |
| Encryption in Transit | Protection of data as it moves between systems using TLS, ensuring that data exchanged between Kafka brokers, backup tools, and storage backends cannot be read or tampered with in transit. |
| Full Backup | A backup that captures all messages in the configured topics from the earliest available offset through to the current offset. Contrast with incremental backup. |
| GCS | Google Cloud Storage. An object storage service from Google Cloud Platform, supported as a backup storage backend by OSO Kafka Backup. |
| GitOps | An operational model where the desired state of infrastructure and applications is declared in Git repositories, with automated tooling (e.g., ArgoCD, Flux) reconciling the live state to match. |
| Grafana | An open-source observability platform used to visualise Prometheus metrics from OSO Kafka Backup through pre-built dashboards. |
| IAM | Identity and Access Management. Cloud provider services (AWS IAM, Azure RBAC, GCP IAM) that control which identities can access storage buckets, encryption keys, and other resources. |
| Incremental Backup | A backup that captures only messages produced since the last checkpoint, reducing backup duration and storage consumption compared to a full backup. |
| ISR (In-Sync Replicas) | The set of partition replicas that are fully caught up with the leader replica. A message is considered committed only when all ISR members have acknowledged it. |
| KRaft | Kafka Raft. The consensus protocol that replaces ZooKeeper for Kafka cluster metadata management, production-ready since Kafka 3.3 and the only supported mode from Kafka 4.0. |
| Kubernetes Operator | A software extension to Kubernetes that uses CRDs and custom controllers to manage the lifecycle of OSO Kafka Backup resources, including scheduling, monitoring, and reconciliation. |
| Lifecycle Policy | A storage backend rule that automatically transitions or deletes objects based on age. Used to manage backup retention by moving older backups to cheaper storage tiers or expiring them. |
| Manifest | A JSON file (manifest.json) stored at the root of a backup that contains metadata about the backup, including topics, partitions, offset ranges, and timestamps. |
| MinIO | An open-source, S3-compatible object storage system that can serve as a self-hosted backup storage backend for OSO Kafka Backup. |
| mTLS | Mutual TLS. A TLS configuration where both the client and server present certificates and verify each other's identity, providing stronger authentication than one-way TLS. |
| Object Storage | A storage architecture that manages data as objects (with metadata and a unique identifier) rather than as files in a hierarchy. Examples include S3, GCS, and Azure Blob Storage. |
| Offset | A sequential integer assigned to each message within a Kafka partition, uniquely identifying the message's position. Offsets are used to track consumer progress and enable point-in-time recovery. |
| Partition | A subdivision of a Kafka topic that provides parallelism. Each partition is an ordered, immutable sequence of messages, and each message within a partition has a unique offset. |
| Point-in-Time Recovery (PITR) | The ability to restore Kafka topic data to any arbitrary timestamp by filtering backed-up messages based on their timestamps. |
| Prometheus | An open-source monitoring and alerting toolkit used to collect and query metrics exposed by OSO Kafka Backup on its metrics endpoint (default port 8080). |
| RBAC | Role-Based Access Control. A security model that restricts operations based on the roles assigned to users or service accounts. Available in the Enterprise edition. |
| Recovery Point Objective (RPO) | The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means that up to 1 hour of data may be lost in a disaster. |
| Recovery Time Objective (RTO) | The maximum acceptable time to restore service after a disaster. An RTO of 4 hours means the system must be operational within 4 hours of an incident. |
| Replication Factor | The number of copies of each partition maintained across Kafka brokers. A replication factor of 3 means each partition has three replicas, providing fault tolerance. |
| Restore | The process of reading backed-up data from object storage and producing it to a target Kafka cluster, optionally filtered by time window, topic, or partition. |
| Runbook | A documented procedure for performing a specific operational task, such as restoring a Kafka topic from backup or responding to a backup failure alert. |
| S3 | Amazon Simple Storage Service. An object storage service from AWS, and the most commonly used backup storage backend for OSO Kafka Backup. |
| SASL | Simple Authentication and Security Layer. A framework for authentication used by Kafka clients, supporting mechanisms such as PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512. |
| Segment | A compressed file within a backup that contains a range of Kafka messages for a specific topic-partition. Segments are named by their starting offset (e.g., segment-000000001000.zst). |
| Server-Side Encryption (SSE) | Encryption at rest performed by the storage provider (e.g., S3, GCS, Azure Blob), which transparently encrypts and decrypts objects without changes to the client. |
| SLA | Service Level Agreement. A formal commitment defining the expected availability, performance, and support response times for a service. |
| SLI | Service Level Indicator. A quantitative metric used to measure system behaviour, such as backup success rate, restore latency, or storage write throughput. |
| SLO | Service Level Objective. A target value or range for an SLI, such as "99.9% backup success rate" or "restore completes within 4 hours." |
| Storage Tier | A class of storage with specific cost and performance characteristics. For example, S3 Standard for active backups and S3 Glacier for long-term archival. |
| TLS | Transport Layer Security. A cryptographic protocol that provides encrypted communication between Kafka clients and brokers, and between the backup tool and storage backends. |
| Topic | A named category or feed in Apache Kafka to which messages are published. Topics are divided into partitions for scalability and parallelism. |
| ZooKeeper | A centralised coordination service historically used by Apache Kafka for cluster metadata management, replaced by KRaft in modern Kafka deployments and removed in Kafka 4.0. |
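
The Segment entry says segments are named by their starting offset. A minimal Python sketch of that naming scheme follows. It assumes a 12-digit zero-padded offset, inferred only from the single example `segment-000000001000.zst`; the helper names `segment_name` and `segment_start_offset` are hypothetical and are not part of OSO Kafka Backup.

```python
import re

# Assumption: the offset is zero-padded to 12 digits, inferred from the
# glossary example "segment-000000001000.zst". Not a confirmed format spec.
SEGMENT_PATTERN = re.compile(r"^segment-(\d+)\.zst$")


def segment_name(start_offset: int) -> str:
    """Build a segment file name from the segment's starting offset."""
    if start_offset < 0:
        raise ValueError("Kafka offsets are non-negative")
    return f"segment-{start_offset:012d}.zst"


def segment_start_offset(name: str) -> int:
    """Extract the starting offset from a segment file name."""
    match = SEGMENT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a segment file name: {name!r}")
    return int(match.group(1))
```

Zero-padding keeps a lexicographic listing of segment files in offset order, which is one plausible reason for this kind of naming.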
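
The Checkpoint entry describes a per-topic-partition record of the last committed offset, kept in a local SQLite database. The sketch below shows how such a store could work using Python's standard `sqlite3` module. The table name, column names, and function names are hypothetical; the actual schema used by OSO Kafka Backup is not documented here.

```python
import sqlite3
from typing import Optional


def open_checkpoint_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a checkpoint database. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        " topic TEXT NOT NULL,"
        " partition_id INTEGER NOT NULL,"
        " last_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, partition_id))"
    )
    return conn


def save_checkpoint(conn: sqlite3.Connection, topic: str,
                    partition_id: int, last_offset: int) -> None:
    """Record the last successfully backed-up offset for a topic-partition."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (topic, partition_id, last_offset)"
        " VALUES (?, ?, ?)",
        (topic, partition_id, last_offset),
    )
    conn.commit()  # committing makes the checkpoint durable across crashes


def resume_offset(conn: sqlite3.Connection, topic: str,
                  partition_id: int) -> Optional[int]:
    """Return the offset to resume from, or None if no checkpoint exists.

    None means there is nothing to resume, so a full backup would start
    from the earliest available offset.
    """
    row = conn.execute(
        "SELECT last_offset FROM checkpoints"
        " WHERE topic = ? AND partition_id = ?",
        (topic, partition_id),
    ).fetchone()
    return None if row is None else row[0] + 1
```

An incremental backup, in this sketch, reads from `resume_offset(...)` onward and calls `save_checkpoint(...)` after each segment is safely written.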
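
The Point-in-Time Recovery entry describes restoring by filtering backed-up messages on their timestamps. The following sketch illustrates that filtering step only. The `BackedUpMessage` type and `messages_up_to` function are hypothetical illustrations, not the tool's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class BackedUpMessage:
    """Hypothetical in-memory view of one message read from a backup."""
    offset: int
    timestamp_ms: int  # Kafka record timestamp, milliseconds since epoch
    value: bytes


def messages_up_to(messages: Iterable[BackedUpMessage],
                   target_timestamp_ms: int) -> Iterator[BackedUpMessage]:
    """Yield only messages whose timestamp is at or before the target.

    Record timestamps are not guaranteed to increase with offset (producers
    may set them), so every message is checked instead of stopping at the
    first one past the target.
    """
    for message in messages:
        if message.timestamp_ms <= target_timestamp_ms:
            yield message
```

A restore to a given point in time would then produce only the yielded messages to the target cluster, preserving their original order within each partition.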
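
The Recovery Point Objective entry defines RPO as the maximum acceptable data loss measured in time. As a rough illustration, the check below compares the age of the most recent completed backup against an RPO target. The function name `rpo_is_met` is hypothetical, and treating backup age as the worst-case data loss is a simplifying assumption.

```python
from datetime import datetime, timedelta, timezone


def rpo_is_met(last_backup_completed: datetime,
               now: datetime,
               rpo: timedelta) -> bool:
    """Return True if a disaster at `now` would lose at most `rpo` of data.

    Simplifying assumption: everything produced after the last completed
    backup would be lost, so the exposure equals the age of that backup.
    """
    return now - last_backup_completed <= rpo


# Example: a backup finished 45 minutes ago, checked against a 1-hour RPO.
_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
_last = _now - timedelta(minutes=45)
within_target = rpo_is_met(_last, _now, timedelta(hours=1))
```

A check like this could back an SLI such as "fraction of time the RPO is met", which is then compared with an SLO target.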