Glossary
Definitions of key terms used throughout the OSO Kafka Backup Well-Architected Framework documentation.
| Term | Definition |
|---|---|
| ACL | Access Control List. A set of rules in Apache Kafka that define which users or service accounts are permitted to perform specific operations (read, write, describe) on topics, consumer groups, and other resources. |
| Air-Gapped Backup | A backup stored in a location that is physically or logically isolated from the primary environment, preventing compromise of both primary data and backups in a single incident. |
| Audit Log | A chronological record of operations performed by OSO Kafka Backup, including who initiated each action, what was affected, and the outcome. Available in the Enterprise edition. |
| Backup | A durable copy of Kafka topic data and metadata stored in external object storage, created by OSO Kafka Backup for disaster recovery and compliance purposes. |
| Backup ID | A unique identifier assigned to each backup run, used to reference and manage specific backup snapshots. |
| Backup Window | The time period during which a backup operation runs. Shorter backup windows reduce the risk of data loss but may require more resources. |
| Bootstrap Servers | A comma-separated list of Kafka broker addresses (host:port) used by clients to establish an initial connection to the Kafka cluster and discover the full cluster topology. |
| Broker | A Kafka server that stores topic partitions and serves client requests. A Kafka cluster consists of one or more brokers. |
| Chaos Engineering | The discipline of experimenting on a system to build confidence in its ability to withstand turbulent conditions in production, such as simulating broker failures or network partitions. |
| Checkpoint | A record of the last successfully committed offset for each topic-partition, stored in a local SQLite database. Checkpoints enable incremental backups and crash-resilient resume. |
| Consumer Group | A named group of Kafka consumers that coordinate to consume messages from one or more topics, with each partition assigned to exactly one consumer in the group. |
| CRD | Custom Resource Definition. A Kubernetes extension mechanism used by the OSO Kafka Backup Operator to define custom resources such as KafkaBackup, KafkaRestore, and KafkaBackupSchedule. |
| Customer-Managed Key (CMK) | An encryption key owned and managed by the customer (rather than the cloud provider) used for encrypting backup data at rest, providing full control over key lifecycle and access. |
| Data Masking | The process of obfuscating or redacting sensitive fields within Kafka messages during backup, ensuring that personally identifiable information (PII) is not stored in plain text. Available in the Enterprise edition. |
| DR Drill | Disaster Recovery Drill. A planned exercise that tests the end-to-end restore process, validating that backups are viable and that the team can meet RTO and RPO targets. |
| Encryption at Rest | Protection of stored data using encryption algorithms (e.g., AES-256) so that data on disk or in object storage is unreadable without the decryption key. |
| Encryption in Transit | Protection of data as it moves between systems using TLS, ensuring that data exchanged between Kafka brokers, backup tools, and storage backends cannot be read or altered if it is intercepted. |
| Full Backup | A backup that captures all messages in the configured topics from the earliest available offset through to the current offset. Contrast with incremental backup. |
| GCS | Google Cloud Storage. An object storage service from Google Cloud Platform, supported as a backup storage backend by OSO Kafka Backup. |
| GitOps | An operational model where the desired state of infrastructure and applications is declared in Git repositories, with automated tooling (e.g., ArgoCD, Flux) reconciling the live state to match. |
| Grafana | An open-source observability platform used to visualise Prometheus metrics from OSO Kafka Backup through pre-built dashboards. |
| IAM | Identity and Access Management. Cloud provider services (AWS IAM, Azure RBAC, GCP IAM) that control which identities can access storage buckets, encryption keys, and other resources. |
| Incremental Backup | A backup that captures only messages produced since the last checkpoint, reducing backup duration and storage consumption compared to a full backup. |
| ISR (In-Sync Replicas) | The set of partition replicas, including the leader, that are caught up with the leader's log within the configured lag threshold. A message is considered committed only when all ISR members have replicated it. |
| KRaft | Kafka Raft. The consensus protocol that replaces ZooKeeper for Kafka cluster metadata management, production-ready from Kafka 3.3 and the only supported mode from Kafka 4.0. |
| Kubernetes Operator | A software extension to Kubernetes that uses CRDs and custom controllers to manage the lifecycle of OSO Kafka Backup resources, including scheduling, monitoring, and reconciliation. |
| Lifecycle Policy | A storage backend rule that automatically transitions or deletes objects based on age. Used to manage backup retention by moving older backups to cheaper storage tiers or expiring them. |
| Manifest | A JSON file (manifest.json) stored at the root of a backup that contains metadata about the backup, including topics, partitions, offset ranges, and timestamps. |
| MinIO | An open-source, S3-compatible object storage system that can serve as a self-hosted backup storage backend for OSO Kafka Backup. |
| mTLS | Mutual TLS. A TLS configuration where both the client and server present certificates and verify each other's identity, providing stronger authentication than one-way TLS. |
| Object Storage | A storage architecture that manages data as objects (with metadata and a unique identifier) rather than as files in a hierarchy. Examples include S3, GCS, and Azure Blob Storage. |
| Offset | A sequential integer assigned to each message within a Kafka partition, uniquely identifying the message's position. Offsets are used to track consumer progress and enable point-in-time recovery. |
| Partition | A subdivision of a Kafka topic that provides parallelism. Each partition is an ordered, immutable sequence of messages, and each message within a partition has a unique offset. |
| Point-in-Time Recovery (PITR) | The ability to restore Kafka topic data to a chosen timestamp by filtering backed-up messages based on their timestamps. |
| Prometheus | An open-source monitoring and alerting toolkit used to collect and query metrics exposed by OSO Kafka Backup on its metrics endpoint (default port 8080). |
| RBAC | Role-Based Access Control. A security model that restricts operations based on the roles assigned to users or service accounts. Available in the Enterprise edition. |
| Recovery Point Objective (RPO) | The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means that up to 1 hour of data may be lost in a disaster. |
| Recovery Time Objective (RTO) | The maximum acceptable time to restore service after a disaster. An RTO of 4 hours means the system must be operational within 4 hours of an incident. |
| Replication Factor | The number of copies of each partition maintained across Kafka brokers. A replication factor of 3 means each partition has three replicas, providing fault tolerance. |
| Restore | The process of reading backed-up data from object storage and producing it to a target Kafka cluster, optionally filtered by time window, topic, or partition. |
| Runbook | A documented procedure for performing a specific operational task, such as restoring a Kafka topic from backup or responding to a backup failure alert. |
| S3 | Amazon Simple Storage Service. An object storage service from AWS, and the most commonly used backup storage backend for OSO Kafka Backup. |
| SASL | Simple Authentication and Security Layer. A framework for authentication used by Kafka clients, supporting mechanisms such as PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512. |
| Segment | A compressed file within a backup that contains a range of Kafka messages for a specific topic-partition. Segments are named by their starting offset (e.g., segment-000000001000.zst). |
| Server-Side Encryption (SSE) | Encryption at rest performed by the storage provider (e.g., S3, GCS, Azure Blob), which transparently encrypts objects on write and decrypts them on read without changes to the client. |
| SLA | Service Level Agreement. A formal commitment defining the expected availability, performance, and support response times for a service. |
| SLI | Service Level Indicator. A quantitative metric used to measure system behaviour, such as backup success rate, restore latency, or storage write throughput. |
| SLO | Service Level Objective. A target value or range for an SLI, such as "99.9% backup success rate" or "restore completes within 4 hours." |
| Storage Tier | A class of storage with specific cost and performance characteristics. For example, S3 Standard for active backups and S3 Glacier for long-term archival. |
| TLS | Transport Layer Security. A cryptographic protocol that provides encrypted communication between Kafka clients and brokers, and between the backup tool and storage backends. |
| Topic | A named category or feed in Apache Kafka to which messages are published. Topics are divided into partitions for scalability and parallelism. |
| ZooKeeper | A centralised coordination service historically used by Apache Kafka for cluster metadata management. It has been superseded by KRaft and was removed from Kafka in version 4.0. |
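
Two of the terms above, Segment and Point-in-Time Recovery, describe mechanics that a short example may make more concrete. The Python sketch below is illustrative only and is not OSO Kafka Backup's implementation: the `Message` class, the function names, the 12-digit padding width (inferred from the `segment-000000001000.zst` example), and the inclusive cutoff are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a backed-up Kafka message."""
    offset: int        # position within the partition
    timestamp_ms: int  # message timestamp in epoch milliseconds
    value: bytes


def segment_name(start_offset: int, width: int = 12) -> str:
    """Name a segment by its starting offset, zero-padded.

    The width of 12 is an assumption taken from the glossary example.
    """
    return f"segment-{start_offset:0{width}d}.zst"


def filter_point_in_time(messages, cutoff_ms):
    """Keep messages whose timestamp is at or before the cutoff.

    Treating the cutoff as inclusive is an assumption of this sketch.
    """
    return [m for m in messages if m.timestamp_ms <= cutoff_ms]


if __name__ == "__main__":
    backed_up = [
        Message(offset=1000, timestamp_ms=100, value=b"a"),
        Message(offset=1001, timestamp_ms=200, value=b"b"),
        Message(offset=1002, timestamp_ms=300, value=b"c"),
    ]
    print(segment_name(backed_up[0].offset))
    print([m.offset for m in filter_point_in_time(backed_up, cutoff_ms=200)])
```

In this sketch, a restore to timestamp 200 would replay only the first two messages, which is the filtering behaviour the PITR entry describes.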