Prerequisites
Clusters, kernel requirements, tooling versions
This page defines the minimum infrastructure, software, and tooling requirements for deploying Site Recovery. Meeting these requirements before you begin ensures that Ansible can provision all components correctly, that DRBD replication can run between your primary and DR cluster nodes, and that pgctl can manage Protection Groups and failover operations. Requirements vary slightly depending on whether you choose the LINSTOR deployment model (three clusters required) or the DRBD Operator deployment model (two clusters minimum); both models are covered here.
Review all requirements for your chosen deployment model before provisioning any infrastructure.
Deployment model selection
Choose one of the two supported models before proceeding. Your choice determines cluster count and which storage operator Ansible deploys.
| Model | Minimum clusters | Quorum cluster |
|---|---|---|
| LINSTOR | 3 (primary, DR, quorum) | Required — hosts the LINSTOR controller and failover controllers |
| DRBD Operator | 2 (primary, DR) | Optional but recommended for management plane isolation |
LINSTOR model: The quorum cluster is mandatory. It hosts the LINSTOR controller and failover controllers. It does not run application workloads and does not relay DRBD replication traffic. Replication runs directly between primary and DR cluster nodes.
DRBD Operator model: A quorum cluster is optional. When present, it serves as an isolated management plane only — no application workloads, no replication relay.
Cluster requirements
All clusters must be fully provisioned and reachable before you run any Ansible playbooks.
- Primary cluster: runs active VM workloads; must have a DRBD-compatible kernel on every storage node
- DR cluster: standby for disaster recovery; must have a DRBD-compatible kernel on every storage node
- Quorum cluster: management plane only (required for LINSTOR model; recommended for DRBD Operator model); a standard kernel is sufficient; no application workloads are scheduled here
Node kernel requirements
- Primary and DR cluster nodes that participate in storage replication must run a DRBD-compatible Linux kernel
- The quorum cluster has no special kernel requirement
- Verify kernel compatibility for your distribution before provisioning nodes; DRBD kernel module installation is handled by the playbooks/deploy-drbd.yml Ansible playbook
Network connectivity
- All clusters must be mutually reachable at the network level
- DRBD replication traffic flows directly between primary and DR cluster nodes — ensure the necessary ports are open between those node pools
- The quorum cluster requires network access to both primary and DR cluster API servers for management operations
- Kubeconfig credentials for all clusters must be available on the workstation running Ansible and pgctl
Tooling versions
| Tool | Minimum version | Used for |
|---|---|---|
| Ansible | 2.12 | Deploying all infrastructure components (DRBD, LINSTOR or DRBD Operator, controllers) to primary, DR, and quorum clusters |
| Helm | 3.x | Deploying the Site Manager UI to the quorum cluster |
| pgctl | (see release notes) | Protection Group management, failover operations, and multi-tenant deployment management |
| kubectl | Compatible with your cluster version | Cluster inspection and manual resource management |
Storage requirements
- Storage devices must be available on both primary and DR cluster nodes before Ansible runs storage configuration
- For LINSTOR deployments: LVM thin pools are required; size and device paths must be known in advance
- Sufficient block storage capacity must exist on both sites to hold all protected VM volumes
Access and credentials
- Valid kubeconfig files for each cluster, accessible from the Ansible control node and from workstations running pgctl
- Registry credentials for drbd.io if your environment pulls images from that registry
- Sufficient RBAC permissions to create namespaces, service accounts, ClusterRoles, and ClusterRoleBindings on all clusters
- For LINSTOR multi-tenant deployments: permissions to create secrets and deployments in deployment-specific namespaces on the quorum cluster
KubeVirt
- KubeVirt must already be installed and operational on your primary and DR clusters before you configure VM protection
- VM definitions and their associated PVCs must use storage classes backed by DRBD-replicated volumes
Step 1: Verify cluster reachability
Confirm that your workstation can reach all required clusters.
```bash
# Primary cluster
kubectl --kubeconfig ~/.kube/config-primary get nodes
# DR cluster
kubectl --kubeconfig ~/.kube/config-dr get nodes
# Quorum cluster (LINSTOR model, or DRBD Operator model with optional quorum)
kubectl --kubeconfig ~/.kube/config-quorum get nodes
```
All commands must return a node list without errors before you proceed.
Step 2: Verify Ansible version
```bash
ansible --version
```
The output must show version 2.12 or later. Example:
```text
ansible [core 2.14.3]
```
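If you want to script this check, for example in a preflight job, the comparison can rely on GNU sort's version ordering. This is a minimal sketch that assumes bash, GNU coreutils (sort -V), and the "ansible [core X.Y.Z]" first-line format shown above; version_ge and ansible_core_version are hypothetical helper names, not part of Site Recovery.

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# GNU "sort -V" orders version strings, so the minimum sorts first when $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Extract "2.14.3" from a line such as "ansible [core 2.14.3]".
# Prints nothing for older releases that do not use the "[core ...]" format.
ansible_core_version() {
  sed -n 's/.*\[core \([0-9][0-9.]*\)].*/\1/p' <<<"$1"
}

# Usage:
#   v="$(ansible_core_version "$(ansible --version | head -n1)")"
#   version_ge "$v" 2.12 || echo "Ansible '$v' is older than 2.12" >&2
```

An empty version string sorts before 2.12, so a release that prints no "[core ...]" field also fails the check.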
Step 3: Verify Helm version
```bash
helm version
```
The output must show version 3.x or later. Example:
```text
version.BuildInfo{Version:"v3.12.0", ...}
```
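Because only the major version matters here, a scripted check can read the short form of the version string (the v3.12.0+g5654a60 format used with helm version --short). A sketch under that assumption; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Print the major version from "helm version --short" output,
# e.g. "v3.12.0+g5654a60" -> "3".
helm_major_version() {
  local v="${1#v}"   # strip the leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

# Succeeds when the version string reports major version 3 or later.
helm_version_ok() {
  local major
  major="$(helm_major_version "$1")"
  [[ "$major" =~ ^[0-9]+$ ]] && [ "$major" -ge 3 ]
}

# Usage:
#   helm_version_ok "$(helm version --short)" || echo "Helm 3.x or later is required" >&2
```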
Step 4: Verify pgctl is installed and initialized
```bash
pgctl --version
```
Then initialize pgctl with your cluster kubeconfigs:
```bash
pgctl init
```
After initialization, kubeconfig files must be present at the expected paths:
```bash
ls ~/.kube/pgctl/
# Expected: cluster1.kubeconfig cluster2.kubeconfig (and quorum.kubeconfig if applicable)
```
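To turn the listing into a pass/fail check, you can test for each expected file. This sketch uses the filenames listed above; the function name and the "with-quorum" argument are hypothetical conventions chosen for illustration. It also flags empty files, since an empty kubeconfig cannot be used.

```bash
#!/usr/bin/env bash
# Report kubeconfig files that pgctl expects but that are missing or empty.
#   $1: directory (normally ~/.kube/pgctl)
#   $2: "with-quorum" to also require quorum.kubeconfig (hypothetical flag)
check_pgctl_kubeconfigs() {
  local dir="$1" mode="${2:-}" missing=0 f
  local files=(cluster1.kubeconfig cluster2.kubeconfig)
  if [ "$mode" = "with-quorum" ]; then
    files+=(quorum.kubeconfig)
  fi
  for f in "${files[@]}"; do
    if [ ! -s "$dir/$f" ]; then   # -s: file exists and is non-empty
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (LINSTOR model):         check_pgctl_kubeconfigs ~/.kube/pgctl with-quorum
# Usage (two-cluster DRBD model): check_pgctl_kubeconfigs ~/.kube/pgctl
```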
Step 5: Verify DRBD kernel compatibility on storage nodes
Run the following on each primary and DR cluster node that will participate in storage replication:
```bash
uname -r
# Loading a kernel module requires root privileges
sudo modprobe drbd && echo "DRBD module loadable"
```
If modprobe drbd fails, your kernel does not include the DRBD module. You must either install the appropriate kernel package for your distribution or switch to a DRBD-compatible kernel before running the playbooks/deploy-drbd.yml Ansible playbook.
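A successful modprobe only shows that some drbd module is present. Mainline Linux kernels ship the in-tree DRBD 8.4 module, while LINSTOR requires DRBD 9, so it may also be worth checking the module version. A minimal sketch, assuming your deployment model needs DRBD 9.x (confirm the exact requirement for the DRBD Operator model in its release notes); drbd_version_ok is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Succeeds when a DRBD version string, as printed by
# "modinfo -F version drbd" (e.g. "9.2.5"), has major version 9 or later.
drbd_version_ok() {
  local major="${1%%.*}"   # text before the first dot
  [[ "$major" =~ ^[0-9]+$ ]] && [ "$major" -ge 9 ]
}

# Usage on a storage node:
#   v="$(modinfo -F version drbd)"
#   drbd_version_ok "$v" || echo "drbd module version '$v' is older than 9.x" >&2
```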
Step 6: Confirm storage devices are available (LINSTOR model)
For LINSTOR deployments, identify the block devices that will back the LVM thin pools on both primary and DR nodes:
```bash
# On primary cluster storage nodes
lsblk
# On DR cluster storage nodes
lsblk
```
Record the device paths (for example, /dev/nvme0n1). You will need these when running the LINSTOR storage configuration steps.
Step 7: Confirm registry credentials (if required)
If your environment pulls images from drbd.io, confirm that you have valid credentials before deployment:
```bash
docker login drbd.io
```
These credentials will be used to create pull secrets on the primary and DR clusters during Ansible deployment.
The following configuration decisions must be made before deployment. These choices affect which Ansible playbooks you run, how many clusters you provision, and how pgctl and the Site Manager UI are configured.
Deployment model
| Setting | Options | Effect |
|---|---|---|
| Deployment model | linstor or drbd-operator | Determines which storage operator Ansible deploys, the minimum cluster count, and whether a quorum cluster is required |
- LINSTOR model: Ansible deploys the LINSTOR controller to the quorum cluster, LINSTOR satellite nodes to primary and DR clusters, and the failover operator to primary and DR clusters. The LINSTOR controller on the quorum cluster coordinates storage metadata; it does not carry replication traffic.
- DRBD Operator model: Ansible deploys the DRBD Operator instead of LINSTOR. Two clusters are the minimum; a quorum cluster is optional but recommended so that management components are isolated from workload clusters.
Cluster roles and kubeconfig paths
pgctl and the Ansible inventory both require kubeconfig files to be placed at predictable paths. Ensure the following are in place before initializing pgctl or running playbooks:
```text
~/.kube/pgctl/cluster1.kubeconfig   # Primary cluster
~/.kube/pgctl/cluster2.kubeconfig   # DR cluster
~/.kube/pgctl/quorum.kubeconfig     # Quorum cluster (LINSTOR model, or optional for DRBD Operator)
```
Namespace isolation (multi-tenant deployments)
For multi-tenant environments where multiple DR deployment pairs share a single quorum cluster, each deployment is isolated in its own namespace following the convention dr-<deployment-name>. Resources created within a deployment namespace — including kubeconfig secrets, service accounts, RBAC, and failover controller components — are scoped to that namespace.
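Because the namespace name is derived from the deployment name, the deployment name must produce a valid Kubernetes namespace. Namespace names are RFC 1123 labels: at most 63 characters, lowercase alphanumerics or "-", starting and ending with an alphanumeric. A sketch of that derivation; dr_namespace is a hypothetical helper, not a pgctl command.

```bash
#!/usr/bin/env bash
# Build the dr-<deployment-name> namespace and reject names Kubernetes would refuse.
dr_namespace() {
  local ns="dr-$1"
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'   # RFC 1123 label
  if [[ ${#ns} -le 63 && "$ns" =~ $re ]]; then
    echo "$ns"
  else
    echo "invalid deployment name for namespace: $ns" >&2
    return 1
  fi
}

# Usage: dr_namespace site-a   prints dr-site-a
```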
LINSTOR storage configuration (LINSTOR model only)
The following parameters must be decided before running storage configuration steps:
| Parameter | Description | Example |
|---|---|---|
| Storage device path | Block device on each node that backs the LVM volume group | /dev/nvme0n1 |
| Volume group name | LVM volume group created on each node | linstor_vg |
| Thin pool name | LVM thin pool within the volume group | linstor_thinpool |
| Thin pool size | Capacity allocated to the thin pool | 5T |
| Resource group name | LINSTOR resource group used for geo-replicated placement | <deployment-name>_geo_rg |
| Storage class name | Kubernetes StorageClass backed by the LINSTOR resource group | <deployment-name>-geo-replicated |
| Placement count | Number of replicas (must be 2 for geo-replication across primary and DR) | 2 |
Why placement count matters: Setting placementCount: "2" with LINSTOR's autoPlace ensures that every volume is replicated to exactly one node on each site. Reducing this value below 2 removes DR protection for those volumes.
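The device, volume group, thin pool, and size parameters map onto the standard LVM thin provisioning sequence (pvcreate, vgcreate, then lvcreate --type thin-pool). The sketch below only prints the commands, because pvcreate and vgcreate overwrite the target device; the function name is hypothetical, and the actual storage configuration steps may differ. The thin pool size must fit inside the volume group with some headroom for pool metadata, so the 5T table example would not fit on a 3.5T device.

```bash
#!/usr/bin/env bash
# Print (do not run) the LVM commands that would prepare a thin pool on one node.
# Arguments: device, volume group name, thin pool name, thin pool size.
lvm_thinpool_plan() {
  local dev="$1" vg="$2" pool="$3" size="$4"
  printf '%s\n' \
    "pvcreate $dev" \
    "vgcreate $vg $dev" \
    "lvcreate --type thin-pool -L $size -n $pool $vg"
}

# Usage: lvm_thinpool_plan /dev/nvme0n1 linstor_vg linstor_thinpool 5T
```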
Satellite host networking
LINSTOR satellite DaemonSets must be configured with hostNetwork: true and dnsPolicy: ClusterFirstWithHostNet. This is required for satellites to bind to the correct node IP addresses for DRBD replication. This setting is applied during deployment but must be compatible with your cluster's network policy.
RBAC scope
The failover controller requires both namespace-scoped permissions (for deployment-specific resources) and cluster-scoped permissions (for CRDs, PersistentVolumes, StorageClasses, and VolumeSnapshots). Ensure that the account running Ansible has sufficient privileges to create ClusterRoles and ClusterRoleBindings on the quorum cluster.
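You can confirm these privileges before running any playbook with kubectl auth can-i, which exits with status 0 only when the action is allowed (--quiet suppresses the yes/no output). A sketch; check_cluster_rbac is a hypothetical helper and covers only the two cluster-scoped RBAC kinds named above.

```bash
#!/usr/bin/env bash
# Check that the identity in a kubeconfig may create ClusterRoles and
# ClusterRoleBindings. Returns non-zero if either action is denied.
check_cluster_rbac() {
  local kubeconfig="$1" res rc=0
  for res in clusterroles.rbac.authorization.k8s.io \
             clusterrolebindings.rbac.authorization.k8s.io; do
    if kubectl --kubeconfig "$kubeconfig" auth can-i create "$res" --quiet; then
      echo "ok: create $res"
    else
      echo "DENIED: create $res" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_cluster_rbac ~/.kube/config-quorum
```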
Once your prerequisites are confirmed and your deployment model is selected, your workflow follows this sequence:
- Provision clusters: bring up primary, DR, and (where required or desired) quorum clusters with the kernel and network requirements described on this page
- Run Ansible: use the Ansible playbooks to deploy DRBD, the storage operator (LINSTOR or DRBD Operator), and the failover controllers to your clusters
- Deploy Site Manager UI: use Helm to deploy the Site Manager UI to the quorum cluster
- Initialize pgctl: run pgctl init and confirm kubeconfigs are in place
- Configure replication and protect VMs: create Protection Groups and configure DRBD geo-replication
- Validate DR readiness: run a test failover using pgctl
- Execute failover or failback: use pgctl for all Protection Group and failover operations
Working with pgctl
pgctl is the primary CLI tool for all Protection Group management, failover operations, and multi-tenant deployment management. After initialization, use it to:
- Sync VM definitions to DR clusters before a failover
- Validate Protection Group readiness
- Activate a Protection Group on the target cluster during failover
- Manage deployment namespaces in multi-tenant quorum setups
```bash
# Validate a Protection Group is ready for failover
pgctl validate pg <protection-group-name>
# Sync VM definitions to target cluster
pgctl sync vm <vm-name>
# Activate a Protection Group on the target cluster (planned failover)
pgctl activate-pg <protection-group-name> <target-cluster> [namespace]
```
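For a planned failover, these commands can be chained so that activation only happens after every sync and the validation succeed. This is a sketch, not a documented procedure: it assumes pgctl exits non-zero when a sync or validation fails, and the function name and example arguments (such as a target cluster called cluster2) are hypothetical.

```bash
#!/usr/bin/env bash
# Planned failover: sync each VM definition, validate the Protection Group,
# and activate it on the target cluster only if every step succeeded.
#   $1: Protection Group, $2: target cluster, remaining args: VM names
planned_failover() {
  local pg="$1" target="$2" vm
  shift 2
  for vm in "$@"; do
    pgctl sync vm "$vm" || { echo "sync failed for VM $vm" >&2; return 1; }
  done
  pgctl validate pg "$pg" || { echo "Protection Group $pg failed validation" >&2; return 1; }
  pgctl activate-pg "$pg" "$target"
}

# Usage: planned_failover web-pg cluster2 web-01 web-02
```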
Multi-tenant quorum cluster
If you are adding a new DR deployment pair to an existing quorum cluster, each pair is managed in its own dr-<deployment-name> namespace. Use pgctl to manage deployments without affecting other tenants sharing the same quorum cluster.
Example 1: Verify all clusters are reachable before deployment
Run this check from your Ansible control node after kubeconfigs are in place.
```bash
for cluster in primary dr quorum; do
  echo "--- $cluster ---"
  kubectl --kubeconfig ~/.kube/config-${cluster} get nodes --no-headers | awk '{print $1, $2}'
done
```
Expected output:
```text
--- primary ---
node-primary-1 Ready
--- dr ---
node-dr-1 Ready
--- quorum ---
node-quorum-1 Ready
```
All nodes must report Ready before you run any Ansible playbooks.
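Reading the Ready column by eye does not scale to larger clusters. The filter below, a sketch with a hypothetical function name, reads kubectl get nodes --no-headers output on stdin and fails if any node's STATUS is not exactly Ready, or if no nodes are listed at all. Note that it also flags cordoned nodes, which report Ready,SchedulingDisabled.

```bash
#!/usr/bin/env bash
# Succeeds only if every line of "kubectl get nodes --no-headers" output
# (read from stdin) has STATUS exactly "Ready"; prints any node that is not.
all_nodes_ready() {
  awk '
    $2 != "Ready" { print "not ready: " $1 " (" $2 ")"; bad = 1 }
    { n++ }
    END { exit (n == 0 || bad) }
  '
}

# Usage:
#   for cluster in primary dr quorum; do
#     kubectl --kubeconfig ~/.kube/config-${cluster} get nodes --no-headers \
#       | all_nodes_ready || echo "$cluster: not all nodes are Ready" >&2
#   done
```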
Example 2: Confirm pgctl kubeconfig initialization
After running pgctl init, verify that the expected kubeconfig files are present.
```bash
ls -1 ~/.kube/pgctl/
```
Expected output (LINSTOR model with quorum):
```text
cluster1.kubeconfig
cluster2.kubeconfig
quorum.kubeconfig
```
Expected output (DRBD Operator model, two-cluster minimum):
```text
cluster1.kubeconfig
cluster2.kubeconfig
```
Example 3: Validate Ansible and Helm versions meet minimums
```bash
ansible --version | head -1
helm version --short
```
Expected output:
```text
ansible [core 2.14.3]
v3.12.0+g5654a60
```
Ansible must be 2.12 or later. Helm must be 3.x or later. If either version is below the minimum, update before continuing.
Example 4: Confirm DRBD kernel module is loadable on a storage node
Run this on each node in your primary and DR clusters that will participate in DRBD replication.
```bash
# modprobe needs root, so run it through sudo on the remote node
ssh <node-ip> 'uname -r && sudo modprobe drbd && echo "DRBD OK"'
```
Expected output:
```text
5.15.0-91-generic
DRBD OK
```
If modprobe drbd returns an error, install the DRBD-compatible kernel package for your Linux distribution before proceeding to the playbooks/deploy-drbd.yml step.
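With many storage nodes, the same check can be looped over a list of addresses and summarized. A sketch assuming key-based SSH and passwordless sudo on each node: BatchMode=yes and sudo -n make the check fail instead of waiting for a password prompt, and ssh -n keeps ssh from reading the loop's stdin. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Run the DRBD module check on each node given as an argument and report failures.
check_drbd_on_nodes() {
  local node failed=()
  for node in "$@"; do
    if ssh -n -o BatchMode=yes "$node" 'uname -r && sudo -n modprobe drbd' >/dev/null 2>&1; then
      echo "$node: DRBD OK"
    else
      failed+=("$node")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "DRBD check failed on: ${failed[*]}" >&2
    return 1
  fi
}

# Usage: check_drbd_on_nodes 10.0.1.11 10.0.1.12 10.0.2.11 10.0.2.12
```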
Example 5: Confirm storage devices are available (LINSTOR model)
Identify block devices on primary and DR storage nodes before running LINSTOR storage configuration.
# Primary cluster
```bash
# Primary cluster
kubectl --kubeconfig ~/.kube/config-primary exec -n linbit-sds <satellite-pod> \
  -c linstor-satellite -- lsblk -d -o NAME,SIZE,TYPE | grep disk
# DR cluster
kubectl --kubeconfig ~/.kube/config-dr exec -n linbit-sds <satellite-pod> \
  -c linstor-satellite -- lsblk -d -o NAME,SIZE,TYPE | grep disk
```
Expected output:
```text
nvme0n1 3.5T disk
```
Record the device name. You will reference it when creating the LVM physical volume and volume group during storage configuration.
Cluster unreachable from Ansible control node
Symptom: kubectl get nodes returns a connection refused error or times out.
Likely cause: The kubeconfig points to an IP or hostname that is not routable from your workstation, or the cluster API server is not yet available.
Fix: Confirm network connectivity to the cluster API server endpoint. Check that the server address in the kubeconfig matches an accessible IP or hostname. If using a VPN or bastion, ensure the tunnel is active.
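To separate a routing problem from a credentials problem, you can read the server address out of the kubeconfig and test a plain TCP connection to it. A sketch assuming bash (for its /dev/tcp redirection) and the kubeconfig's current context; it checks reachability only, not authentication. The helper names are hypothetical, and the simple URL parser does not handle IPv6 literals.

```bash
#!/usr/bin/env bash
# Split an API server URL such as https://10.0.0.10:6443 into "host port".
# Defaults to port 443 when the URL has no explicit port.
api_host_port() {
  local hostport="${1#*://}"    # drop the scheme
  hostport="${hostport%%/*}"    # drop any path
  local host="${hostport%:*}" port="${hostport##*:}"
  if [ "$host" = "$hostport" ]; then
    port=443
  fi
  echo "$host $port"
}

# Test TCP reachability of the API server named in a kubeconfig's current context.
check_api_reachable() {
  local server host port
  server="$(kubectl --kubeconfig "$1" config view --minify \
    -o jsonpath='{.clusters[0].cluster.server}')" || return 1
  read -r host port <<<"$(api_host_port "$server")"
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $server"
  else
    echo "unreachable: $server" >&2
    return 1
  fi
}

# Usage: check_api_reachable ~/.kube/config-dr
```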
DRBD module fails to load on storage nodes
Symptom: modprobe drbd returns ERROR: could not insert 'drbd': No such device or Module drbd not found.
Likely cause: The running kernel does not include the DRBD module or was not built with DRBD support. This is common on managed Kubernetes node images that do not include out-of-tree kernel modules.
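Fix: Install the DRBD kernel module package that matches your Linux distribution and running kernel version. On some distributions the module is packaged as drbd-dkms, which builds it for the installed kernel; drbd-utils contains only the userspace tools and does not provide the kernel module. Reboot the node after installation and re-verify with modprobe drbd.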
Ansible version below minimum
Symptom: Ansible playbook execution fails with syntax errors or missing module errors on constructs that are valid in Ansible 2.12+.
Likely cause: The installed Ansible version is older than 2.12.
Fix: Upgrade Ansible on your control node. Use pip install --upgrade ansible or follow your distribution's package manager. Verify with ansible --version after upgrade.
pgctl init fails or kubeconfigs not found
Symptom: pgctl commands return errors about missing kubeconfigs or fail to connect to clusters after pgctl init.
Likely cause: Kubeconfig files were not placed at the paths pgctl expects (~/.kube/pgctl/cluster1.kubeconfig, etc.), or pgctl init was not run.
Fix: Run pgctl init and confirm the kubeconfig files exist at the expected paths with ls ~/.kube/pgctl/. Ensure the filenames match the cluster names that pgctl is configured to use.
LINSTOR satellite nodes not registering as Online
Symptom: linstor node list shows satellites in OFFLINE or UNKNOWN state after deployment.
Likely cause (1): Satellite DaemonSets are not using host networking. Without hostNetwork: true, satellites bind to the pod IP rather than the node IP, which prevents DRBD replication connections.
Fix: Patch the satellite DaemonSet on both primary and DR clusters to add hostNetwork: true and dnsPolicy: ClusterFirstWithHostNet, then restart the satellite pods.
Likely cause (2): The satellite node was not registered in the LINSTOR controller with the correct node IP.
Fix: Re-run the linstor node create command from the quorum cluster using the correct node IP address, then verify connectivity with linstor node list.
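For the first cause, the host networking change can be applied with a strategic merge patch on the pod template. A sketch: the DaemonSet name is a placeholder (check kubectl -n linbit-sds get daemonsets), and the helper name is hypothetical. A pod template change rolls the pods automatically under the default RollingUpdate strategy; with OnDelete you must delete the satellite pods yourself. If an operator manages the DaemonSet, it may revert a manual patch, so set the option in the operator's configuration instead.

```bash
#!/usr/bin/env bash
# Enable host networking on a LINSTOR satellite DaemonSet in linbit-sds.
#   $1: kubeconfig for the primary or DR cluster
#   $2: satellite DaemonSet name (placeholder, varies by deployment)
patch_satellite_hostnet() {
  local kubeconfig="$1" daemonset="$2"
  kubectl --kubeconfig "$kubeconfig" -n linbit-sds patch daemonset "$daemonset" \
    --type strategic \
    -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'
}

# Usage (repeat for both clusters):
#   patch_satellite_hostnet ~/.kube/config-primary <satellite-daemonset>
#   patch_satellite_hostnet ~/.kube/config-dr <satellite-daemonset>
```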
Missing registry pull secret causes image pull failures
Symptom: Pods on primary or DR clusters remain in ImagePullBackOff state for LINSTOR or DRBD components.
Likely cause: The drbdio-pull-secret was not created in the linbit-sds namespace, or the credentials are incorrect.
Fix: Create the pull secret in the linbit-sds namespace on both clusters using your drbd.io credentials, then patch the affected Deployments and DaemonSets to reference the secret under imagePullSecrets.
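One way to create the secret is kubectl create secret docker-registry rendered with --dry-run=client and piped to kubectl apply, which makes the step safe to re-run when the secret already exists. A sketch: the DRBDIO_USERNAME and DRBDIO_PASSWORD environment variables and the function name are hypothetical, chosen so the password is not typed on the command line.

```bash
#!/usr/bin/env bash
# Create or update drbdio-pull-secret in linbit-sds on one cluster.
# Reads credentials from DRBDIO_USERNAME / DRBDIO_PASSWORD (hypothetical names).
create_drbdio_pull_secret() {
  local kubeconfig="$1"
  kubectl --kubeconfig "$kubeconfig" -n linbit-sds create secret docker-registry drbdio-pull-secret \
    --docker-server=drbd.io \
    --docker-username="$DRBDIO_USERNAME" \
    --docker-password="$DRBDIO_PASSWORD" \
    --dry-run=client -o yaml |
    kubectl --kubeconfig "$kubeconfig" apply -f -
}

# Usage (both clusters), then reference the secret from each affected workload:
#   create_drbdio_pull_secret ~/.kube/config-primary
#   create_drbdio_pull_secret ~/.kube/config-dr
#   kubectl --kubeconfig ~/.kube/config-primary -n linbit-sds patch daemonset <name> \
#     -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"drbdio-pull-secret"}]}}}}'
```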
Insufficient RBAC prevents Ansible from creating ClusterRoles
Symptom: Ansible playbook fails with Forbidden errors when attempting to create ClusterRole or ClusterRoleBinding resources.
Likely cause: The kubeconfig used by Ansible does not have cluster-admin or equivalent privileges on the target cluster.
Fix: Use a kubeconfig with cluster-admin permissions for the initial deployment. After deployment, you can restrict ongoing operational access as appropriate for your environment.