Prerequisites

Clusters, kernel requirements, tooling versions

Overview

This page defines the minimum infrastructure, software, and tooling requirements for deploying Site Recovery. Meeting these requirements before you begin ensures that Ansible can provision all components correctly, that DRBD replication can run between your primary and DR cluster nodes, and that pgctl can manage Protection Groups and failover operations. Requirements vary slightly depending on whether you choose the LINSTOR deployment model (three clusters required) or the DRBD Operator deployment model (two clusters minimum); both models are covered here.

Prerequisites

Review all requirements for your chosen deployment model before provisioning any infrastructure.

Deployment model selection

Choose one of the two supported models before proceeding. Your choice determines cluster count and which storage operator Ansible deploys.

Model	Minimum clusters	Quorum cluster
LINSTOR	3 (primary, DR, quorum)	Required — hosts the LINSTOR controller and failover controllers
DRBD Operator	2 (primary, DR)	Optional but recommended for management plane isolation

LINSTOR model: The quorum cluster is mandatory. It hosts the LINSTOR controller and failover controllers. It does not run application workloads and does not relay DRBD replication traffic. Replication runs directly between primary and DR cluster nodes.

DRBD Operator model: A quorum cluster is optional. When present, it serves as an isolated management plane only — no application workloads, no replication relay.
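
The cluster counts above can be captured in a small lookup that later checks reuse. This is a minimal sketch: `required_clusters` is a hypothetical helper, not part of pgctl or the Ansible playbooks, and the model names are the `linstor` and `drbd-operator` values listed under Configuration.

```shell
# required_clusters: hypothetical helper (not part of pgctl or the playbooks).
# Prints the cluster roles that must exist for a given deployment model.
required_clusters() {
  case "$1" in
    linstor)       echo "primary dr quorum" ;;  # quorum cluster is mandatory
    drbd-operator) echo "primary dr" ;;         # quorum cluster is optional
    *) echo "unknown deployment model: $1" >&2; return 1 ;;
  esac
}
```

For example, `required_clusters linstor` prints `primary dr quorum`, which a reachability loop can iterate over.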

Cluster requirements

All clusters must be fully provisioned and reachable before you run any Ansible playbooks.

Primary cluster — runs active VM workloads; must have DRBD-compatible kernel on every storage node
DR cluster — standby for disaster recovery; must have DRBD-compatible kernel on every storage node
Quorum cluster — management plane only (required for LINSTOR model; recommended for DRBD Operator model); standard kernel is sufficient; no application workloads are scheduled here

Node kernel requirements

Primary and DR cluster nodes that participate in storage replication must run a DRBD-compatible Linux kernel
The quorum cluster has no special kernel requirement
Verify kernel compatibility for your distribution before provisioning nodes; DRBD kernel module installation is handled by the playbooks/deploy-drbd.yml Ansible playbook

Network connectivity

All clusters must be mutually reachable at the network level
DRBD replication traffic flows directly between primary and DR cluster nodes — ensure the necessary ports are open between those node pools
The quorum cluster requires network access to both primary and DR cluster API servers for management operations
Kubeconfig credentials for all clusters must be available on the workstation running Ansible and pgctl

Tooling versions

Tool	Minimum version	Used for
Ansible	2.12	Deploying all infrastructure components (DRBD, LINSTOR or DRBD Operator, controllers) to primary, DR, and quorum clusters
Helm	3.x	Deploying the Site Manager UI to the quorum cluster
`pgctl`	(see release notes)	Protection Group management, failover operations, and multi-tenant deployment management
`kubectl`	Compatible with your cluster version	Cluster inspection and manual resource management

Storage requirements

Storage devices must be available on both primary and DR cluster nodes before Ansible runs storage configuration
For LINSTOR deployments: LVM thin pools are required; size and device paths must be known in advance
Sufficient block storage capacity must exist on both sites to hold all protected VM volumes

Access and credentials

Valid kubeconfig files for each cluster, accessible from the Ansible control node and from workstations running pgctl
Registry credentials for drbd.io if your environment pulls images from that registry
Sufficient RBAC permissions to create namespaces, service accounts, ClusterRoles, and ClusterRoleBindings on all clusters
For LINSTOR multi-tenant deployments: permissions to create secrets and deployments in deployment-specific namespaces on the quorum cluster
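
One way to check the RBAC requirement ahead of time is `kubectl auth can-i`. The sketch below assumes `kubectl` is on your PATH. `can_create_rbac` is a hypothetical helper name, and the service account check applies only to the kubeconfig's current namespace.

```shell
# can_create_rbac: hypothetical helper. Returns non-zero if the identity in the
# given kubeconfig cannot create one of the resource types listed above.
# Note: serviceaccounts are namespaced, so that check uses the kubeconfig's
# current namespace.
can_create_rbac() {
  local kubeconfig="$1" resource rc=0
  for resource in namespaces serviceaccounts clusterroles clusterrolebindings; do
    if ! kubectl --kubeconfig "$kubeconfig" auth can-i create "$resource" >/dev/null 2>&1; then
      echo "missing permission: create $resource" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run it once per cluster, for example `can_create_rbac ~/.kube/config-primary`.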

KubeVirt

KubeVirt must already be installed and operational on your primary and DR clusters before you configure VM protection
VM definitions and their associated PVCs must use storage classes backed by DRBD-replicated volumes

Installation

Step 1: Verify cluster reachability

Confirm that your workstation can reach all required clusters.

# Primary cluster
kubectl --kubeconfig ~/.kube/config-primary get nodes

# DR cluster
kubectl --kubeconfig ~/.kube/config-dr get nodes

# Quorum cluster (LINSTOR model, or DRBD Operator model with optional quorum)
kubectl --kubeconfig ~/.kube/config-quorum get nodes

All commands must return a node list without errors before you proceed.

Step 2: Verify Ansible version

ansible --version

The output must show version 2.12 or later. Example:

ansible [core 2.14.3]

Step 3: Verify Helm version

helm version

The output must show version 3.0 or later. Example:

version.BuildInfo{Version:"v3.12.0", ...}

Step 4: Verify pgctl is installed and initialized

pgctl --version

Then initialize pgctl with your cluster kubeconfigs:

pgctl init

After initialization, kubeconfig files must be present at the expected paths:

ls ~/.kube/pgctl/
# Expected: cluster1.kubeconfig  cluster2.kubeconfig  (and quorum.kubeconfig if applicable)

Step 5: Verify DRBD kernel compatibility on storage nodes

Run the following on each primary and DR cluster node that will participate in storage replication:

uname -r
modprobe drbd && echo "DRBD module loadable"

Run modprobe as root. If modprobe drbd fails, no loadable DRBD module is available for the running kernel. Either install the appropriate kernel package for your distribution or switch to a DRBD-compatible kernel before running the playbooks/deploy-drbd.yml Ansible playbook.
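
If you prefer a check that does not load anything into the kernel, `modinfo` reads only the module metadata for the running kernel. The following is a sketch under that assumption; `drbd_module_available` is a hypothetical helper name.

```shell
# drbd_module_available: hypothetical helper. Uses modinfo, which only reads
# module metadata, so unlike modprobe it does not load the module.
drbd_module_available() {
  local version
  if version="$(modinfo -F version drbd 2>/dev/null)"; then
    echo "drbd module found for kernel $(uname -r), version ${version:-unknown}"
  else
    echo "drbd module not found for kernel $(uname -r)" >&2
    return 1
  fi
}
```

A zero exit status means a module file exists for the running kernel. It does not prove the module loads cleanly, so the modprobe check above is still the stronger test.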

Step 6: Confirm storage devices are available (LINSTOR model)

For LINSTOR deployments, identify the block devices that will back the LVM thin pools on both primary and DR nodes:

# On primary cluster storage nodes
lsblk

# On DR cluster storage nodes
lsblk

Record the device paths (for example, /dev/nvme0n1). You will need these when running the LINSTOR storage configuration steps.

Step 7: Confirm registry credentials (if required)

If your environment pulls images from drbd.io, confirm that you have valid credentials before deployment:

docker login drbd.io

These credentials will be used to create pull secrets on the primary and DR clusters during Ansible deployment.

Configuration

The following configuration decisions must be made before deployment. These choices affect which Ansible playbooks you run, how many clusters you provision, and how pgctl and the Site Manager UI are configured.

Deployment model

Setting	Options	Effect
Deployment model	`linstor` or `drbd-operator`	Determines which storage operator Ansible deploys, the minimum cluster count, and whether a quorum cluster is required

LINSTOR model: Ansible deploys the LINSTOR controller to the quorum cluster, LINSTOR satellite nodes to primary and DR clusters, and the failover operator to primary and DR clusters. The LINSTOR controller on the quorum cluster coordinates storage metadata; it does not carry replication traffic.
DRBD Operator model: Ansible deploys the DRBD Operator instead of LINSTOR. Two clusters are the minimum; a quorum cluster is optional but recommended so that management components are isolated from workload clusters.

Cluster roles and kubeconfig paths

pgctl and the Ansible inventory both require kubeconfig files to be placed at predictable paths. Ensure the following are in place before initializing pgctl or running playbooks:

~/.kube/pgctl/cluster1.kubeconfig    # Primary cluster
~/.kube/pgctl/cluster2.kubeconfig    # DR cluster
~/.kube/pgctl/quorum.kubeconfig      # Quorum cluster (LINSTOR model, or optional for DRBD Operator)
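
A quick presence check for these files might look like the sketch below. `check_pgctl_kubeconfigs` is a hypothetical helper, and it treats the quorum kubeconfig as mandatory only for the LINSTOR model.

```shell
# check_pgctl_kubeconfigs: hypothetical helper. Verifies that the kubeconfig
# files listed above exist and are non-empty.
# $1 = deployment model (linstor | drbd-operator)
# $2 = directory to check (defaults to ~/.kube/pgctl)
check_pgctl_kubeconfigs() {
  local model="$1" dir="${2:-$HOME/.kube/pgctl}" name rc=0
  local names="cluster1 cluster2"
  # The quorum kubeconfig is only mandatory for the LINSTOR model.
  if [ "$model" = "linstor" ]; then
    names="$names quorum"
  fi
  for name in $names; do
    if [ ! -s "$dir/$name.kubeconfig" ]; then
      echo "missing or empty: $dir/$name.kubeconfig" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_pgctl_kubeconfigs linstor` reports every missing file on stderr and returns non-zero if any is absent.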

Namespace isolation (multi-tenant deployments)

For multi-tenant environments where multiple DR deployment pairs share a single quorum cluster, each deployment is isolated in its own namespace following the convention dr-<deployment-name>. Resources created within a deployment namespace — including kubeconfig secrets, service accounts, RBAC, and failover controller components — are scoped to that namespace.
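
Because the deployment name becomes part of a Kubernetes namespace, it has to satisfy namespace naming rules: lowercase alphanumerics and hyphens, at most 63 characters including the dr- prefix. A minimal sketch of that check, using a hypothetical `deployment_namespace` helper:

```shell
# deployment_namespace: hypothetical helper. Builds the dr-<deployment-name>
# namespace and rejects names Kubernetes would not accept as a namespace
# (lowercase alphanumerics and '-', alphanumeric at both ends, max 63 chars).
deployment_namespace() {
  local ns="dr-$1"
  if [ "${#ns}" -gt 63 ] || ! printf '%s' "$ns" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'; then
    echo "invalid deployment name: $1" >&2
    return 1
  fi
  printf '%s\n' "$ns"
}
```

For example, `deployment_namespace site-a` prints `dr-site-a`, while a name such as `Site_A` is rejected.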

LINSTOR storage configuration (LINSTOR model only)

The following parameters must be decided before running storage configuration steps:

Parameter	Description	Example
Storage device path	Block device on each node that backs the LVM volume group	`/dev/nvme0n1`
Volume group name	LVM volume group created on each node	`linstor_vg`
Thin pool name	LVM thin pool within the volume group	`linstor_thinpool`
Thin pool size	Capacity allocated to the thin pool	`5T`
Resource group name	LINSTOR resource group used for geo-replicated placement	`<deployment-name>_geo_rg`
Storage class name	Kubernetes StorageClass backed by the LINSTOR resource group	`<deployment-name>-geo-replicated`
Placement count	Number of replicas (must be 2 for geo-replication across primary and DR)	`2`
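
To see how the first four parameters relate to each other, the sketch below prints the standard LVM commands that would build such a thin pool by hand. It only prints the commands and runs nothing. `print_lvm_plan` is a hypothetical helper, and the commands your storage configuration steps actually run may differ.

```shell
# print_lvm_plan: hypothetical helper. Prints (does not run) standard LVM
# commands that would create a thin pool from the parameters in the table.
# Arguments: device path, volume group name, thin pool name, thin pool size.
print_lvm_plan() {
  local device="$1" vg="$2" pool="$3" size="$4"
  echo "pvcreate $device"
  echo "vgcreate $vg $device"
  echo "lvcreate --type thin-pool --size $size --name $pool $vg"
}
```

With the example values from the table, call it as `print_lvm_plan /dev/nvme0n1 linstor_vg linstor_thinpool 5T` and review the output before running anything on a node.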

Why placement count matters: Setting placementCount: "2" with LINSTOR's autoPlace creates two replicas of every volume. Combined with placement rules that spread replicas across sites, this puts exactly one replica on each site. Reducing this value below 2 removes DR protection for those volumes.
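
As an illustration of where the placement count ends up, the sketch below renders a StorageClass manifest from the naming conventions in the table. The provisioner name `linstor.csi.linbit.com` and the `resourceGroup` parameter are assumptions based on the LINSTOR CSI driver, and `render_geo_storageclass` is a hypothetical helper. Compare the result with the manifest your deployment actually generates before relying on it.

```shell
# render_geo_storageclass: hypothetical helper. Prints a StorageClass manifest
# for one deployment. The provisioner and parameter names are assumptions
# drawn from the LINSTOR CSI driver, not confirmed by this guide.
render_geo_storageclass() {
  local deployment="$1"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${deployment}-geo-replicated
provisioner: linstor.csi.linbit.com
parameters:
  placementCount: "2"
  resourceGroup: ${deployment}_geo_rg
EOF
}
```

The value is quoted because StorageClass parameters must be strings.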

Satellite host networking

LINSTOR satellite DaemonSets must be configured with hostNetwork: true and dnsPolicy: ClusterFirstWithHostNet. This is required for satellites to bind to the correct node IP addresses for DRBD replication. This setting is applied during deployment but must be compatible with your cluster's network policy.

RBAC scope

The failover controller requires both namespace-scoped permissions (for deployment-specific resources) and cluster-scoped permissions (for CRDs, PersistentVolumes, StorageClasses, and VolumeSnapshots). Ensure that the account running Ansible has sufficient privileges to create ClusterRoles and ClusterRoleBindings on the quorum cluster.

Usage

Once your prerequisites are confirmed and your deployment model is selected, your workflow follows this sequence:

Provision clusters — bring up primary, DR, and (where required or desired) quorum clusters with the kernel and network requirements described on this page
Run Ansible — use the Ansible playbooks to deploy DRBD, the storage operator (LINSTOR or DRBD Operator), and the failover controllers to your clusters
Deploy Site Manager UI — use Helm to deploy the Site Manager UI to the quorum cluster
Initialize pgctl — run pgctl init and confirm kubeconfigs are in place
Configure replication and protect VMs — create Protection Groups and configure DRBD geo-replication
Validate DR readiness — run a test failover using pgctl
Execute failover or failback — use pgctl for all Protection Group and failover operations

Working with pgctl

pgctl is the primary CLI tool for all Protection Group management, failover operations, and multi-tenant deployment management. After initialization, use it to:

Sync VM definitions to DR clusters before a failover
Validate Protection Group readiness
Activate a Protection Group on the target cluster during failover
Manage deployment namespaces in multi-tenant quorum setups

# Validate a Protection Group is ready for failover
pgctl validate pg <protection-group-name>

# Sync VM definitions to target cluster
pgctl sync vm <vm-name>

# Activate a Protection Group on the target cluster (planned failover)
pgctl activate-pg <protection-group-name> <target-cluster> [namespace]
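
For a planned failover you may want to chain these commands so that activation never runs against a Protection Group that failed validation. The wrapper below is a sketch that uses only the two commands shown above. It assumes pgctl exits non-zero when validation fails, which you should confirm for your release. `planned_failover` is a hypothetical name.

```shell
# planned_failover: hypothetical wrapper around the pgctl commands shown above.
# Assumes `pgctl validate pg` exits non-zero when validation fails.
# $1 = Protection Group name, $2 = target cluster
planned_failover() {
  local pg="$1" target="$2"
  if ! pgctl validate pg "$pg"; then
    echo "validation failed for $pg; not activating" >&2
    return 1
  fi
  pgctl activate-pg "$pg" "$target"
}
```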

Multi-tenant quorum cluster

If you are adding a new DR deployment pair to an existing quorum cluster, each pair is managed in its own dr-<deployment-name> namespace. Use pgctl to manage deployments without affecting other tenants sharing the same quorum cluster.

Examples

Example 1: Verify all clusters are reachable before deployment

Run this check from your Ansible control node after kubeconfigs are in place.

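# Check each cluster in turn; drop "quorum" for a two-cluster DRBD Operator deployment
for cluster in primary dr quorum; do
  echo "--- $cluster ---"
  # Print only the NAME and STATUS columns
  kubectl --kubeconfig "$HOME/.kube/config-${cluster}" get nodes --no-headers | awk '{print $1, $2}'
done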

Expected output:

--- primary ---
node-primary-1 Ready
--- dr ---
node-dr-1 Ready
--- quorum ---
node-quorum-1 Ready

All nodes must report Ready before you run any Ansible playbooks.
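
To turn the visual check into a pass/fail result, you can filter the same kubectl output. The sketch below assumes the default column order (NAME, STATUS, ...) and deliberately treats anything other than an exact Ready status as a failure, including cordoned nodes. `all_nodes_ready` is a hypothetical helper.

```shell
# all_nodes_ready: hypothetical helper. Reads `kubectl get nodes --no-headers`
# output on stdin and fails if any node's STATUS is not exactly "Ready"
# ("NotReady" and "Ready,SchedulingDisabled" both count as failures).
# Empty input also fails, which covers the case where kubectl itself errored.
all_nodes_ready() {
  awk '
    $2 != "Ready" { print "not ready: " $1 " (" $2 ")"; bad = 1 }
    END { if (NR == 0 || bad) exit 1 }
  '
}
```

Pipe each cluster's node list into it, for example `kubectl --kubeconfig ~/.kube/config-primary get nodes --no-headers | all_nodes_ready`.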

Example 2: Confirm pgctl kubeconfig initialization

After running pgctl init, verify that the expected kubeconfig files are present.

ls -1 ~/.kube/pgctl/

Expected output (LINSTOR model with quorum):

cluster1.kubeconfig
cluster2.kubeconfig
quorum.kubeconfig

Expected output (DRBD Operator model, two-cluster minimum):

cluster1.kubeconfig
cluster2.kubeconfig

Example 3: Validate Ansible and Helm versions meet minimums

ansible --version | head -1
helm version --short

Expected output:

ansible [core 2.14.3]
v3.12.0+g5654a60

Ansible must be 2.12 or later. Helm must be 3.0 or later. If either version is below the minimum, update before continuing.
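
Comparing versions by eye is error-prone because 2.9 sorts after 2.12 as plain text. The sketch below compares them numerically with `sort -V`, which requires a sort that supports version ordering, such as GNU coreutils. Both function names are hypothetical.

```shell
# version_ge: true if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# extract_version: pulls the first dotted version number out of a line such as
# "ansible [core 2.14.3]" or "v3.12.0+g5654a60".
extract_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}
```

For example, `version_ge "$(extract_version "$(ansible --version | head -1)")" 2.12` succeeds only when the installed Ansible meets the minimum.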

Example 4: Confirm DRBD kernel module is loadable on a storage node

Run this on each node in your primary and DR clusters that will participate in DRBD replication.

ssh <node-ip> 'uname -r && modprobe drbd && echo "DRBD OK"'

Expected output:

5.15.0-91-generic
DRBD OK

If modprobe drbd returns an error, install the DRBD-compatible kernel package for your Linux distribution before proceeding to the playbooks/deploy-drbd.yml step.

Example 5: Confirm storage devices are available (LINSTOR model)

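Identify block devices on primary and DR storage nodes before running LINSTOR storage configuration. These commands run lsblk inside a LINSTOR satellite pod, so they apply only once the satellites have been deployed; before that, run lsblk directly on each node as described in Step 6.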

# Primary cluster
kubectl --kubeconfig ~/.kube/config-primary exec -n linbit-sds <satellite-pod> \
  -c linstor-satellite -- lsblk -d -o NAME,SIZE,TYPE | grep disk

# DR cluster
kubectl --kubeconfig ~/.kube/config-dr exec -n linbit-sds <satellite-pod> \
  -c linstor-satellite -- lsblk -d -o NAME,SIZE,TYPE | grep disk

Expected output:

nvme0n1   3.5T disk

Record the device name. You will reference it when creating the LVM physical volume and volume group during storage configuration.

Troubleshooting

Cluster unreachable from Ansible control node

Symptom: kubectl get nodes returns a connection refused error or times out.

Likely cause: The kubeconfig points to an IP or hostname that is not routable from your workstation, or the cluster API server is not yet available.

Fix: Confirm network connectivity to the cluster API server endpoint. Check that the server address in the kubeconfig matches an accessible IP or hostname. If using a VPN or bastion, ensure the tunnel is active.
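
To test the endpoint itself, first read the server URL from the kubeconfig, for example with `kubectl --kubeconfig <file> config view --minify -o jsonpath='{.clusters[0].cluster.server}'`, then split it into host and port for a TCP probe. The sketch below covers the splitting step only. `api_endpoint` is a hypothetical helper, and it does not handle bracketed IPv6 addresses that omit the port.

```shell
# api_endpoint: hypothetical helper. Splits an API server URL such as
# https://10.0.0.10:6443 into "host port". The port defaults to 443 when the
# URL does not specify one.
api_endpoint() {
  local hostport="${1#*://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop any path
  case "$hostport" in
    *:*) echo "${hostport%:*} ${hostport##*:}" ;;
    *)   echo "$hostport 443" ;;
  esac
}
```

You can pass the resulting host and port to a tool such as `nc -z -w 5 <host> <port>` to check whether the API server port is reachable from your workstation.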

DRBD module fails to load on storage nodes

Symptom: modprobe drbd fails with an error such as "could not insert 'drbd': No such device" or "Module drbd not found".

Likely cause: The running kernel does not include the DRBD module or was not built with DRBD support. This is common on managed Kubernetes node images that do not include out-of-tree kernel modules.

Fix: Install the DRBD kernel module package that matches your Linux distribution and running kernel version. On some distributions the module is packaged as drbd-dkms; drbd-utils supplies only the userspace tools and does not provide the kernel module. Reboot the node after installation and re-verify with modprobe drbd.

Ansible version below minimum

Symptom: Ansible playbook execution fails with syntax errors or missing module errors on constructs that are valid in Ansible 2.12+.

Likely cause: The installed Ansible version is older than 2.12.

Fix: Upgrade Ansible on your control node with pip install --upgrade ansible or with your distribution's package manager. Verify with ansible --version after the upgrade.

pgctl init fails or kubeconfigs not found

Symptom: pgctl commands return errors about missing kubeconfigs or fail to connect to clusters after pgctl init.

Likely cause: Kubeconfig files were not placed at the paths pgctl expects (~/.kube/pgctl/cluster1.kubeconfig, etc.), or pgctl init was not run.

Fix: Run pgctl init and confirm the kubeconfig files exist at the expected paths with ls ~/.kube/pgctl/. Ensure the filenames match the cluster names that pgctl is configured to use.

LINSTOR satellite nodes not registering as Online

Symptom: linstor node list shows satellites in OFFLINE or UNKNOWN state after deployment.

Likely cause (1): Satellite DaemonSets are not using host networking. Without hostNetwork: true, satellites bind to the pod IP rather than the node IP, which prevents DRBD replication connections.

Fix: Patch the satellite DaemonSet on both primary and DR clusters to add hostNetwork: true and dnsPolicy: ClusterFirstWithHostNet, then restart the satellite pods.
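
The patch can be expressed as a single `kubectl patch` call. The sketch below assumes the satellites run as a DaemonSet in the linbit-sds namespace; the DaemonSet name is yours to supply, and `patch_satellite` is a hypothetical wrapper. If an operator manages the DaemonSet, it may revert manual patches, so check how your deployment applies this setting first.

```shell
# host_network_patch: prints a patch that sets the two pod spec fields named
# above on a DaemonSet's pod template.
host_network_patch() {
  printf '%s\n' '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'
}

# patch_satellite: hypothetical wrapper.
# $1 = kubeconfig, $2 = DaemonSet name, $3 = namespace (default: linbit-sds)
patch_satellite() {
  kubectl --kubeconfig "$1" -n "${3:-linbit-sds}" patch daemonset "$2" \
    --type strategic -p "$(host_network_patch)"
}
```

Run it against both the primary and DR clusters, then confirm the result with `kubectl get daemonset <name> -n linbit-sds -o jsonpath='{.spec.template.spec.hostNetwork}'`.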

Likely cause (2): The satellite node was not registered in the LINSTOR controller with the correct node IP.

Fix: Re-run the linstor node create command from the quorum cluster using the correct node IP address, then verify connectivity with linstor node list.

Missing registry pull secret causes image pull failures

Symptom: Pods on primary or DR clusters remain in ImagePullBackOff state for LINSTOR or DRBD components.

Likely cause: The drbdio-pull-secret was not created in the linbit-sds namespace, or the credentials are incorrect.

Fix: Create the pull secret in the linbit-sds namespace on both clusters using your drbd.io credentials, then patch the affected Deployments and DaemonSets to reference the secret under imagePullSecrets.
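
Creating the secret could look like the sketch below, which wraps `kubectl create secret docker-registry`. The `DRBD_IO_PASSWORD` environment variable and the `create_pull_secret` name are assumptions made for illustration; the secret name and namespace are the ones given above.

```shell
# create_pull_secret: hypothetical wrapper around
# `kubectl create secret docker-registry`.
# $1 = kubeconfig, $2 = drbd.io username
# The password is read from DRBD_IO_PASSWORD (an assumed variable name) so it
# does not have to be typed as a command-line argument to this function.
create_pull_secret() {
  local kubeconfig="$1" username="$2"
  if [ -z "${DRBD_IO_PASSWORD:-}" ]; then
    echo "DRBD_IO_PASSWORD is not set" >&2
    return 1
  fi
  kubectl --kubeconfig "$kubeconfig" -n linbit-sds \
    create secret docker-registry drbdio-pull-secret \
    --docker-server=drbd.io \
    --docker-username="$username" \
    --docker-password="$DRBD_IO_PASSWORD"
}
```

Run it once for the primary cluster and once for the DR cluster.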

Insufficient RBAC prevents Ansible from creating ClusterRoles

Symptom: Ansible playbook fails with Forbidden errors when attempting to create ClusterRole or ClusterRoleBinding resources.

Likely cause: The kubeconfig used by Ansible does not have cluster-admin or equivalent privileges on the target cluster.

Fix: Use a kubeconfig with cluster-admin permissions for the initial deployment. After deployment, you can restrict ongoing operational access as appropriate for your environment.