NabuStore Configuration

Environment variables, config files, feature flags


Overview

This page explains how to configure NabuStore (AIStore) nodes for production and development deployments. Configuration is applied through command-line flags passed to the aistore binary, environment variables injected via Kubernetes manifests, or Helm chart values that translate into container arguments. Understanding these settings helps you tune storage backends, cluster topology, erasure coding, CXL memory tiering, and observability to match your infrastructure.


Prerequisites

Before you configure a NabuStore deployment, ensure you have:

  • A Kubernetes cluster (1.25 or later recommended) with kubectl access, or a host where you can run the aistore binary directly
  • Helm 3.x if you are using the Helm chart deployment path
  • Sufficient persistent storage provisioned per node (default request: 100 GiB per node)
  • Root or CAP_SYS_ADMIN privileges on any host where you intend to enable the SPDK NVMe backend or CXL memory tiering
  • SPDK v24.09 installed and NVMe devices bound if you plan to use the SPDK backend (see the SPDK setup guide)
  • A running seed node reachable over TCP before joining additional nodes to a cluster
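
You can quickly sanity-check the tooling side of these prerequisites before proceeding (a sketch; the seed-node probe is the same nc check used in the Troubleshooting section):

kubectl version           # server version should be 1.25 or later
helm version              # should report 3.x
nc -z <seed-host> 50051   # confirms the seed node is reachable over TCP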

Installation

NabuStore configuration takes effect at startup. There is no separate configuration file to install — settings are passed as flags to the binary or expressed in the Helm values.yaml.

Option A — Binary flags (single-node or bare-metal)

  1. Download or build the aistore binary and place it on your PATH.

  2. Start the node with the minimum required flags:

aistore \
  -node-id node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -backend localfs

  3. For a multi-node cluster, pass -seed-nodes on every node except the first:

aistore \
  -node-id node2 \
  -listen :50052 \
  -data-dir /data/aistore/node2 \
  -backend localfs \
  -seed-nodes localhost:50051

Option B — Helm chart (Kubernetes)

  1. Add the Trilio Helm repository and update:
helm repo add trilio https://charts.trilio.io
helm repo update

  2. Create a custom-values.yaml file with your overrides (see the Configuration section for all available keys).

  3. Install the chart:

helm install nabu-store trilio/aistore \
  --namespace aistore \
  --create-namespace \
  -f custom-values.yaml

  4. Verify that pods reach Running state:

kubectl get pods -n aistore

Option C — Raw Kubernetes manifests

  1. Apply the provided manifest directly:
kubectl apply -f deploy/k8s/aistore-cluster.yaml

  2. Watch the pods become ready:

kubectl get pods -n aistore -w

Configuration

All configuration knobs map to flags accepted by the aistore binary. When deploying with Helm, these flags are set through values.yaml fields that the chart renders into container args. When deploying via raw Kubernetes manifests, you edit the args array in the pod spec directly.
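
As an illustration of that mapping, a values.yaml fragment like the one below ends up as flags in the container args. The exact rendering is internal to the chart, so treat this as a sketch rather than the literal template output:

# values.yaml (excerpt)
node:
  port: 50051
  vnodes: 150

# rendered into the pod spec roughly as:
args:
  - "-listen"
  - ":50051"
  - "-vnodes"
  - "150"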


Node identity and networking

| Flag | Helm key | Default | Description |
| --- | --- | --- | --- |
| -node-id | | hostname | Unique identifier for this node within the cluster. Defaults to the system hostname when omitted. |
| -listen | node.port | :50051 | TCP address the node listens on for gRPC traffic. Change when running multiple nodes on the same host (:50052, :50053, …). |
| -seed-nodes | | "" (empty) | Comma-separated list of host:port addresses for nodes already in the cluster. The first node bootstrapping a new cluster should leave this empty. |

Consistent hashing

| Flag | Helm key | Default | Description |
| --- | --- | --- | --- |
| -vnodes | node.vnodes | 150 | Number of virtual nodes each physical node occupies on the consistent hash ring. Higher values produce more even distribution but increase ring metadata size. 150 is the recommended default for production. |
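
For example, to experiment with a denser ring on a test node, you could raise the value at startup (a sketch; 150 remains the recommended production default):

aistore \
  -node-id node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -vnodes 300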

Storage backend

| Flag | Default | Description |
| --- | --- | --- |
| -backend | auto-detected | Storage backend to use. Valid values: localfs (local filesystem, always available), spdk (SPDK blobstore via RPC). Omit this flag to allow hardware auto-detection to choose the optimal backend. |
| -data-dir | /data/aistore | Filesystem path where the node stores blobs and its index database (index.db). Must be writable by the process. In Kubernetes deployments this path is backed by a PersistentVolumeClaim. |
| -spdk-socket | /var/tmp/spdk.sock | Path to the SPDK JSON-RPC Unix socket. Only relevant when the SPDK backend is active. |
| -spdk-bdev | "" (auto) | Name of the SPDK bdev to use for blobstore. Leave empty to let the SPDK backend enumerate available bdevs automatically. |

Helm — persistent storage: Set persistence.storageClass to local-path for development or nvme-ssd for production. Set persistence.size to control the PVC size per node (default: 100Gi).


CXL memory tiering

| Flag | Default | Description |
| --- | --- | --- |
| -enable-cxl | false | Enable CXL memory tiering for the metadata index. The node checks for DAX-capable devices at startup; if none are found the flag is silently ignored. |
| -cxl-cache-size | 4294967296 (4 GiB) | Maximum bytes of CXL-attached memory to use as a cache tier for index data. |
| -cxl-numa | -1 (auto) | Preferred NUMA node for CXL allocations. -1 lets the kernel choose automatically. |

Cluster health and timing

| Flag | Helm key | Default | Description |
| --- | --- | --- | --- |
| -heartbeat | node.heartbeatInterval | 5s | How often each node broadcasts a heartbeat to peers. Lower values detect failures faster but increase network traffic. |
| -shutdown-timeout | node.shutdownTimeout | 30s | Maximum time the node waits for in-flight requests to complete before forcing shutdown. Increase for large-object workloads. |

Metrics

| Flag | Helm key | Default | Description |
| --- | --- | --- | --- |
| -metrics-addr | metrics.port | :9090 | HTTP address where Prometheus metrics are exposed. Set to an empty string to disable the metrics endpoint. |
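
For instance, to run a node with the metrics endpoint disabled, pass an empty address (a sketch based on the table above); when metrics are enabled, the endpoint can be scraped with curl:

# Disable the metrics endpoint
aistore -node-id node1 -listen :50051 -data-dir /data/aistore -metrics-addr ""

# With metrics enabled (the default), scrape the endpoint
curl http://localhost:9090/metrics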

Helm-only settings

The following settings are only available through the Helm chart and control Kubernetes-level behavior rather than the aistore binary itself.

| Key | Default | Description |
| --- | --- | --- |
| replicaCount | 3 | Number of storage node pods in the StatefulSet. |
| resources.limits.cpu | 4 | Maximum CPU cores per pod. |
| resources.limits.memory | 8Gi | Maximum memory per pod. |
| resources.requests.cpu | 1 | Guaranteed CPU cores per pod. |
| resources.requests.memory | 2Gi | Guaranteed memory per pod. |
| podDisruptionBudget.minAvailable | 2 | Minimum number of pods that must remain available during voluntary disruptions. |
| podSecurityContext.runAsUser | 1000 | UID used to run the container process. |
| securityContext.readOnlyRootFilesystem | true | Mounts the container root filesystem read-only. Your data directory must be on a separate writable volume. |
| extraEnv | [] | Additional environment variables to inject into each pod. Use this to pass secrets or toggle feature flags without rebuilding the image. |

Usage

Starting a single-node cluster

For local development or testing, start a single node with the localfs backend:

aistore \
  -node-id dev-node \
  -listen :50051 \
  -data-dir /tmp/aistore-dev \
  -backend localfs \
  -metrics-addr :9090

The node is ready when you see the startup banner in stdout and the gRPC port becomes reachable.
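
One way to script that readiness check (a sketch; nc is the same probe used in the Troubleshooting section):

# Wait for the gRPC port, then confirm metrics are being served
until nc -z localhost 50051; do sleep 1; done
curl -s http://localhost:9090/metrics | head -n 5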


Joining a node to an existing cluster

Every node after the first must point to at least one already-running seed node:

aistore \
  -node-id node2 \
  -listen :50052 \
  -data-dir /data/aistore/node2 \
  -seed-nodes 192.168.1.10:50051

You can supply multiple seed addresses separated by commas:

-seed-nodes 192.168.1.10:50051,192.168.1.11:50052

Overriding auto-detected backend

By default, the node auto-detects the best available storage backend. To force a specific backend — for example when you know SPDK is available but want to test with the local filesystem — pass the -backend flag explicitly:

aistore -backend localfs -data-dir /data/aistore

To use the SPDK NVMe backend, ensure the SPDK target is running and the socket is accessible, then start the node:

aistore \
  -backend spdk \
  -spdk-socket /var/tmp/spdk.sock \
  -spdk-bdev Nvme0n1

Enabling CXL memory tiering

CXL tiering accelerates metadata index lookups by keeping hot index data in CXL-attached memory. Enable it alongside your normal startup flags:

aistore \
  -node-id node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -enable-cxl \
  -cxl-cache-size 8589934592 \
  -cxl-numa 1

If the host does not have a DAX-capable CXL device, the flag is silently ignored and the node falls back to the mmap-based index.
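
To confirm up front whether tiering can engage, check for DAX devices and kernel CXL support before starting the node (the same checks appear under Troubleshooting):

ls /dev/dax*          # DAX-capable devices show up here
dmesg | grep -i cxl   # kernel messages indicating CXL support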


Customizing Helm values for production

Create a production-values.yaml that overrides the defaults most relevant to production:

replicaCount: 5

node:
  vnodes: 150
  heartbeatInterval: 5s
  shutdownTimeout: 60s

persistence:
  storageClass: nvme-ssd
  size: 500Gi

resources:
  limits:
    cpu: 8
    memory: 16Gi
  requests:
    cpu: 4
    memory: 8Gi

podDisruptionBudget:
  enabled: true
  minAvailable: 3

metrics:
  enabled: true
  port: 9090

Apply it:

helm upgrade --install nabu-store trilio/aistore \
  --namespace aistore \
  -f production-values.yaml
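
After the upgrade, confirm the rollout completed. Assuming the chart names its StatefulSet after the release (nabu-store), you can watch it directly:

kubectl rollout status statefulset/nabu-store -n aistore
kubectl get pods -n aistore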

Injecting extra environment variables via Helm

Use extraEnv to pass secrets or toggle runtime behavior without changing the image:

extraEnv:
  - name: NABU_LOG_LEVEL
    value: "debug"
  - name: NABU_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: nabu-api-secret
        key: token

Examples

Example 1 — Three-node cluster on bare metal

Start three nodes on ports 50051–50053, with node1 as the bootstrap seed:

# Terminal 1 — bootstrap node
aistore -node-id node1 -listen :50051 -data-dir /data/aistore/node1 -backend localfs

# Terminal 2
aistore -node-id node2 -listen :50052 -data-dir /data/aistore/node2 -backend localfs \
  -seed-nodes localhost:50051

# Terminal 3
aistore -node-id node3 -listen :50053 -data-dir /data/aistore/node3 -backend localfs \
  -seed-nodes localhost:50051

Expected output on node2 startup (abridged):

╔═══════════════════════════════════════════════════════════╗
║                    AIStore Node                           ║
║           Kubernetes-Native AI Inference Storage          ║
╚═══════════════════════════════════════════════════════════╝
Node ID:      node2
Listen:       :50052
Backend:      localfs (auto-detected)
Index:        mmap (auto-detected)
EC Engine:    reedsolomon (auto-detected)
Transport:    tcp (auto-detected)
Data Dir:     /data/aistore/node2
VNodes/Node:  150
Seed Nodes:   [localhost:50051]

Example 2 — Helm deployment with custom storage class

# custom-values.yaml
replicaCount: 3

persistence:
  storageClass: nvme-ssd
  size: 200Gi

resources:
  limits:
    cpu: 4
    memory: 8Gi
  requests:
    cpu: 1
    memory: 2Gi

metrics:
  enabled: true
  port: 9090

helm install nabu-store trilio/aistore \
  --namespace aistore \
  --create-namespace \
  -f custom-values.yaml

Expected output:

NAME: nabu-store
LAST DEPLOYED: ...
NAMESPACE: aistore
STATUS: deployed
REVISION: 1

kubectl get pods -n aistore
# NAME                         READY   STATUS    RESTARTS   AGE
# nabu-store-0                 1/1     Running   0          45s
# nabu-store-1                 1/1     Running   0          30s
# nabu-store-2                 1/1     Running   0          15s

Example 3 — Node with SPDK backend and metrics

# After running setup-spdk.sh bind
aistore \
  -node-id nvme-node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -backend spdk \
  -spdk-socket /var/tmp/spdk.sock \
  -spdk-bdev Nvme0n1 \
  -metrics-addr :9090

Expected banner line:

Backend:      spdk (auto-detected)
Metrics:      :9090

Example 4 — Ten-node cluster for EC 8+2 testing (Kubernetes)

Apply the provided 10-node manifest. Each node binds to a dedicated host port (50051–50060) and exposes a metrics port (9081–9090):

kubectl apply -f deploy/k8s/aistore-cluster-10node.yaml
kubectl get pods -n aistore -l app=aistore

Expected output:

NAME              READY   STATUS    RESTARTS   AGE
aistore-node1     1/1     Running   0          2m
aistore-node2     1/1     Running   0          2m
...
aistore-node10    1/1     Running   0          2m

Example 5 — Node with CXL tiering and custom heartbeat

aistore \
  -node-id cxl-node1 \
  -listen :50051 \
  -data-dir /data/aistore \
  -enable-cxl \
  -cxl-cache-size 4294967296 \
  -cxl-numa 0 \
  -heartbeat 3s \
  -shutdown-timeout 60s

Expected banner line (when CXL device is present):

CXL:          enabled (cache: 4096 MiB)

Troubleshooting

Node fails to start: Error opening backend

Symptom: The process exits immediately with:

Error opening backend "spdk": ...
Registered: [localfs spdk]

Likely cause: The SPDK target is not running, or no socket exists at the path given by -spdk-socket.

Fix:

  1. Check that the SPDK socket exists: ls /var/tmp/spdk.sock
  2. If the socket is missing, start the SPDK target: sudo ./deploy/spdk/setup-spdk.sh start
  3. If you do not need SPDK, remove -backend spdk and let the node auto-detect, or pass -backend localfs explicitly.

Pod stuck in Pending: PVC not bound

Symptom: kubectl get pods -n aistore shows Pending and kubectl describe pod <name> mentions no persistent volumes available.

Likely cause: The persistence.storageClass specified in values.yaml does not exist in the cluster, or no storage provisioner is running.

Fix:

  1. List available storage classes: kubectl get storageclass
  2. Update persistence.storageClass in your values.yaml to match an available class (use local-path for development).
  3. Re-run helm upgrade.

Second node never joins the cluster

Symptom: node2 starts without errors, but the cluster still shows only node1.

Likely cause: The -seed-nodes value is unreachable — wrong hostname, port, or a firewall rule blocking gRPC (TCP) traffic.

Fix:

  1. From node2's host, confirm connectivity: nc -z <seed-host> 50051
  2. Check that node1's -listen address is 0.0.0.0:50051 or the specific interface node2 can reach, not 127.0.0.1:50051.
  3. Ensure no network policy or firewall blocks the port.

CXL tiering silently disabled

Symptom: You passed -enable-cxl but the startup banner does not print the CXL: enabled line.

Likely cause: The host has no DAX-capable CXL device, so the cxlmem index plugin falls back to mmap automatically.

Fix:

  1. Verify DAX device presence: ls /dev/dax*
  2. Check kernel CXL support: dmesg | grep -i cxl
  3. If no DAX device is available, the mmap index is used and no action is required — the node operates normally.

Metrics endpoint not reachable

Symptom: curl http://<node-ip>:9090/metrics returns connection refused.

Likely cause: Either -metrics-addr was set to an empty string (disabling metrics), or the port is not exposed by the Kubernetes Service.

Fix:

  1. Confirm the flag value in the pod spec: kubectl describe pod <name> -n aistore | grep metrics-addr
  2. Ensure metrics.enabled: true and metrics.port: 9090 are set in your Helm values.
  3. If using raw manifests, add the metrics port to the container ports list (as shown in aistore-cluster-10node.yaml).

Pod evicted due to memory limit

Symptom: Pods are evicted or killed; kubectl describe pod shows OOMKilled.

Likely cause: resources.limits.memory in values.yaml is too low for the workload, especially when CXL cache size is large.

Fix:

  1. Increase resources.limits.memory (default: 8Gi) and resources.requests.memory in values.yaml.
  2. If CXL is enabled, ensure -cxl-cache-size (in bytes) does not exceed the container memory limit; a quick conversion check follows this list.
  3. Apply with helm upgrade.
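
As a quick sanity check on that relationship, convert the cache size to GiB and compare it against the limit. The 8589934592-byte cache from the CXL example equals the default 8Gi container limit, leaving no headroom for the rest of the process:

echo $((8589934592 / 1024 / 1024 / 1024))   # prints 8, i.e. the full default 8Gi limit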