Configuration
Environment variables, config files, feature flags
This page explains how to configure Nabu Store (AIStore) nodes for production and development deployments. Configuration is applied through command-line flags passed to the `aistore` binary, environment variables injected via Kubernetes manifests, or Helm chart values that translate into container arguments. Understanding these settings helps you tune storage backends, cluster topology, erasure coding, CXL memory tiering, and observability to match your infrastructure.
Before you configure a Nabu Store deployment, ensure you have:
- A Kubernetes cluster (1.25 or later recommended) with `kubectl` access, or a host where you can run the `aistore` binary directly
- Helm 3.x if you are using the Helm chart deployment path
- Sufficient persistent storage provisioned per node (default request: 100 GiB per node)
- Root or `CAP_SYS_ADMIN` privileges on any host where you intend to enable the SPDK NVMe backend or CXL memory tiering
- SPDK v24.09 installed and NVMe devices bound if you plan to use the SPDK backend (see the SPDK setup guide)
- A running seed node reachable over TCP before joining additional nodes to a cluster
Nabu Store configuration takes effect at startup. There is no separate configuration file to install — settings are passed as flags to the `aistore` binary or expressed in the Helm `values.yaml`.
Option A — Binary flags (single-node or bare-metal)
- Download or build the `aistore` binary and place it on your `PATH`.
- Start the node with the minimum required flags:

```bash
aistore \
  -node-id node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -backend localfs
```
- For a multi-node cluster, pass `-seed-nodes` on every node except the first:

```bash
aistore \
  -node-id node2 \
  -listen :50052 \
  -data-dir /data/aistore/node2 \
  -backend localfs \
  -seed-nodes localhost:50051
```
Option B — Helm chart (Kubernetes)
- Add the Trilio Helm repository and update:

```bash
helm repo add trilio https://charts.trilio.io
helm repo update
```
- Create a `custom-values.yaml` file with your overrides (see the Configuration section for all available keys).
- Install the chart:

```bash
helm install nabu-store trilio/aistore \
  --namespace aistore \
  --create-namespace \
  -f custom-values.yaml
```
- Verify that the pods reach `Running` state:

```bash
kubectl get pods -n aistore
```
Option C — Raw Kubernetes manifests
- Apply the provided manifest directly:

```bash
kubectl apply -f deploy/k8s/aistore-cluster.yaml
```
- Watch the pods become ready:

```bash
kubectl get pods -n aistore -w
```
All configuration knobs map to flags accepted by the `aistore` binary. When deploying with Helm, these flags are set through `values.yaml` fields that the chart renders into container args. When deploying via raw Kubernetes manifests, you edit the `args` array in the pod spec directly.
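For orientation, a raw-manifest container spec carries these flags in its `args` array. The fragment below is an illustrative sketch, not the shipped `aistore-cluster.yaml`; the image reference is a placeholder:

```yaml
# Illustrative fragment only; adapt paths and image to your deployment.
containers:
  - name: aistore
    image: example.com/aistore:latest   # placeholder image reference
    args:
      - "-node-id"
      - "node1"
      - "-listen"
      - ":50051"
      - "-data-dir"
      - "/data/aistore/node1"
```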
Node identity and networking
| Flag | Helm key | Default | Description |
|---|---|---|---|
| `-node-id` | — | hostname | Unique identifier for this node within the cluster. Defaults to the system hostname when omitted. |
| `-listen` | `node.port` | `:50051` | TCP address the node listens on for gRPC traffic. Change it when running multiple nodes on the same host (`:50052`, `:50053`, …). |
| `-seed-nodes` | — | `""` (empty) | Comma-separated list of `host:port` addresses for nodes already in the cluster. The first node bootstrapping a new cluster should leave this empty. |
Consistent hashing
| Flag | Helm key | Default | Description |
|---|---|---|---|
| `-vnodes` | `node.vnodes` | `150` | Number of virtual nodes each physical node occupies on the consistent hash ring. Higher values produce a more even distribution but increase ring metadata size. 150 is the recommended default for production. |
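For example, a throwaway development node might run with a smaller ring (an illustrative value; production should keep the default of 150):

```bash
aistore -node-id dev-node -listen :50051 -data-dir /tmp/aistore-dev -vnodes 50
```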
Storage backend
| Flag | Default | Description |
|---|---|---|
| `-backend` | auto-detected | Storage backend to use. Valid values: `localfs` (local filesystem, always available), `spdk` (SPDK blobstore via RPC). Omit this flag to allow hardware auto-detection to choose the optimal backend. |
| `-data-dir` | `/data/aistore` | Filesystem path where the node stores blobs and its index database (`index.db`). Must be writable by the process. In Kubernetes deployments this path is backed by a PersistentVolumeClaim. |
| `-spdk-socket` | `/var/tmp/spdk.sock` | Path to the SPDK JSON-RPC Unix socket. Only relevant when the SPDK backend is active. |
| `-spdk-bdev` | `""` (auto) | Name of the SPDK bdev to use for the blobstore. Leave empty to let the SPDK backend enumerate available bdevs automatically. |
Helm — persistent storage: Set `persistence.storageClass` to `local-path` for development or `nvme-ssd` for production. Set `persistence.size` to control the PVC size per node (default: `100Gi`).
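Put together, a minimal `values.yaml` fragment for these two keys looks like this (the class names are the examples used on this page; substitute one that exists in your cluster):

```yaml
persistence:
  storageClass: nvme-ssd   # local-path for development
  size: 200Gi              # PVC size per node (default: 100Gi)
```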
CXL memory tiering
| Flag | Default | Description |
|---|---|---|
| `-enable-cxl` | `false` | Enable CXL memory tiering for the metadata index. The node checks for DAX-capable devices at startup; if none are found, the flag is silently ignored. |
| `-cxl-cache-size` | `4294967296` (4 GiB) | Maximum bytes of CXL-attached memory to use as a cache tier for index data. |
| `-cxl-numa` | `-1` (auto) | Preferred NUMA node for CXL allocations. `-1` lets the kernel choose automatically. |
Cluster health and timing
| Flag | Helm key | Default | Description |
|---|---|---|---|
| `-heartbeat` | `node.heartbeatInterval` | `5s` | How often each node broadcasts a heartbeat to peers. Lower values detect failures faster but increase network traffic. |
| `-shutdown-timeout` | `node.shutdownTimeout` | `30s` | Maximum time the node waits for in-flight requests to complete before forcing shutdown. Increase this for large-object workloads. |
Metrics
| Flag | Helm key | Default | Description |
|---|---|---|---|
| `-metrics-addr` | `metrics.port` | `:9090` | HTTP address where Prometheus metrics are exposed. Set to an empty string to disable the metrics endpoint. |
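Once a node is up, you can spot-check the endpoint with a plain HTTP request (using the `/metrics` path shown in the troubleshooting section below):

```bash
curl -s http://localhost:9090/metrics | head
```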
Helm-only settings
The following settings are only available through the Helm chart and control Kubernetes-level behavior rather than the `aistore` binary itself.
| Key | Default | Description |
|---|---|---|
| `replicaCount` | `3` | Number of storage node pods in the StatefulSet. |
| `resources.limits.cpu` | `4` | Maximum CPU cores per pod. |
| `resources.limits.memory` | `8Gi` | Maximum memory per pod. |
| `resources.requests.cpu` | `1` | Guaranteed CPU cores per pod. |
| `resources.requests.memory` | `2Gi` | Guaranteed memory per pod. |
| `podDisruptionBudget.minAvailable` | `2` | Minimum number of pods that must remain available during voluntary disruptions. |
| `podSecurityContext.runAsUser` | `1000` | UID used to run the container process. |
| `securityContext.readOnlyRootFilesystem` | `true` | Mounts the container root filesystem read-only. Your data directory must be on a separate writable volume. |
| `extraEnv` | `[]` | Additional environment variables to inject into each pod. Use this to pass secrets or toggle feature flags without rebuilding the image. |
Starting a single-node cluster
For local development or testing, start a single node with the `localfs` backend:

```bash
aistore \
  -node-id dev-node \
  -listen :50051 \
  -data-dir /tmp/aistore-dev \
  -backend localfs \
  -metrics-addr :9090
```
The node is ready when you see the startup banner in stdout and the gRPC port becomes reachable.
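If you want a scriptable readiness check rather than watching stdout, probe the port the same way the troubleshooting steps below do:

```bash
nc -z localhost 50051 && echo "gRPC port is up"
```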
Joining a node to an existing cluster
Every node after the first must point to at least one already-running seed node:
```bash
aistore \
  -node-id node2 \
  -listen :50052 \
  -data-dir /data/aistore/node2 \
  -seed-nodes 192.168.1.10:50051
```
You can supply multiple seed addresses separated by commas:
```text
-seed-nodes 192.168.1.10:50051,192.168.1.11:50052
```
Overriding auto-detected backend
By default, the node auto-detects the best available storage backend. To force a specific backend — for example, when you know SPDK is available but want to test with the local filesystem — pass the `-backend` flag explicitly:

```bash
aistore -backend localfs -data-dir /data/aistore
```
To use the SPDK NVMe backend, ensure the SPDK target is running and the socket is accessible, then start the node:

```bash
aistore \
  -backend spdk \
  -spdk-socket /var/tmp/spdk.sock \
  -spdk-bdev Nvme0n1
```
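If startup fails at this step, first confirm the RPC socket actually exists (the same check used in the backend troubleshooting entry below):

```bash
ls -l /var/tmp/spdk.sock
```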
Enabling CXL memory tiering
CXL tiering accelerates metadata index lookups by keeping hot index data in CXL-attached memory. Enable it alongside your normal startup flags:
```bash
aistore \
  -node-id node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -enable-cxl \
  -cxl-cache-size 8589934592 \
  -cxl-numa 1
```
If the host does not have a DAX-capable CXL device, the flag is silently ignored and the node falls back to the `mmap`-based index.
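You can check for a DAX device up front instead of relying on the fallback (the same check the troubleshooting section uses):

```bash
ls /dev/dax* 2>/dev/null || echo "no DAX device; node will use the mmap index"
```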
Customizing Helm values for production
Create a `production-values.yaml` that overrides the defaults most relevant to production:

```yaml
replicaCount: 5

node:
  vnodes: 150
  heartbeatInterval: 5s
  shutdownTimeout: 60s

persistence:
  storageClass: nvme-ssd
  size: 500Gi

resources:
  limits:
    cpu: 8
    memory: 16Gi
  requests:
    cpu: 4
    memory: 8Gi

podDisruptionBudget:
  enabled: true
  minAvailable: 3

metrics:
  enabled: true
  port: 9090
```
Apply it:
```bash
helm upgrade --install nabu-store trilio/aistore \
  --namespace aistore \
  -f production-values.yaml
```
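Then watch the five replicas come up:

```bash
kubectl get pods -n aistore -w
```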
Injecting extra environment variables via Helm
Use `extraEnv` to pass secrets or toggle runtime behavior without changing the image:

```yaml
extraEnv:
  - name: NABU_LOG_LEVEL
    value: "debug"
  - name: NABU_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: nabu-api-secret
        key: token
```
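To confirm the variables landed in a running pod (using the pod name from the Helm examples below):

```bash
kubectl exec -n aistore nabu-store-0 -- env | grep '^NABU_'
```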
Example 1 — Three-node cluster on bare metal
Start three nodes on ports 50051–50053, with node1 as the bootstrap seed:
```bash
# Terminal 1 — bootstrap node
aistore -node-id node1 -listen :50051 -data-dir /data/aistore/node1 -backend localfs

# Terminal 2
aistore -node-id node2 -listen :50052 -data-dir /data/aistore/node2 -backend localfs \
  -seed-nodes localhost:50051

# Terminal 3
aistore -node-id node3 -listen :50053 -data-dir /data/aistore/node3 -backend localfs \
  -seed-nodes localhost:50051
```
Expected output on node2 startup (abridged):
```text
╔═══════════════════════════════════════════════════════════╗
║                       AIStore Node                        ║
║          Kubernetes-Native AI Inference Storage           ║
╚═══════════════════════════════════════════════════════════╝
Node ID:     node2
Listen:      :50052
Backend:     localfs (auto-detected)
Index:       mmap (auto-detected)
EC Engine:   reedsolomon (auto-detected)
Transport:   tcp (auto-detected)
Data Dir:    /data/aistore/node2
VNodes/Node: 150
Seed Nodes:  [localhost:50051]
```
Example 2 — Helm deployment with custom storage class
```yaml
# custom-values.yaml
replicaCount: 3

persistence:
  storageClass: nvme-ssd
  size: 200Gi

resources:
  limits:
    cpu: 4
    memory: 8Gi
  requests:
    cpu: 1
    memory: 2Gi

metrics:
  enabled: true
  port: 9090
```
```bash
helm install nabu-store trilio/aistore \
  --namespace aistore \
  --create-namespace \
  -f custom-values.yaml
```
Expected output:
```text
NAME: nabu-store
LAST DEPLOYED: ...
NAMESPACE: aistore
STATUS: deployed
REVISION: 1
```
```bash
kubectl get pods -n aistore
# NAME           READY   STATUS    RESTARTS   AGE
# nabu-store-0   1/1     Running   0          45s
# nabu-store-1   1/1     Running   0          30s
# nabu-store-2   1/1     Running   0          15s
```
Example 3 — Node with SPDK backend and metrics
```bash
# After running setup-spdk.sh bind
aistore \
  -node-id nvme-node1 \
  -listen :50051 \
  -data-dir /data/aistore/node1 \
  -backend spdk \
  -spdk-socket /var/tmp/spdk.sock \
  -spdk-bdev Nvme0n1 \
  -metrics-addr :9090
```
Expected banner lines:
```text
Backend: spdk (auto-detected)
Metrics: :9090
```
Example 4 — Ten-node cluster for EC 8+2 testing (Kubernetes)
Apply the provided 10-node manifest. Each node binds to a dedicated host port (50051–50060) and exposes a metrics port (9081–9090):
```bash
kubectl apply -f deploy/k8s/aistore-cluster-10node.yaml
kubectl get pods -n aistore -l app=aistore
```
Expected output:
```text
NAME             READY   STATUS    RESTARTS   AGE
aistore-node1    1/1     Running   0          2m
aistore-node2    1/1     Running   0          2m
...
aistore-node10   1/1     Running   0          2m
```
Example 5 — Node with CXL tiering and custom heartbeat
```bash
aistore \
  -node-id cxl-node1 \
  -listen :50051 \
  -data-dir /data/aistore \
  -enable-cxl \
  -cxl-cache-size 4294967296 \
  -cxl-numa 0 \
  -heartbeat 3s \
  -shutdown-timeout 60s
```
Expected banner line (when CXL device is present):
```text
CXL: enabled (cache: 4096 MiB)
```
Node fails to start: Error opening backend
Symptom: The process exits immediately with:
```text
Error opening backend "spdk": ...
Registered: [localfs spdk]
```
Likely cause: The SPDK target is not running, or the path given by `-spdk-socket` does not exist.
Fix:
- Verify the SPDK target is running: `ls /var/tmp/spdk.sock`
- If the socket is missing, start the SPDK target: `sudo ./deploy/spdk/setup-spdk.sh start`
- If you do not need SPDK, remove `-backend spdk` and let the node auto-detect, or pass `-backend localfs` explicitly.
Pod stuck in Pending: PVC not bound
Symptom: `kubectl get pods -n aistore` shows `Pending`, and `kubectl describe pod <name>` mentions `no persistent volumes available`.
Likely cause: The `persistence.storageClass` specified in `values.yaml` does not exist in the cluster, or no storage provisioner is running.
Fix:
- List available storage classes: `kubectl get storageclass`
- Update `persistence.storageClass` in your `values.yaml` to match an available class (use `local-path` for development).
- Re-run `helm upgrade`.
Second node never joins the cluster
Symptom: node2 starts without errors but the cluster still shows only node1.
Likely cause: The `-seed-nodes` value is unreachable — wrong hostname, wrong port, or a firewall rule blocking gRPC (TCP) traffic.
Fix:
- From node2's host, confirm connectivity: `nc -z <seed-host> 50051`
- Check that node1's `-listen` address is `0.0.0.0:50051` or a specific interface node2 can reach, not `127.0.0.1:50051`.
- Ensure no network policy or firewall blocks the port.
CXL tiering silently disabled
Symptom: You passed `-enable-cxl` but the startup banner does not print the `CXL: enabled` line.
Likely cause: The host has no DAX-capable CXL device, so the `cxlmem` index plugin falls back to `mmap` automatically.
Fix:
- Verify DAX device presence: `ls /dev/dax*`
- Check kernel CXL support: `dmesg | grep -i cxl`
- If no DAX device is available, the `mmap` index is used and no action is required — the node operates normally.
Metrics endpoint not reachable
Symptom: `curl http://<node-ip>:9090/metrics` returns `connection refused`.
Likely cause: Either `-metrics-addr` was set to an empty string (disabling metrics), or the port is not exposed by the Kubernetes Service.
Fix:
- Confirm the flag value in the pod spec: `kubectl describe pod <name> -n aistore | grep metrics-addr`
- Ensure `metrics.enabled: true` and `metrics.port: 9090` are set in your Helm values.
- If using raw manifests, add the metrics port to the container `ports` list (as shown in `aistore-cluster-10node.yaml`).
Pod evicted due to memory limit
Symptom: Pods are OOMKilled or evicted; `kubectl describe pod` shows `OOMKilled`.
Likely cause: `resources.limits.memory` in `values.yaml` is too low for the workload, especially when the CXL cache size is large.
Fix:
- Increase `resources.limits.memory` (default: `8Gi`) and `resources.requests.memory` in `values.yaml`.
- If CXL is enabled, ensure `-cxl-cache-size` (in bytes) does not exceed the container memory limit.
- Apply with `helm upgrade`.
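As a worked example: a `-cxl-cache-size` of `8589934592` bytes is 8 GiB, which leaves no headroom under the default `8Gi` limit. A sketch of matching overrides (the headroom figure is illustrative, not a measured requirement):

```yaml
resources:
  limits:
    memory: 16Gi   # 8 GiB CXL cache plus headroom for the process itself
  requests:
    memory: 8Gi
```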
