NabuStore Development Guide

Local development setup and testing


Overview

This guide walks you through setting up a local nabustore development environment, running tests, and—most importantly—understanding the plugin architecture that makes nabustore extensible. nabustore is built around a set of well-defined interfaces for storage backends, erasure coding, cluster management, and GPU transfer; each interface can be enhanced or replaced independently to suit your platform. Whether you are integrating a new NVMe driver, writing a custom erasure coding scheme, or building a GPU-direct transfer plugin, this guide gives you the foundation to do so confidently.
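
To make the plugin boundary concrete, the sketch below shows the approximate shape of the storage-backend interface. The Put and Get signatures mirror the MemoryBackend example later in this guide, and the BlobBackend name comes from the --backend flag; the remaining methods are listed by name only, and the whole fragment should be checked against the blob package rather than copied from here:

package blob

import (
    "context"
    "io"
)

// Sketch only; consult the blob package for the authoritative definition.
// BlobID and BlobMeta are existing types in that package.
type BlobBackend interface {
    Put(ctx context.Context, id BlobID, r io.Reader, meta BlobMeta) error
    Get(ctx context.Context, id BlobID) (io.ReadCloser, *BlobMeta, error)
    // GetRange, Delete, Stat, and List complete the interface;
    // their signatures are omitted here.
}

The erasure coding, cluster membership, and GPU transfer plugin points follow the same idea: a narrow interface that a custom implementation can satisfy without touching the rest of the server.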


Prerequisites

Before you begin, make sure you have the following:

  • Go 1.24 or later — nabustore uses generics and language features introduced in Go 1.21; 1.24 is the tested baseline.
  • Docker 24+ — all build and test commands can run inside a container, so a local Go toolchain is not strictly required; Docker is what enables that containerised workflow.
  • kubectl — required only if you intend to deploy to a Kubernetes or OpenShift cluster during development.
  • protoc and the Go gRPC plugins — required only if you modify .proto files. The container-based protoc command shown in the Installation section handles this without a local install.
  • Git — to clone the repository.
  • Linux x86-64 host — SPDK and CXL features are Linux-only; core storage and cluster code builds on macOS but is not tested there.
  • Access to the nabustore source repository:
    git@github.com:nabustore/nabustore.git
    

Installation

1. Clone the repository

git clone git@github.com:nabustore/nabustore.git
cd nabustore

2. Build all packages

You do not need a local Go installation. The command below runs the build inside an official Go container with the source tree mounted:

docker run --rm -v $(pwd):/build -w /build golang:1.24-alpine go build ./...

To build the two main binaries locally (when Go is installed):

go build -o nabustore ./cmd/aistore
go build -o testclient ./cmd/testclient

3. Run all unit tests

docker run --rm -v $(pwd):/build -w /build golang:1.24-alpine go test ./...

Or, with a local Go toolchain:

go test ./...

4. Build the container image

docker build -t nabustore:dev -f deploy/docker/Dockerfile .

5. Regenerate protobuf code (only after editing .proto files)

docker run --rm -v $(pwd):/build -w /build golang:1.24-alpine sh -c \
  'apk add protobuf && go install google.golang.org/protobuf/cmd/protoc-gen-go@latest && \
   go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest && \
   protoc --go_out=. --go_opt=paths=source_relative \
          --go-grpc_out=. --go-grpc_opt=paths=source_relative proto/aistore.proto'

6. Run a single local node for manual testing

./nabustore --node-id=dev1 --listen=:50051 --data-dir=/tmp/nabustore-dev

7. Exercise the node with the test client

./testclient --addr=localhost:50051

Configuration

nabustore's server is configured entirely through command-line flags. The flags below matter most during local development; see the deployment guides for Kubernetes-specific overrides.

--node-id (default: hostname; any string)
  Unique identifier for this node in the cluster ring. During local dev, set an explicit value so restarts are deterministic.

--listen (default: :50051; host:port)
  gRPC listen address. When running multiple nodes locally (for EC testing), assign distinct ports, e.g. :50051 through :50060.

--data-dir (default: /data/aistore; any writable path)
  Root directory for blob storage. Use a path under /tmp during development to keep test data ephemeral.

--index-path (default: /data/aistore/index.db; any writable path)
  Path to the BoltDB index file. Must be on the same filesystem as --data-dir for performance.

--backend (default: localfs; values: localfs, spdk)
  Selects the BlobBackend implementation. Use localfs locally; spdk requires IOMMU/VFIO and is not available in most dev environments.

--seed-nodes (default: none; comma-separated host:port list)
  Addresses of existing cluster nodes to join. Omit for a standalone single-node dev setup.

--enable-cxl (default: false; values: true, false)
  Activates the CXL memory tiering layer. Requires CXL hardware or the CXL kernel modules; safe to leave false during development.

--cxl-cache-size (default: none; size in bytes, e.g. 8589934592)
  Size of the CXL-backed blob cache (8 GiB in the example). Only effective when --enable-cxl=true.
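
For example, to start a second local node that joins the dev1 node from the Installation section (the exact values are illustrative; any distinct --node-id, --listen, and --data-dir will do):

./nabustore --node-id=dev2 --listen=:50052 --data-dir=/tmp/nabustore-dev2 --seed-nodes=localhost:50051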

Virtual nodes (VNodesPerNode in the server Config struct, default 150) control how evenly the consistent-hash ring distributes data. Lower values speed up ring operations in unit tests at the cost of less uniform distribution.
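
A minimal illustration of overriding it in a test follows; only the VNodesPerNode field name and its default come from the Config struct described above, while the surrounding literal and any other fields are assumptions to adapt to the real server package:

cfg := server.Config{
    // ... other fields your test needs ...
    VNodesPerNode: 16, // default is 150; fewer vnodes keeps ring setup fast in unit tests
}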


Usage

Running a local multi-node simulation

Testing erasure coding requires at least six nodes for EC 4+2 or ten nodes for EC 8+2. nabustore ships a simulator that spins up an in-process cluster without requiring separate processes or ports:

go run ./sim/cmd/runsim --nodes=10 --clients=4 --requests=100 --verbose

The simulator exercises blob puts, gets, and fault injection through the same interfaces that the real server uses, making it the fastest way to validate plugin changes.

Sending blob operations manually

With a node running on :50051, use testclient to exercise each operation:

# Default: 1 MB blob with replica3 policy
./testclient --addr=localhost:50051

# Smaller blob with EC 4+2 policy
./testclient --addr=localhost:50051 --policy=ec42 --size=65536

testclient performs a full round-trip: Put → Stat → Get → data integrity check → List → Delete → verify deletion.
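
To drive the same operations from your own Go code rather than through testclient, the connection boilerplate looks like the fragment below. The dial code uses the standard grpc-go API; the commented-out calls are placeholders, since the real client constructor, method names, and request fields come from the generated pb package (see proto/aistore.proto):

package main

import (
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func main() {
    // Plaintext connection to a local dev node.
    conn, err := grpc.NewClient("localhost:50051",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("connect: %v", err)
    }
    defer conn.Close()

    // Placeholder calls; substitute the real generated API, for example:
    //   client := pb.NewAIStoreClient(conn)
    //   resp, err := client.Put(ctx, &pb.PutRequest{...})
}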

Running targeted package tests

During plugin development you will often want to test a single package:

# Test only the blob backends
go test ./blob/...

# Test the consistent-hash ring
go test ./ring/...

# Test erasure coding
go test ./ec/...

# Test with race detector enabled
go test -race ./...

Checking CXL system availability

go run ./cmd/cxlinfo

This prints whether the kernel CXL modules are loaded and whether compatible hardware is present—useful before enabling --enable-cxl on a new host.


Examples

Example 1 — Quick smoke test (in-process simulator)

go run ./sim/cmd/runsim --quick

Expected output (abbreviated):

Running quick smoke test...
[sim] cluster ready: 4 nodes
[sim] workload complete: 400 requests
Simulation completed successfully!

Example 2 — Stress test with fault injection

go run ./sim/cmd/runsim --stress

Expected output (abbreviated):

Running stress test with fault injection...
[sim] fault injected: node-2 offline
[sim] node-2 recovered after 30s
[sim] all blobs verified intact
Simulation completed successfully!

Example 3 — Full client round-trip against a live node

Start a node in one terminal:

./nabustore --node-id=dev1 --listen=:50051 --data-dir=/tmp/ns-dev

In a second terminal:

./testclient --addr=localhost:50051 --policy=replica3 --size=1048576

Expected output:

Connecting to AIStore at localhost:50051...
Connected!

=== Test 1: Put blob (policy=replica3, size=1048576) ===
Put succeeded: size=1048576 bytes
Blob ID: a3f9c1...

=== Test 2: Stat blob ===
Stat succeeded: exists=true
  Size: 1048576

=== Test 3: Get blob ===
Get succeeded: retrieved 1048576 bytes
Data verified!

=== Test 4: List blobs ===
List succeeded: 1 blobs, hasMore=false
  1. a3f9c1...

=== Test 5: Delete blob ===
Delete succeeded: deleted=true

=== Test 6: Verify deletion ===
Blob no longer exists (expected)

✓ All tests passed!

Example 4 — Writing and registering a minimal custom BlobBackend

Create blob/memory.go:

package blob

import (
    "bytes"
    "context"
    "fmt"
    "io"
    "sync"
)

// MemoryBackend is an in-process BlobBackend for testing.
type MemoryBackend struct {
    mu    sync.RWMutex
    store map[string][]byte
    meta  map[string]BlobMeta
}

func NewMemoryBackend() *MemoryBackend {
    return &MemoryBackend{
        store: make(map[string][]byte),
        meta:  make(map[string]BlobMeta),
    }
}

func (m *MemoryBackend) Put(_ context.Context, id BlobID, r io.Reader, meta BlobMeta) error {
    data, err := io.ReadAll(r)
    if err != nil {
        return err
    }
    m.mu.Lock()
    defer m.mu.Unlock()
    key := fmt.Sprintf("%x", id)
    m.store[key] = data
    m.meta[key] = meta
    return nil
}

func (m *MemoryBackend) Get(_ context.Context, id BlobID) (io.ReadCloser, *BlobMeta, error) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    key := fmt.Sprintf("%x", id)
    data, ok := m.store[key]
    if !ok {
        return nil, nil, fmt.Errorf("blob not found")
    }
    cp := make([]byte, len(data))
    copy(cp, data)
    meta := m.meta[key]
    return io.NopCloser(bytes.NewReader(cp)), &meta, nil
}

// ... implement GetRange, Delete, Stat, List similarly

Select it at startup by passing the backend name through the Config.Backend field in server startup code, or register it under a new flag value in cmd/aistore/main.go.
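
A quick way to validate the new backend is a round-trip unit test next to it (for example blob/memory_test.go), which then runs under the go test ./blob/... command shown in the Usage section. The zero values of BlobID and BlobMeta are used so the test does not depend on their exact definitions; adjust if those types need explicit construction:

package blob

import (
    "bytes"
    "context"
    "io"
    "testing"
)

// TestMemoryBackendPutGet writes a blob and reads it back through the
// MemoryBackend defined above, comparing the payload byte for byte.
func TestMemoryBackendPutGet(t *testing.T) {
    b := NewMemoryBackend()
    var id BlobID // zero value; replace with a real ID constructor if one exists
    payload := []byte("hello, nabustore")

    if err := b.Put(context.Background(), id, bytes.NewReader(payload), BlobMeta{}); err != nil {
        t.Fatalf("Put: %v", err)
    }

    rc, _, err := b.Get(context.Background(), id)
    if err != nil {
        t.Fatalf("Get: %v", err)
    }
    defer rc.Close()

    got, err := io.ReadAll(rc)
    if err != nil {
        t.Fatalf("read: %v", err)
    }
    if !bytes.Equal(got, payload) {
        t.Fatalf("data mismatch: got %d bytes, want %d", len(got), len(payload))
    }
}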


Troubleshooting

go build ./... fails with "module not found"

Symptom: Build error referencing github.com/nabustore/nabustore.git not found.

Cause: The Go module path in go.mod is github.com/nabustore/nabustore; if you forked the repo under a different path, import paths break.

Fix: Do not rename the module path in go.mod during development. If you must fork under a different path, update the module line in go.mod and rewrite the import paths, e.g.: sed -i 's|github.com/nabustore/nabustore|github.com/yourorg/nabustore|g' $(grep -rl 'nabustore/nabustore' --include='*.go' .)


testclient reports Failed to connect: context deadline exceeded

Symptom: ./testclient --addr=localhost:50051 times out immediately.

Cause: Either the nabustore server is not running, or it bound to a different address.

Fix: Confirm the server is up with ss -tlnp | grep 50051. If the process is running but the port is absent, check the --listen flag value used when starting the server.


go test ./... reports data races

Symptom: Running go test -race ./... surfaces concurrent map read/write warnings.

Cause: A new plugin or backend implementation accesses shared state without holding the appropriate mutex.

Fix: All core types (Ring, Index, LocalFSBackend) use sync.RWMutex. Follow the same pattern: acquire a read lock (RLock/RUnlock) for Get/Stat/List, and a write lock (Lock/Unlock) for Put/Delete. In tests, mark shared helpers with t.Helper() and use table-driven cases so failures are attributed to the offending case rather than the helper.
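
A generic illustration of that locking convention (the type here is made up for illustration and is not part of nabustore):

package example

import "sync"

type cache struct {
    mu   sync.RWMutex
    data map[string][]byte
}

func (c *cache) get(key string) ([]byte, bool) {
    c.mu.RLock() // read path: shared lock
    defer c.mu.RUnlock()
    v, ok := c.data[key]
    return v, ok
}

func (c *cache) put(key string, v []byte) {
    c.mu.Lock() // write path: exclusive lock
    defer c.mu.Unlock()
    c.data[key] = v
}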


--enable-cxl panics on startup

Symptom: Server exits immediately with a panic or error about CXL device enumeration.

Cause: The CXL kernel modules are not loaded or no CXL hardware is present. Running go run ./cmd/cxlinfo confirms this.

Fix: Leave --enable-cxl=false (the default) during local development. CXL tiering is safe to disable; the server falls back to DRAM and NVMe automatically.


--backend=spdk fails with "VFIO not available"

Symptom: Server exits with an error about IOMMU or VFIO when --backend=spdk is set.

Cause: SPDK requires user-space NVMe access via VFIO, which in turn requires IOMMU to be enabled in the BIOS and kernel. Most developer workstations and VMs do not satisfy this.

Fix: Use --backend=localfs for all local development and testing. Switch to spdk only on baremetal nodes with IOMMU enabled.


Protobuf-generated files are out of date

Symptom: Compilation errors like undefined: pb.SomeNewField after pulling upstream changes.

Cause: .proto files changed but the generated *.pb.go files were not regenerated.

Fix: Re-run the protoc command from the Installation section. Always commit generated files alongside .proto changes.