Observability
Logging, metrics, tracing, health checks
This guide explains how to monitor, debug, and maintain your integration with the athlete-nutrition-ai API. You'll learn how to interpret structured logs, query health check endpoints, understand the metrics surfaced by the API, and trace requests end-to-end across the core workflows — from athlete profile setup through meal plan generation and shopping list delivery. Effective observability lets you detect anomalies early, reduce mean time to resolution, and give your users a reliable experience.
Before working through this guide, make sure you have:
- An active athlete-nutrition-ai developer account with a valid API key (see the Authentication guide)
- A subscription tier that includes API access — observability endpoints are available on Standard and above
- curl 7.68+ or any HTTP client capable of sending bearer-token requests
- (Optional but recommended) A log aggregation tool such as Datadog, Grafana Loki, or the Elastic Stack to collect and query structured JSON logs
- (Optional) An OpenTelemetry-compatible tracing backend (e.g., Jaeger, Zipkin, or Honeycomb) if you intend to use distributed tracing
- Familiarity with HTTP status codes and JSON response formats
The observability features are built into the athlete-nutrition-ai API — there is nothing to install on the API side. You configure your own tooling to consume the signals the API exposes.
Step 1 — Verify API connectivity
curl -i https://api.athlete-nutrition-ai.com/v1/health
A 200 OK response confirms the API is reachable and healthy before you configure any downstream tooling.
Step 2 — Install the OpenTelemetry Collector (optional, for tracing)
If you want to collect distributed traces, deploy the OpenTelemetry Collector in your infrastructure:
# Using Docker
docker pull otel/opentelemetry-collector-contrib:latest
docker run -d \
--name otel-collector \
-p 4317:4317 \
-p 4318:4318 \
-v $(pwd)/otel-config.yaml:/etc/otelcol-contrib/config.yaml \
otel/opentelemetry-collector-contrib:latest
Step 3 — Create a minimal collector configuration file
Save the following as otel-config.yaml in your working directory:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
logging:
loglevel: debug
# Replace with your preferred backend, e.g. jaeger, zipkin, otlp/http
service:
pipelines:
traces:
receivers: [otlp]
exporters: [logging]
Step 4 — Point your application to the collector
Set the following environment variables in your application so outbound trace context is forwarded:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=my-nutrition-app
export OTEL_TRACES_EXPORTER=otlp
Step 5 — Confirm log forwarding
Ensure your log shipper (Fluentd, Logstash, Vector, etc.) is tailing the output stream of your application and forwarding JSON log lines to your aggregation backend. No special athlete-nutrition-ai configuration is required — all API responses include structured headers and JSON bodies that your shipper can parse directly.
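Before pointing a shipper at a real backend, it can help to sanity-check your parsing and grouping logic locally. The sketch below filters sample JSON log lines with jq; the field names (`level`, `correlation_id`, `msg`) are illustrative assumptions, not a documented schema, so match them to whatever your own integration layer actually emits.

```shell
# Three sample structured log lines, one JSON object per line.
# Field names here are hypothetical -- adapt to your own log schema.
LOGS='{"level":"info","correlation_id":"session-123","msg":"profile created"}
{"level":"error","correlation_id":"session-456","msg":"calendar sync failed"}
{"level":"info","correlation_id":"session-123","msg":"meal plan requested"}'

# Pull out every line belonging to one correlation ID.
MATCHES=$(echo "$LOGS" | jq -c 'select(.correlation_id == "session-123")')
echo "$MATCHES"

# Count error-level lines across the whole stream.
ERRORS=$(echo "$LOGS" | jq -s '[.[] | select(.level == "error")] | length')
echo "error lines: $ERRORS"
```

The same `select` filters work unchanged in most aggregation backends' query languages, which makes jq a convenient local stand-in.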
The following options control how observability signals are emitted. Pass them as HTTP request headers on each call, or set them as environment variables in your server-side integration layer.
Request-level headers
| Header | Default | Valid values | Effect |
|---|---|---|---|
| `X-Request-ID` | Auto-generated UUID v4 | Any string ≤ 128 characters | Correlates all log lines and trace spans for a single request. Always set this yourself so you can search your own logs by the same ID. |
| `X-Correlation-ID` | None | Any string ≤ 256 characters | Propagates a business-level correlation token (e.g., a user session ID or order ID) across multiple API calls. Appears in every response log entry. |
| `traceparent` | None | W3C Trace Context format (`version-trace_id-parent_id-flags`) | Injects an existing trace context so API-side spans become children of your trace tree. Required if you want end-to-end distributed traces. |
| `tracestate` | None | W3C Trace Context vendor extension string | Passes vendor-specific trace metadata alongside `traceparent`. |
Environment-level settings (server-side SDK)
| Variable | Default | Valid values | Effect |
|---|---|---|---|
| `ANA_LOG_LEVEL` | `info` | `debug`, `info`, `warn`, `error` | Controls the verbosity of structured JSON logs emitted by your integration layer. Use `debug` during development; switch to `info` or `warn` in production to reduce volume. |
| `ANA_METRICS_INTERVAL_SECONDS` | `60` | Integer 10–3600 | How frequently aggregated metrics are flushed to your metrics sink. Lower values give finer granularity at the cost of higher write volume. |
| `ANA_TRACE_SAMPLE_RATE` | `1.0` | Float 0.0–1.0 | Fraction of requests to sample for tracing. Set to `0.1` in high-traffic production environments to reduce overhead while retaining statistical coverage. |
| `ANA_HEALTH_TIMEOUT_SECONDS` | `5` | Integer 1–30 | Maximum time your application waits for a health check response before marking the API as unavailable. |
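Taken together, a high-traffic production profile might look like the following. These are the documented `ANA_*` variables; the specific values are one reasonable starting point, not a recommendation for every workload.

```shell
# Example production profile for a high-traffic integration.
export ANA_LOG_LEVEL=warn                 # keep production log volume down
export ANA_METRICS_INTERVAL_SECONDS=300   # coarser flushes, lower write volume
export ANA_TRACE_SAMPLE_RATE=0.1          # sample 10% of requests for tracing
export ANA_HEALTH_TIMEOUT_SECONDS=5       # the default; fail fast on health checks

echo "log level: $ANA_LOG_LEVEL, sample rate: $ANA_TRACE_SAMPLE_RATE"
```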
Why these defaults? They are chosen to be safe for low-to-medium-traffic integrations. As your request volume grows, lowering `ANA_TRACE_SAMPLE_RATE` toward `0.1` and raising `ANA_METRICS_INTERVAL_SECONDS` toward `300` will keep your observability costs proportional to value.
Health checks
Poll the health endpoint from your load balancer, uptime monitor, or Kubernetes liveness probe to verify the API is operational before routing user traffic:
curl -s https://api.athlete-nutrition-ai.com/v1/health \
-H "Authorization: Bearer $ANA_API_KEY"
The response body is a JSON object. A status of "ok" means all upstream dependencies (AI model service, database, calendar connector) are healthy. A status of "degraded" means some features may be slow or unavailable — check the components array for details.
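A monitoring check usually needs to branch on that status rather than just print it. The sketch below does so with jq; the `BODY` variable holds a sample response matching the documented shape, and in production you would populate it from the curl call above instead.

```shell
# Sample body in the documented response shape. In production:
#   BODY=$(curl -s https://api.athlete-nutrition-ai.com/v1/health \
#     -H "Authorization: Bearer $ANA_API_KEY")
BODY='{"status":"degraded","components":{"api":"ok","ai_model":"degraded","database":"ok","calendar_connector":"ok"}}'

STATUS=$(echo "$BODY" | jq -r '.status')
case "$STATUS" in
  ok)
    echo "healthy: route traffic normally" ;;
  degraded)
    # Name the unhealthy components so the alert is actionable.
    BAD=$(echo "$BODY" | jq -r '.components | to_entries[] | select(.value != "ok") | .key')
    echo "degraded components: $BAD" ;;
  *)
    echo "unavailable: back off and retry" ;;
esac
```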
Correlating logs across a workflow
For multi-step workflows (e.g., creating an athlete profile, connecting a calendar, then generating a meal plan), set a stable X-Correlation-ID for the entire session so every log line can be grouped:
CORRELATION_ID="session-$(date +%s)-$(uuidgen | tr -d '-')"
# Step 1: Create athlete profile
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "X-Correlation-ID: $CORRELATION_ID" \
-H "Content-Type: application/json" \
-d '{"name": "Jordan Smith", "sport": "triathlon"}'
# Step 2: Connect training calendar
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes/{athlete_id}/calendar \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "X-Correlation-ID: $CORRELATION_ID" \
-d '{"provider": "google_calendar", "token": "..." }'
# Step 3: Generate meal plan
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes/{athlete_id}/meal-plans \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "X-Correlation-ID: $CORRELATION_ID"
All three calls will share the same X-Correlation-ID in the API's response headers and your log stream, making it trivial to reconstruct the full request sequence.
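The `{athlete_id}` placeholder in steps 2 and 3 comes from the step-1 response. Assuming the create-profile response includes an `athlete_id` field (the meal plan response in Example 2 below does; verify against your actual responses), you can capture it with jq and chain the calls. `PROFILE` here stands in for the real step-1 response:

```shell
# In production: PROFILE=$(curl -s -X POST .../v1/athletes ... )
PROFILE='{"athlete_id":"ath_01HX9K2","name":"Jordan Smith","sport":"triathlon"}'

# Extract the ID once and reuse it for the calendar and meal-plan calls.
ATHLETE_ID=$(echo "$PROFILE" | jq -r '.athlete_id')
echo "Using athlete $ATHLETE_ID for calendar and meal-plan calls"
# e.g. curl -s -X POST "https://api.athlete-nutrition-ai.com/v1/athletes/$ATHLETE_ID/calendar" ...
```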
Reading response headers for observability signals
Every API response includes headers you should log on the client side:
| Header | What it tells you |
|---|---|
| `X-Request-ID` | The server-assigned (or echoed) request ID. Log this alongside your own application logs. |
| `X-Response-Time-Ms` | Server-side processing time in milliseconds. Useful for latency trending. |
| `X-RateLimit-Remaining` | Requests remaining in your current rate-limit window. Alert when this approaches zero. |
| `X-RateLimit-Reset` | Unix timestamp when your rate-limit window resets. |
Metrics to track
The following application-level metrics give you the most actionable signal for a nutrition-AI integration:
- Meal plan generation latency (`POST /v1/athletes/{id}/meal-plans` → `X-Response-Time-Ms`) — AI inference calls are the slowest step; set a p95 alert threshold.
- Shopping list generation success rate — track `2xx` vs `4xx`/`5xx` on `POST /v1/athletes/{id}/shopping-lists`.
- Chat advisor response time (`POST /v1/athletes/{id}/chat`) — users expect near-real-time responses; alert if p50 exceeds your SLA.
- Rate-limit headroom — alert when `X-RateLimit-Remaining` drops below 20% of your plan's limit.
Example 1 — Basic health check
Verify that all API components are operational.
curl -s https://api.athlete-nutrition-ai.com/v1/health \
-H "Authorization: Bearer $ANA_API_KEY" | jq .
Expected output (healthy):
{
"status": "ok",
"timestamp": "2024-11-15T08:32:01Z",
"components": {
"api": "ok",
"ai_model": "ok",
"database": "ok",
"calendar_connector": "ok"
}
}
Expected output (degraded — AI model slow):
{
"status": "degraded",
"timestamp": "2024-11-15T08:32:01Z",
"components": {
"api": "ok",
"ai_model": "degraded",
"database": "ok",
"calendar_connector": "ok"
},
"message": "AI model service is experiencing elevated latency. Meal plan and chat endpoints may be slow."
}
Example 2 — Injecting trace context into a meal plan request
Propagate your trace into the API so server-side spans nest under your root span.
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes/ath_01HX9K2/meal-plans \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "Content-Type: application/json" \
-H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
-H "X-Request-ID: req-d4e5f6a7-b8c9" \
-H "X-Correlation-ID: session-1731660000-abc123" \
-d '{"week_start": "2024-11-18"}' \
-i
Expected response headers (truncated):
HTTP/2 201
Content-Type: application/json
X-Request-ID: req-d4e5f6a7-b8c9
X-Correlation-ID: session-1731660000-abc123
X-Response-Time-Ms: 1847
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 743
X-RateLimit-Reset: 1731664800
Expected response body (truncated):
{
"meal_plan_id": "mp_07GY3P9",
"athlete_id": "ath_01HX9K2",
"week_start": "2024-11-18",
"days": [...],
"generated_at": "2024-11-15T08:40:22Z"
}
In your tracing backend, you will see a span named POST /v1/athletes/{id}/meal-plans with trace ID 4bf92f3577b34da6a3ce929d0e0e4736 nested under your root span.
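If your application does not yet have an OpenTelemetry SDK wired in, you can still mint a spec-valid `traceparent` by hand for testing: version `00`, 16 random bytes of trace ID, 8 random bytes of parent ID, and the sampled flag `01`. This sketch assumes `openssl` is available:

```shell
# Build a W3C Trace Context traceparent value from random hex.
TRACE_ID=$(openssl rand -hex 16)    # 32 lowercase hex chars
PARENT_ID=$(openssl rand -hex 8)    # 16 lowercase hex chars
TRACEPARENT="00-${TRACE_ID}-${PARENT_ID}-01"
echo "traceparent: $TRACEPARENT"
# Pass it on a request: curl ... -H "traceparent: $TRACEPARENT"
```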
Example 3 — Extracting latency from response headers for metrics
This shell snippet calls the shopping list endpoint and pushes the server-side latency to a StatsD-compatible sink. Adapt to your own metrics library.
RESPONSE=$(curl -s -D - -X POST \
https://api.athlete-nutrition-ai.com/v1/athletes/ath_01HX9K2/shopping-lists \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"meal_plan_id": "mp_07GY3P9"}')
LATENCY=$(echo "$RESPONSE" | grep -i 'x-response-time-ms' | awk '{print $2}' | tr -d '\r')
echo "athlete_nutrition_ai.shopping_list.latency_ms:${LATENCY}|ms" | \
nc -u -w1 localhost 8125
echo "Shopping list generated in ${LATENCY}ms"
Expected terminal output:
Shopping list generated in 312ms
Example 4 — Rate-limit headroom alert script
Run this script as a cron job or monitoring check to alert when your rate-limit budget is low.
#!/usr/bin/env bash
set -euo pipefail
HEADERS=$(curl -s -D - -o /dev/null \
https://api.athlete-nutrition-ai.com/v1/health \
-H "Authorization: Bearer $ANA_API_KEY")
REMAINING=$(echo "$HEADERS" | grep -i 'x-ratelimit-remaining' | awk '{print $2}' | tr -d '\r')
LIMIT=$(echo "$HEADERS" | grep -i 'x-ratelimit-limit' | awk '{print $2}' | tr -d '\r')
PCT=$(( REMAINING * 100 / LIMIT ))
echo "Rate-limit headroom: ${REMAINING}/${LIMIT} (${PCT}%)"
if [ "$PCT" -lt 20 ]; then
echo "WARNING: Rate-limit headroom below 20%. Consider upgrading your subscription."
exit 1
fi
Expected output (healthy):
Rate-limit headroom: 820/1000 (82%)
Expected output (warning):
Rate-limit headroom: 150/1000 (15%)
WARNING: Rate-limit headroom below 20%. Consider upgrading your subscription.
Issue 1 — Health check returns 503 Service Unavailable
Symptom: GET /v1/health returns HTTP 503 and "status": "unavailable".
Likely cause: A critical upstream dependency (database or AI model service) is down, not just degraded.
Fix:
- Check the `components` object in the response body to identify which component is unavailable.
- Consult the athlete-nutrition-ai status page for active incidents.
- Implement exponential backoff with jitter in your integration before retrying user-facing operations.
- If the outage persists beyond 10 minutes, open a support ticket referencing the `X-Request-ID` from your health check response.
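One way to implement the backoff-with-jitter suggestion: exponential delay capped at 60 seconds, with full jitter so that many retrying clients do not synchronize into retry storms. This is a sketch, not the API's prescribed retry policy.

```shell
# Delay for retry attempt N: uniform random in 1..min(2^N, 60) seconds.
backoff_delay() {
  local attempt=$1
  local cap=60
  local exp=$(( 2 ** attempt ))       # 2, 4, 8, 16, ...
  [ "$exp" -gt "$cap" ] && exp=$cap
  echo $(( RANDOM % exp + 1 ))        # full jitter: 1..exp seconds
}

for attempt in 1 2 3; do
  DELAY=$(backoff_delay "$attempt")
  echo "attempt $attempt: would sleep ${DELAY}s before retrying /v1/health"
  # sleep "$DELAY"; curl -s https://api.athlete-nutrition-ai.com/v1/health ...
done
```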
Issue 2 — X-Request-ID in API responses does not match the value you sent
Symptom: You set X-Request-ID: my-custom-id in your request, but the response header contains a different UUID.
Likely cause: The value you provided exceeded 128 characters or contained characters outside the allowed set (printable ASCII, no whitespace). The API silently generated its own ID.
Fix:
- Ensure your `X-Request-ID` values are ≤ 128 printable ASCII characters with no spaces.
- Log the response `X-Request-ID` header immediately on receipt — do not rely solely on the value you sent.
Issue 3 — Trace spans do not appear nested under your root span
Symptom: Your tracing backend shows API calls as disconnected root spans rather than children of your application trace.
Likely cause: The traceparent header was not included in the request, was malformed, or was stripped by an intermediate proxy or API gateway.
Fix:
- Confirm the `traceparent` header format follows the W3C Trace Context spec: `00-{32-hex-trace-id}-{16-hex-parent-id}-{2-hex-flags}`.
- Check that your API gateway or reverse proxy is not stripping custom headers. Add `traceparent` and `tracestate` to its passthrough allowlist.
- Send a test request with `curl -v` and inspect the outbound headers to verify the header is present.
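A quick local format check can rule out a malformed value before you start blaming the gateway. This helper validates the `version-trace_id-parent_id-flags` shape (all lowercase hex, per the W3C spec):

```shell
# Returns success if the argument matches the W3C traceparent format.
valid_traceparent() {
  echo "$1" | grep -Eq '^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$'
}

if valid_traceparent "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"; then
  echo "well-formed"
fi
valid_traceparent "00-SHORT-01" || echo "malformed"
```

Note this only checks the shape; the spec additionally forbids all-zero trace and parent IDs, which a stricter check would also reject.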
Issue 4 — X-Response-Time-Ms shows unexpectedly high values for meal plan generation
Symptom: POST /v1/athletes/{id}/meal-plans consistently returns X-Response-Time-Ms values above 5000ms.
Likely cause: The AI inference step is computationally intensive and scales with the complexity of the athlete's training calendar (number of upcoming events, dietary constraints). Very large calendars or many dietary rules increase processing time. The API may also be under load.
Fix:
- Check `GET /v1/health` — if `ai_model` is `"degraded"`, wait for the status to recover.
- Reduce the planning horizon in your request payload (e.g., request a 7-day plan instead of a 28-day plan) to lower inference complexity.
- Move meal plan generation to an asynchronous background job on your side so users are not blocked by a synchronous HTTP timeout.
- If high latency is consistent and the API status is healthy, contact support with the `X-Request-ID` and `X-Correlation-ID` values from the slow requests.
Issue 5 — Rate-limit headers are missing from responses
Symptom: API responses do not include X-RateLimit-Remaining or X-RateLimit-Reset headers.
Likely cause: Your account is on the free or trial tier, which does not surface rate-limit headers, or you are hitting a cached CDN response that strips custom headers.
Fix:
- Log in to the developer dashboard and confirm your subscription tier. Rate-limit headers are available on Standard and above.
- If you are on an eligible tier, add `Cache-Control: no-store` to your requests to bypass any CDN layer.
- Use `curl -v` to inspect raw response headers and confirm whether the headers are present at the network level before assuming they are missing.
Issue 6 — Logs show 401 Unauthorized on the health endpoint
Symptom: GET /v1/health returns 401 when called from your monitoring system.
Likely cause: Your monitoring system is not sending the Authorization: Bearer header, or the API key has been rotated or revoked.
Fix:
- Verify the API key stored in your monitoring system's secret store matches the active key in your developer dashboard.
- Ensure the `Authorization: Bearer <token>` header is being set — some monitoring tools require explicit header configuration separate from general HTTP settings.
- If the key was rotated, update it in all downstream systems (monitoring, CI/CD, production environment variables) simultaneously to avoid gaps.