Observability
Logging, metrics, tracing, health checks
This guide explains how to monitor, debug, and maintain your integration with the athlete-nutrition-ai API. You'll learn how to interpret structured logs, query health check endpoints, understand the metrics surfaced by the API, and trace requests end-to-end across the core workflows — from athlete profile setup through meal plan generation and shopping list delivery. Effective observability lets you detect anomalies early, reduce mean time to resolution, and give your users a reliable experience.
Before working through this guide, make sure you have:
- An active athlete-nutrition-ai developer account with a valid API key (see the Authentication guide)
- A subscription tier that includes API access — observability endpoints are available on Standard and above
- curl 7.68+ or any HTTP client capable of sending bearer-token requests
- (Optional but recommended) A log aggregation tool such as Datadog, Grafana Loki, or the Elastic Stack to collect and query structured JSON logs
- (Optional) An OpenTelemetry-compatible tracing backend (e.g., Jaeger, Zipkin, or Honeycomb) if you intend to use distributed tracing
- Familiarity with HTTP status codes and JSON response formats
The observability features are built into the athlete-nutrition-ai API — there is nothing to install on the API side. You configure your own tooling to consume the signals the API exposes.
Step 1 — Verify API connectivity
curl -i https://api.athlete-nutrition-ai.com/v1/health
A 200 OK response confirms the API is reachable and healthy before you configure any downstream tooling.
Step 2 — Install the OpenTelemetry Collector (optional, for tracing)
If you want to collect distributed traces, deploy the OpenTelemetry Collector in your infrastructure:
# Using Docker
docker pull otel/opentelemetry-collector-contrib:latest
docker run -d \
--name otel-collector \
-p 4317:4317 \
-p 4318:4318 \
-v $(pwd)/otel-config.yaml:/etc/otelcol-contrib/config.yaml \
otel/opentelemetry-collector-contrib:latest
Step 3 — Create a minimal collector configuration file
Save the following as otel-config.yaml in your working directory:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
logging:
loglevel: debug
# Replace with your preferred backend, e.g. jaeger, zipkin, otlp/http
service:
pipelines:
traces:
receivers: [otlp]
exporters: [logging]
Step 4 — Point your application to the collector
Set the following environment variables in your application so outbound trace context is forwarded:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=my-nutrition-app
export OTEL_TRACES_EXPORTER=otlp
Step 5 — Confirm log forwarding
Ensure your log shipper (Fluentd, Logstash, Vector, etc.) is tailing the output stream of your application and forwarding JSON log lines to your aggregation backend. No special athlete-nutrition-ai configuration is required — all API responses include structured headers and JSON bodies that your shipper can parse directly.
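Before pointing a shipper at a real backend, it can help to sanity-check your parsing and grouping logic locally. The sketch below filters sample JSON log lines with jq; the field names (`level`, `correlation_id`, `msg`) are illustrative assumptions, not a documented schema, so match them to whatever your own integration layer actually emits.

```shell
# Three sample structured log lines, one JSON object per line.
# Field names here are hypothetical -- adapt to your own log schema.
LOGS='{"level":"info","correlation_id":"session-123","msg":"profile created"}
{"level":"error","correlation_id":"session-456","msg":"calendar sync failed"}
{"level":"info","correlation_id":"session-123","msg":"meal plan requested"}'

# Pull out every line belonging to one correlation ID.
MATCHES=$(echo "$LOGS" | jq -c 'select(.correlation_id == "session-123")')
echo "$MATCHES"

# Count error-level lines across the whole stream.
ERRORS=$(echo "$LOGS" | jq -s '[.[] | select(.level == "error")] | length')
echo "error lines: $ERRORS"
```

The same `select` filters work unchanged in most aggregation backends' query languages, which makes jq a convenient local stand-in.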
The following options control how observability signals are emitted. Pass them as HTTP request headers on each call, or set them as environment variables in your server-side integration layer.
Request-level headers
| Header | Default | Valid values | Effect |
|---|---|---|---|
| `X-Request-ID` | Auto-generated UUID v4 | Any string ≤ 128 characters | Correlates all log lines and trace spans for a single request. Always set this yourself so you can search your own logs by the same ID. |
| `X-Correlation-ID` | None | Any string ≤ 256 characters | Propagates a business-level correlation token (e.g., a user session ID or order ID) across multiple API calls. Appears in every response log entry. |
| `traceparent` | None | W3C Trace Context format (`version-trace_id-parent_id-flags`) | Injects an existing trace context so API-side spans become children of your trace tree. Required if you want end-to-end distributed traces. |
| `tracestate` | None | W3C Trace Context vendor extension string | Passes vendor-specific trace metadata alongside `traceparent`. |
Environment-level settings (server-side SDK)
| Variable | Default | Valid values | Effect |
|---|---|---|---|
| `ANA_LOG_LEVEL` | `info` | `debug`, `info`, `warn`, `error` | Controls the verbosity of structured JSON logs emitted by your integration layer. Use `debug` during development; switch to `info` or `warn` in production to reduce volume. |
| `ANA_METRICS_INTERVAL_SECONDS` | `60` | Integer 10–3600 | How frequently aggregated metrics are flushed to your metrics sink. Lower values give finer granularity at the cost of higher write volume. |
| `ANA_TRACE_SAMPLE_RATE` | `1.0` | Float 0.0–1.0 | Fraction of requests to sample for tracing. Set to `0.1` in high-traffic production environments to reduce overhead while retaining statistical coverage. |
| `ANA_HEALTH_TIMEOUT_SECONDS` | `5` | Integer 1–30 | Maximum time your application waits for a health check response before marking the API as unavailable. |
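Taken together, a high-traffic production profile might look like the following. These are the documented `ANA_*` variables; the specific values are one reasonable starting point, not a recommendation for every workload.

```shell
# Example production profile for a high-traffic integration.
export ANA_LOG_LEVEL=warn                 # keep production log volume down
export ANA_METRICS_INTERVAL_SECONDS=300   # coarser flushes, lower write volume
export ANA_TRACE_SAMPLE_RATE=0.1          # sample 10% of requests for tracing
export ANA_HEALTH_TIMEOUT_SECONDS=5       # the default; fail fast on health checks

echo "log level: $ANA_LOG_LEVEL, sample rate: $ANA_TRACE_SAMPLE_RATE"
```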
Why these defaults? They are chosen to be safe for low-to-medium-traffic integrations. As your request volume grows, lowering `ANA_TRACE_SAMPLE_RATE` toward `0.1` and raising `ANA_METRICS_INTERVAL_SECONDS` toward `300` will keep your observability costs proportional to value.
Health checks
Poll the health endpoint from your load balancer, uptime monitor, or Kubernetes liveness probe to verify the API is operational before routing user traffic:
curl -s https://api.athlete-nutrition-ai.com/v1/health \
-H "Authorization: Bearer $ANA_API_KEY"
The response body is a JSON object. A status of "ok" means all upstream dependencies (AI model service, database, calendar connector) are healthy. A status of "degraded" means some features may be slow or unavailable — check the components array for details.
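A monitoring check usually needs to branch on that status rather than just print it. The sketch below does so with jq; the `BODY` variable holds a sample response matching the documented shape, and in production you would populate it from the curl call above instead.

```shell
# Sample body in the documented response shape. In production:
#   BODY=$(curl -s https://api.athlete-nutrition-ai.com/v1/health \
#     -H "Authorization: Bearer $ANA_API_KEY")
BODY='{"status":"degraded","components":{"api":"ok","ai_model":"degraded","database":"ok","calendar_connector":"ok"}}'

STATUS=$(echo "$BODY" | jq -r '.status')
case "$STATUS" in
  ok)
    echo "healthy: route traffic normally" ;;
  degraded)
    # Name the unhealthy components so the alert is actionable.
    BAD=$(echo "$BODY" | jq -r '.components | to_entries[] | select(.value != "ok") | .key')
    echo "degraded components: $BAD" ;;
  *)
    echo "unavailable: back off and retry" ;;
esac
```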
Correlating logs across a workflow
For multi-step workflows (e.g., creating an athlete profile, connecting a calendar, then generating a meal plan), set a stable X-Correlation-ID for the entire session so every log line can be grouped:
CORRELATION_ID="session-$(date +%s)-$(uuidgen | tr -d '-')"
# Step 1: Create athlete profile
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "X-Correlation-ID: $CORRELATION_ID" \
-H "Content-Type: application/json" \
-d '{"name": "Jordan Smith", "sport": "triathlon"}'
# Step 2: Connect training calendar
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes/{athlete_id}/calendar \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "X-Correlation-ID: $CORRELATION_ID" \
-d '{"provider": "google_calendar", "token": "..." }'
# Step 3: Generate meal plan
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes/{athlete_id}/meal-plans \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "X-Correlation-ID: $CORRELATION_ID"
All three calls will share the same X-Correlation-ID in the API's response headers and your log stream, making it trivial to reconstruct the full request sequence.
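The `{athlete_id}` placeholder in steps 2 and 3 comes from the step-1 response. Assuming the create-profile response includes an `athlete_id` field (the meal plan response in Example 2 below does; verify against your actual responses), you can capture it with jq and chain the calls. `PROFILE` here stands in for the real step-1 response:

```shell
# In production: PROFILE=$(curl -s -X POST .../v1/athletes ... )
PROFILE='{"athlete_id":"ath_01HX9K2","name":"Jordan Smith","sport":"triathlon"}'

# Extract the ID once and reuse it for the calendar and meal-plan calls.
ATHLETE_ID=$(echo "$PROFILE" | jq -r '.athlete_id')
echo "Using athlete $ATHLETE_ID for calendar and meal-plan calls"
# e.g. curl -s -X POST "https://api.athlete-nutrition-ai.com/v1/athletes/$ATHLETE_ID/calendar" ...
```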
Reading response headers for observability signals
Every API response includes headers you should log on the client side:
| Header | What it tells you |
|---|---|
| `X-Request-ID` | The server-assigned (or echoed) request ID. Log this alongside your own application logs. |
| `X-Response-Time-Ms` | Server-side processing time in milliseconds. Useful for latency trending. |
| `X-RateLimit-Remaining` | Requests remaining in your current rate-limit window. Alert when this approaches zero. |
| `X-RateLimit-Reset` | Unix timestamp when your rate-limit window resets. |
Metrics to track
The following application-level metrics give you the most actionable signal for a nutrition-AI integration:
- Meal plan generation latency (`POST /v1/athletes/{id}/meal-plans` → `X-Response-Time-Ms`) — AI inference calls are the slowest step; set a p95 alert threshold.
- Shopping list generation success rate — track `2xx` vs `4xx`/`5xx` on `POST /v1/athletes/{id}/shopping-lists`.
- Chat advisor response time (`POST /v1/athletes/{id}/chat`) — users expect near-real-time responses; alert if p50 exceeds your SLA.
- Rate-limit headroom — alert when `X-RateLimit-Remaining` drops below 20% of your plan's limit.
Example 1 — Basic health check
Verify that all API components are operational.
curl -s https://api.athlete-nutrition-ai.com/v1/health \
-H "Authorization: Bearer $ANA_API_KEY" | jq .
Expected output (healthy):
{
"status": "ok",
"timestamp": "2024-11-15T08:32:01Z",
"components": {
"api": "ok",
"ai_model": "ok",
"database": "ok",
"calendar_connector": "ok"
}
}
Expected output (degraded — AI model slow):
{
"status": "degraded",
"timestamp": "2024-11-15T08:32:01Z",
"components": {
"api": "ok",
"ai_model": "degraded",
"database": "ok",
"calendar_connector": "ok"
},
"message": "AI model service is experiencing elevated latency. Meal plan and chat endpoints may be slow."
}
Example 2 — Injecting trace context into a meal plan request
Propagate your trace into the API so server-side spans nest under your root span.
curl -s -X POST https://api.athlete-nutrition-ai.com/v1/athletes/ath_01HX9K2/meal-plans \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "Content-Type: application/json" \
-H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
-H "X-Request-ID: req-d4e5f6a7-b8c9" \
-H "X-Correlation-ID: session-1731660000-abc123" \
-d '{"week_start": "2024-11-18"}' \
-i
Expected response headers (truncated):
HTTP/2 201
Content-Type: application/json
X-Request-ID: req-d4e5f6a7-b8c9
X-Correlation-ID: session-1731660000-abc123
X-Response-Time-Ms: 1847
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 743
X-RateLimit-Reset: 1731664800
Expected response body (truncated):
{
"meal_plan_id": "mp_07GY3P9",
"athlete_id": "ath_01HX9K2",
"week_start": "2024-11-18",
"days": [...],
"generated_at": "2024-11-15T08:40:22Z"
}
In your tracing backend, you will see a span named POST /v1/athletes/{id}/meal-plans with trace ID 4bf92f3577b34da6a3ce929d0e0e4736 nested under your root span.
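If your application does not yet have an OpenTelemetry SDK wired in, you can still mint a spec-valid `traceparent` by hand for testing: version `00`, 16 random bytes of trace ID, 8 random bytes of parent ID, and the sampled flag `01`. This sketch assumes `openssl` is available:

```shell
# Build a W3C Trace Context traceparent value from random hex.
TRACE_ID=$(openssl rand -hex 16)    # 32 lowercase hex chars
PARENT_ID=$(openssl rand -hex 8)    # 16 lowercase hex chars
TRACEPARENT="00-${TRACE_ID}-${PARENT_ID}-01"
echo "traceparent: $TRACEPARENT"
# Pass it on a request: curl ... -H "traceparent: $TRACEPARENT"
```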
Example 3 — Extracting latency from response headers for metrics
This shell snippet calls the shopping list endpoint and pushes the server-side latency to a StatsD-compatible sink. Adapt to your own metrics library.
RESPONSE=$(curl -s -D - -X POST \
https://api.athlete-nutrition-ai.com/v1/athletes/ath_01HX9K2/shopping-lists \
-H "Authorization: Bearer $ANA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"meal_plan_id": "mp_07GY3P9"}')
LATENCY=$(echo "$RESPONSE" | grep -i 'x-response-time-ms' | awk '{print $2}' | tr -d '\r')
echo "athlete_nutrition_ai.shopping_list.latency_ms:${LATENCY}|ms" | \
nc -u -w1 localhost 8125
echo "Shopping list generated in ${LATENCY}ms"
Expected terminal output:
Shopping list generated in 312ms
Example 4 — Rate-limit headroom alert script
Run this script as a cron job or monitoring check to alert when your rate-limit budget is low.
#!/usr/bin/env bash
set -euo pipefail
HEADERS=$(curl -s -D - -o /dev/null \
https://api.athlete-nutrition-ai.com/v1/health \
-H "Authorization: Bearer $ANA_API_KEY")
REMAINING=$(echo "$HEADERS" | grep -i 'x-ratelimit-remaining' | awk '{print $2}' | tr -d '\r')
LIMIT=$(echo "$HEADERS" | grep -i 'x-ratelimit-limit' | awk '{print $2}' | tr -d '\r')
PCT=$(( REMAINING * 100 / LIMIT ))
echo "Rate-limit headroom: ${REMAINING}/${LIMIT} (${PCT}%)"
if [ "$PCT" -lt 20 ]; then
echo "WARNING: Rate-limit headroom below 20%. Consider upgrading your subscription."
exit 1
fi
Expected output (healthy):
Rate-limit headroom: 820/1000 (82%)
Expected output (warning):
Rate-limit headroom: 150/1000 (15%)
WARNING: Rate-limit headroom below 20%. Consider upgrading your subscription.
Issue 1 — Health check returns 503 Service Unavailable
Symptom: GET /v1/health returns HTTP 503 and "status": "unavailable".
Likely cause: A critical upstream dependency (database or AI model service) is down, not just degraded.
Fix:
- Check the `components` object in the response body to identify which component is unavailable.
- Consult the athlete-nutrition-ai status page for active incidents.
- Implement exponential backoff with jitter in your integration before retrying user-facing operations.
- If the outage persists beyond 10 minutes, open a support ticket referencing the `X-Request-ID` from your health check response.
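One way to implement the backoff-with-jitter suggestion: exponential delay capped at 60 seconds, with full jitter so that many retrying clients do not synchronize into retry storms. This is a sketch, not the API's prescribed retry policy.

```shell
# Delay for retry attempt N: uniform random in 1..min(2^N, 60) seconds.
backoff_delay() {
  local attempt=$1
  local cap=60
  local exp=$(( 2 ** attempt ))       # 2, 4, 8, 16, ...
  [ "$exp" -gt "$cap" ] && exp=$cap
  echo $(( RANDOM % exp + 1 ))        # full jitter: 1..exp seconds
}

for attempt in 1 2 3; do
  DELAY=$(backoff_delay "$attempt")
  echo "attempt $attempt: would sleep ${DELAY}s before retrying /v1/health"
  # sleep "$DELAY"; curl -s https://api.athlete-nutrition-ai.com/v1/health ...
done
```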
Issue 2 — X-Request-ID in API responses does not match the value you sent
Symptom: You set X-Request-ID: my-custom-id in your request, but the response header contains a different UUID.
Likely cause: The value you provided exceeded 128 characters or contained characters outside the allowed set (printable ASCII, no whitespace). The API silently generated its own ID.
Fix:
- Ensure your `X-Request-ID` values are ≤ 128 printable ASCII characters with no spaces.
- Log the response `X-Request-ID` header immediately on receipt — do not rely solely on the value you sent.
Issue 3 — Trace spans do not appear nested under your root span
Symptom: Your tracing backend shows API calls as disconnected root spans rather than children of your application trace.
Likely cause: The traceparent header was not included in the request, was malformed, or was stripped by an intermediate proxy or API gateway.
Fix:
- Confirm the `traceparent` header format follows the W3C Trace Context spec: `00-{32-hex-trace-id}-{16-hex-parent-id}-{2-hex-flags}`.
- Check that your API gateway or reverse proxy is not stripping custom headers. Add `traceparent` and `tracestate` to its passthrough allowlist.
- Send a test request with `curl -v` and inspect the outbound headers to verify the header is present.
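A quick local format check can rule out a malformed value before you start blaming the gateway. This helper validates the `version-trace_id-parent_id-flags` shape (all lowercase hex, per the W3C spec):

```shell
# Returns success if the argument matches the W3C traceparent format.
valid_traceparent() {
  echo "$1" | grep -Eq '^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$'
}

if valid_traceparent "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"; then
  echo "well-formed"
fi
valid_traceparent "00-SHORT-01" || echo "malformed"
```

Note this only checks the shape; the spec additionally forbids all-zero trace and parent IDs, which a stricter check would also reject.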
Issue 4 — X-Response-Time-Ms shows unexpectedly high values for meal plan generation
Symptom: POST /v1/athletes/{id}/meal-plans consistently returns X-Response-Time-Ms values above 5000ms.
Likely cause: The AI inference step is computationally intensive and scales with the complexity of the athlete's training calendar (number of upcoming events, dietary constraints). Very large calendars or many dietary rules increase processing time. The API may also be under load.
Fix:
- Check `GET /v1/health` — if `ai_model` is `"degraded"`, wait for the status to recover.
- Reduce the planning horizon in your request payload (e.g., request a 7-day plan instead of a 28-day plan) to lower inference complexity.
- Move meal plan generation to an asynchronous background job on your side so users are not blocked by a synchronous HTTP timeout.
- If high latency is consistent and the API status is healthy, contact support with the `X-Request-ID` and `X-Correlation-ID` values from the slow requests.
Issue 5 — Rate-limit headers are missing from responses
Symptom: API responses do not include X-RateLimit-Remaining or X-RateLimit-Reset headers.
Likely cause: Your account is on the free or trial tier, which does not surface rate-limit headers, or you are hitting a cached CDN response that strips custom headers.
Fix:
- Log in to the developer dashboard and confirm your subscription tier. Rate-limit headers are available on Standard and above.
- If you are on an eligible tier, add `Cache-Control: no-store` to your requests to bypass any CDN layer.
- Use `curl -v` to inspect raw response headers and confirm whether the headers are present at the network level before assuming they are missing.
Issue 6 — Logs show 401 Unauthorized on the health endpoint
Symptom: GET /v1/health returns 401 when called from your monitoring system.
Likely cause: Your monitoring system is not sending the Authorization: Bearer header, or the API key has been rotated or revoked.
Fix:
- Verify the API key stored in your monitoring system's secret store matches the active key in your developer dashboard.
- Ensure the `Authorization: Bearer <token>` header is being set — some monitoring tools require explicit header configuration separate from general HTTP settings.
- If the key was rotated, update it in all downstream systems (monitoring, CI/CD, production environment variables) simultaneously to avoid gaps.