
AI-Driven Monitoring Pipeline#

What Was Established#

The monitoring pipeline is fully operational and running hourly. It collects rich structured data from four sources (Prometheus — 7 metrics, Uptime Kuma, UniFi, Synology), runs 4 parallel Ollama summarization calls, synthesises a final status report, and writes everything to Postgres. Hourly snapshots of raw UniFi and Prometheus data are stored in dedicated tables for delta computation. End-to-end runtime is ~13 minutes using gemma4:e4b CPU-only on the Pavilion — accepted as-is pending Mac Studio.

Key Decisions#

  • Architecture: Schedule Trigger → data collection (Prometheus: 7 parallel HTTP requests + merge; Uptime Kuma; UniFi; Synology: 4 HTTP requests + merge) → Code extractors → Synology 3-way merge → Postgres snapshot reads (unifi_snapshots + prometheus_snapshots) → Merge (Combine, by Position) with UniFi/Prometheus Code nodes → 4 prompt builders (with diff logic for UniFi) → 4 Ollama calls → Split Code nodes → Merge → Final Prompt Builder → Final Ollama call → Reshape → Postgres (homelab_analysis + unifi_snapshots write + prometheus_snapshots write)
  • Model: gemma4:e4b for all 5 Ollama calls (4 section + 1 synthesis). Originally gemma3:27b; swapped to E4B for speed. The ~13 min runtime was unchanged because the bottleneck is processing large prompts, not model size.
  • Split Code node architecture: Each section Ollama call returns two structured sections. A Split Code node after each Ollama node divides the response: ### Summary goes directly to Postgres as summary_*; ### Additional Information and Interesting Statistics for Next Ingestion (context bullets) is passed to the Final Prompt Builder. The final Ollama receives only the 4 context sections — not the summaries. This offloads summarization from the final call and keeps section summaries clean for the dashboard.
  • Single quotes in AI output: Must be escaped before INSERT into Postgres. Use a replace(/'/g, "''") pass in the reshape Code node.
  • Dashboard: Node.js + Express deployed at status.nbkelley.com, running on proxy VM (192.168.1.222) port 3002. See Homelab Dashboard.

n8n Workflow Structure#

1. Prometheus (7 parallel HTTP requests + tagging nodes + Merge)#

Each metric is a separate HTTP Request node, each followed by a tagging Code node (return [{ json: { metric: '<name>', data: $input.first().json.data.result } }]), all wired into a Merge (Append) node, then a single combining Code node.

| Tag | Prometheus Query |
| --- | --- |
| up | up |
| cpu | 100-(avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))*100) |
| memory | 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes) |
| disk | disk usage query (% used) |
| load | load average |
| net_rx | network receive bytes/s |
| net_tx | network transmit bytes/s |

Base URL for all: http://192.168.1.167:9090/api/v1/query?query=<encoded_query>
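Each node's URL is just the base URL plus the URL-encoded query. A minimal sketch of that assembly (the helper name is illustrative; the base URL and cpu query are the ones above):

```javascript
// Sketch: assembling a Prometheus instant-query URL for an HTTP Request node.
// The query string must be URL-encoded (braces, quotes, brackets).
const PROM_BASE = 'http://192.168.1.167:9090/api/v1/query';

function promQueryUrl(query) {
  return `${PROM_BASE}?query=${encodeURIComponent(query)}`;
}

// Example: the cpu query from the table above
const cpuUrl = promQueryUrl(
  '100-(avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))*100)'
);
```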

Combining Code node identifies each result set by the metric tag field:

const items = $input.all();
const get = (metric) => items.find(i => i.json.metric === metric)?.json.data || [];
const upData = get('up'); const cpuData = get('cpu'); // ... etc
// Build hosts array joining all metrics by instance

Output: {source: 'prometheus', hosts: [{job, host, vlan, instance, status, cpu_percent, memory_used_pct, disk_used_pct, load1, net_rx_bps, net_tx_bps}]}

Note: netrx vs net_rx tag mismatch caused a bug — ensure all tagging nodes use exact names matching the get() calls.
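The elided "build hosts array" step can be sketched as below. It assumes the standard Prometheus query API result shape (result[i].metric.instance, result[i].metric.job, result[i].value[1]); the vlan/host derivation from the instance label is omitted here:

```javascript
// Sketch of the combining Code node's join-by-instance step.
// `get` is the lookup defined above over the tagged merge items.
function buildHosts(get) {
  const byInstance = {};
  // Seed one host entry per 'up' result; value '1' means the target is up.
  for (const r of get('up')) {
    byInstance[r.metric.instance] = {
      job: r.metric.job,
      instance: r.metric.instance,
      status: r.value[1] === '1' ? 'UP' : 'DOWN',
    };
  }
  // Attach each numeric metric to its host by instance label.
  const attach = (tag, field) => {
    for (const r of get(tag)) {
      const h = byInstance[r.metric.instance];
      if (h) h[field] = parseFloat(r.value[1]);
    }
  };
  attach('cpu', 'cpu_percent');
  attach('memory', 'memory_used_pct');
  attach('disk', 'disk_used_pct');
  attach('load', 'load1');
  attach('net_rx', 'net_rx_bps');
  attach('net_tx', 'net_tx_bps');
  return Object.values(byInstance);
}
```

Note how a tag-name mismatch (the netrx bug above) silently yields empty fields here, since `get()` returns [] for unknown tags.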

2. Uptime Kuma (metrics endpoint)#

  • URL: http://192.168.1.58:3001/metrics
  • Auth: Basic Auth — username: empty, password: API key (created in Uptime Kuma → Settings → API Keys)
  • Returns Prometheus-format text with named monitors
  • Response is at $input.first().json.data as a raw text string

Parsing Code node:

const text = $input.first().json.data;
const lines = text.split('\n');
const statusLines = lines.filter(l => l.startsWith('monitor_status{'));
const uptimeLines = lines.filter(l => l.startsWith('monitor_uptime{'));
// Parse name, status, uptime_24h from prometheus text format

Output: {source: 'uptime_kuma', monitors: [{id, name, current_status, uptime_24h_percent, last_ping_ms}]}
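The elided parse step can be sketched as follows. The line shape (`monitor_status{monitor_name="…",…} 1`) and the `monitor_name` label are assumed from Uptime Kuma's standard exporter output; if multiple uptime windows are exported, an additional filter on the window label would be needed (not shown):

```javascript
// Sketch: parsing Uptime Kuma's Prometheus-format /metrics text into monitors.
function parseMonitors(text) {
  const monitors = {};
  for (const line of text.split('\n')) {
    const m = line.match(/^(monitor_status|monitor_uptime)\{([^}]*)\}\s+([\d.]+)/);
    if (!m) continue; // skip comments and unrelated metrics
    const [, metric, labels, value] = m;
    const name = (labels.match(/monitor_name="([^"]*)"/) || [])[1];
    if (!name) continue;
    monitors[name] = monitors[name] || { name };
    if (metric === 'monitor_status') monitors[name].current_status = Number(value);
    else monitors[name].uptime_24h_percent = Number(value);
  }
  return Object.values(monitors);
}
```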

17 monitors confirmed: Hinterflix, help.hinterflix.com, nbkelley.com, Ollama (192.168.2.192:11434), Open WebUI (192.168.2.192:3000), Prometheus (192.168.1.167:9090), Cloudflared LXC (192.168.1.95), and others.

Previous heartbeat API (/api/status-page/heartbeat/homelab) abandoned — returned IDs only, no monitor names.

3. UniFi (expanded extraction)#

  • URL: https://192.168.1.1/proxy/network/api/s/default/stat/device
  • Auth: X-API-KEY header
  • Other endpoints (/stat/sta, /stat/site) require session cookie auth — not worth adding

Expanded Code extractor pulls:

const data = $input.first().json.data;
const gateway = data.find(d => d.type === 'udm');
const ap = data.find(d => d.type === 'uap');
const switches = data.filter(d => d.type === 'usw');

const gw = {
  name: gateway.name,  // Olorín
  temperatures: gateway.temperatures,  // CPU temp (was ~70.7°C)
  uptime_stats: { WAN: { availability, latency_average } },
  network_table: [ /* per-VLAN: client counts, rx/tx bytes, DHCP leases */ ],
  sys_stats: { loadavg, mem_total, mem_used, mem_buffer },
  wan1: { 'rx_bytes-r', 'tx_bytes-r' },  // WAN throughput
  num_sta: total_clients
};
const apData = { name, num_sta, tx_retries, channel, sys_stats, uptime };
const switchData = switches.map(s => ({ name, port_table: [/* per-port TX/RX */], sys_stats, uptime }));

Output: {source: 'unifi', gateway: {...}, ap: {...}, switches: [...]}

Persistent finding: Gateway CPU temperature consistently ~70.7°C in reports — flagged each run.
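Deriving the flat gateway health fields (gateway_temp_c, client count, memory %) from the raw extractor output can be sketched as below. All field paths here are assumptions about the /stat/device payload, not confirmed names:

```javascript
// Sketch: flattening raw gateway fields into the health values the pipeline
// reports. Assumes temperatures is an array like [{ name: 'CPU', value: 70.7 }]
// and sys_stats carries mem_total / mem_used (assumed field names).
function gatewayHealth(gateway) {
  const cpuTemp = (gateway.temperatures || []).find((t) => /cpu/i.test(t.name || ''));
  const sys = gateway.sys_stats || {};
  return {
    gateway_temp_c: cpuTemp ? cpuTemp.value : null,
    gateway_mem_pct: sys.mem_total ? (sys.mem_used / sys.mem_total) * 100 : null,
    client_count: gateway.num_sta ?? 0,
  };
}
```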

4. Synology (multi-step, unchanged from previous)#

  • Auth: http://192.168.1.137:5000/webapi/entry.cgi?api=SYNO.API.Auth&version=6&method=login&account=monitoring&passwd=PASSWORD&session=monitoring&format=sid
  • 4 data endpoints using {{$json.data.sid}}: Storage (load_info), HddMan (get), Utilization (get), UPS (get)
  • Synology Code nodes merge into combined object before the Synology prompt builder

Combining Node (all 6 sources → 1 object)#

const items = $input.all();
const combined = {};
items.forEach(item => Object.assign(combined, item.json));
return [{ json: combined }];

Prompt Builders (4 nodes, one per source)#

Each prompt builder connects directly to its own source Code node (not the combining node).

Section field fix: Each prompt builder must include section in its body output so the final merge can identify which response came from which source:

const body = { model: 'gemma4:e4b', prompt, stream: false };
return [{ json: { body, section: 'prometheus' } }];  // section field required

Prometheus prompt — 2-3 sentences, note DOWN hosts and high resource usage:

const d = $input.first().json;
const down = d.hosts.filter(h => h.status === 'DOWN');
const prompt = `Summarize the following homelab host metrics in 2-3 sentences...
${d.hosts.map(h => `- ${h.host} (${h.vlan}): ${h.status}, CPU ${h.cpu_percent}%, Mem ${h.memory_used_pct}%, Disk ${h.disk_used_pct}%`).join('\n')}`;

Uptime Kuma prompt — note monitors below 90% uptime, use monitor names.

UniFi prompt — note gateway temp, WAN issues, TX retry rate, switch uptimes. Includes delta computation for WAN cumulative counters:

const d = $input.first().json;  // Combine merge puts all fields in one object
const gw = d.gateway;
const ap = d.ap;
const switches = d.switches;
const prev = d.wan_drops !== undefined ? {
  wan_drops: parseInt(d.wan_drops),
  wan_downtime_seconds: parseInt(d.wan_downtime_seconds),
  wan_rx_bytes: parseInt(d.wan_rx_bytes),
  wan_tx_bytes: parseInt(d.wan_tx_bytes)
} : null;
const dropsDelta = prev ? (gw.wan.drops - prev.wan_drops) : 'N/A (first run)';
const downtimeDelta = prev ? (gw.wan.downtime_seconds - prev.wan_downtime_seconds) : 'N/A';
const rxDeltaMB = prev ? ((gw.wan.rx_bytes_total - prev.wan_rx_bytes)/1024/1024).toFixed(2) : 'N/A';
const txDeltaMB = prev ? ((gw.wan.tx_bytes_total - prev.wan_tx_bytes)/1024/1024).toFixed(2) : 'N/A';

Rationale: UniFi API reports WAN drop counts and byte totals as since-reboot cumulative values. Delta-per-hour is the meaningful signal.
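The delta logic above can be isolated as a helper. One caveat, noted here as a possible extension and not handled in the current workflow: a gateway reboot resets the cumulative counters, which would produce a negative delta.

```javascript
// Helper form of the delta logic above. A negative delta signals a counter
// reset (reboot); in that case the current value is the best per-interval
// estimate. Reset handling is a suggested extension, not current behavior.
function counterDelta(current, previous) {
  if (previous === null || previous === undefined) return null; // first run
  const delta = current - previous;
  return delta >= 0 ? delta : current;
}

const rxDeltaMB = (cur, prev) => {
  const d = counterDelta(cur, prev);
  return d === null ? 'N/A (first run)' : (d / 1024 / 1024).toFixed(2);
};
```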

Synology prompt — disks, volume %, CPU/mem, UPS status.

Optimized Prompt Structure (all 4 section prompts)#

All 4 section prompt builders use a consistent two-section structure:

const prompt = `You are a homelab monitoring assistant analyzing [source] data. Respond in exactly two sections:

### Summary
2-3 sentences on current status. Note any anomalies, threshold breaches, or items requiring attention.

### Additional Information and Interesting Statistics for Next Ingestion
Key metrics as concise bullet points. Include specific values (temperatures, percentages, byte counts).
[source-specific data here]`;

const body = { model: 'gemma4:e4b', prompt, stream: false };
return [{ json: { body, section: 'prometheus' } }];  // section name varies

Split Code Nodes (one after each section Ollama call)#

Each Split Code node divides the Ollama response on the delimiter:

const response = $input.first().json.response;
const delimiter = '### Additional Information and Interesting Statistics for Next Ingestion';
const parts = response.split(delimiter);
return [{ json: {
  section: 'prometheus',  // hard-coded per node
  summary: parts[0].replace('### Summary', '').trim(),
  context: parts[1] ? parts[1].trim() : ''
}}];
  • summary → stored directly as summary_prometheus etc. in Postgres
  • context → passed to Final Prompt Builder for the synthesis call

Ollama HTTP Request (all 5 identical)#

  • Method: POST
  • URL: http://192.168.2.192:11434/api/generate
  • Body: model, prompt, stream=false (from $json.body.*)
  • Response field: response
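Outside n8n, each of these nodes is roughly equivalent to the following fetch call (a sketch; /api/generate with stream:false is the standard Ollama endpoint, returning one JSON object whose response field holds the generated text):

```javascript
// Sketch of the request each Ollama HTTP Request node sends.
const OLLAMA_URL = 'http://192.168.2.192:11434/api/generate';

function buildOllamaRequest(prompt, model = 'gemma4:e4b') {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

// In n8n the body fields come from $json.body.*; standalone usage:
// const res = await fetch(OLLAMA_URL, buildOllamaRequest('Summarize ...'));
// const { response } = await res.json();
```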

Response Collection (Final Prompt Builder)#

Collects 4 Split Code node outputs (not the raw Ollama outputs). Stores summaries and builds context prompt:

const items = $input.all();
const summaries = items.map(item => ({ section: item.json.section, summary: item.json.summary }));
const contextBlocks = items.map(item => `## ${item.json.section}\n${item.json.context}`).join('\n\n');

const prompt = `You are a homelab monitoring assistant. Based on the following detailed metrics from each service, write a concise 2-3 sentence overall infrastructure status report. Focus on the most important issues and anomalies.\n\n${contextBlocks}`;

return [{ json: { summaries, prompt, body: { model: 'gemma4:e4b', prompt, stream: false } } }];

Key: Final Ollama receives only context bullets, not the summaries. The summaries are preserved in summaries[] and picked up by the Reshape node.

Final Ollama call#

  • Same endpoint, model gemma4:e4b
  • Prompt: synthesise the 4 context blocks into an overall status paragraph
  • Response: overall_summary

Reshape Code node (before Postgres)#

const ollama = $input.first().json;
const prev = $('Final Prompt Builder').first().json;
const esc = s => s ? s.replace(/'/g, "''") : '';  // escape single quotes for SQL
return [{ json: {
  overall_summary: esc(ollama.response),
  summary_prometheus: esc(prev.summaries.find(s => s.section === 'prometheus')?.summary),
  summary_uptime_kuma: esc(prev.summaries.find(s => s.section === 'uptime_kuma')?.summary),
  summary_unifi: esc(prev.summaries.find(s => s.section === 'unifi')?.summary),
  summary_synology: esc(prev.summaries.find(s => s.section === 'synology')?.summary),
  raw_metrics: JSON.stringify(prev.summaries)
}}];

Postgres Write (homelab_analysis)#

  • Node type: Postgres (Execute Query)
  • Credential: configured under n8n Overview → Credentials
  • INSERT into homelab_analysis using $json.* fields from reshape node

Postgres Snapshot Writes#

After the reshape node, two additional Postgres Execute Query nodes write raw data for delta computation on the next run:

unifi_snapshots INSERT — stores gateway WAN counters and environment data:

INSERT INTO unifi_snapshots (
  wan_drops, wan_downtime_seconds, wan_rx_bytes, wan_tx_bytes,
  wan_latency_ms, wan_availability_pct,
  gateway_temp_c, gateway_cpu_pct, gateway_mem_pct, client_count,
  vlan_data, ap_radio_data, switch_data
) VALUES (
  {{ $node["UniFi Code Node"].json.gateway.wan.drops }},
  {{ $node["UniFi Code Node"].json.gateway.wan.downtime_seconds }},
  {{ $node["UniFi Code Node"].json.gateway.wan.rx_bytes_total }},
  {{ $node["UniFi Code Node"].json.gateway.wan.tx_bytes_total }},
  {{ $node["UniFi Code Node"].json.gateway.wan.latency_ms }},
  {{ $node["UniFi Code Node"].json.gateway.wan.availability_pct }},
  {{ $node["UniFi Code Node"].json.gateway.temp_c }},
  {{ $node["UniFi Code Node"].json.gateway.cpu_pct }},
  {{ $node["UniFi Code Node"].json.gateway.mem_pct }},
  {{ $node["UniFi Code Node"].json.gateway.num_sta }},
  '{{ JSON.stringify($node["UniFi Code Node"].json.gateway.network_table) }}',
  '{{ JSON.stringify($node["UniFi Code Node"].json.ap) }}',
  '{{ JSON.stringify($node["UniFi Code Node"].json.switches) }}'
)

prometheus_snapshots INSERT — stores full host metrics for potential delta use:

INSERT INTO prometheus_snapshots (hosts)
VALUES ('{{ JSON.stringify($node["Prometheus Code Node"].json.hosts) }}')

Postgres Snapshot Reads (before prompt builders)#

Before the prompt builders, two Postgres Execute Query nodes read the previous run’s snapshot:

  • Previous UniFi snapshot: SELECT * FROM unifi_snapshots ORDER BY recorded_at DESC LIMIT 1
  • Previous Prometheus snapshot: SELECT * FROM prometheus_snapshots ORDER BY recorded_at DESC LIMIT 1

Merge (Combine, by Position) — UniFi#

A Merge node (mode: Combine, by Position) sits between the UniFi Code node and the UniFi prompt builder. It combines:

  • Input 1: UniFi Code node output (gateway, ap, switches)
  • Input 2: Previous UniFi Postgres snapshot row (individual columns from unifi_snapshots)

With Combine/Position, the two objects merge into one — all fields from both inputs land in $input.first().json. The prompt builder detects whether prev data exists by checking d.wan_drops !== undefined.
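Conceptually, Combine by Position pairs item i of input 1 with item i of input 2 and shallow-merges their json objects (a sketch of the semantics, not n8n internals):

```javascript
// Sketch of Merge (Combine, by Position) semantics. With one item per input,
// the snapshot row's columns land alongside the live UniFi fields; if the
// snapshot query returned no row, only the live fields survive.
function combineByPosition(input1, input2) {
  const n = Math.max(input1.length, input2.length);
  const out = [];
  for (let i = 0; i < n; i++) {
    out.push({ json: { ...(input1[i]?.json || {}), ...(input2[i]?.json || {}) } });
  }
  return out;
}
```

This is why the d.wan_drops !== undefined check works: on the first run there is no snapshot row, so the column simply never appears in the merged object.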

The same Merge (Combine) pattern is used for Prometheus Code node + previous prometheus_snapshots row, though no diff logic is currently implemented for Prometheus (it uses rates not cumulative counters).

Important: Using a direct connection from both the Code node and Postgres node into the prompt builder does not work — n8n only passes the first connected input. The Merge (Combine) node is required.

Timing & Performance#

  • End-to-end: ~13 minutes per hourly run
  • Model gemma4:e4b on Pavilion (CPU-only, i7, 12 cores)
  • Bottleneck: prompt eval — 1600+ token prompts take ~111s each to process
  • Ollama uses 2 threads by default; setting OLLAMA_NUM_THREADS=12 via override.conf had no effect; Modelfile num_thread 12 produced 1.77 t/s (worse); trust Ollama defaults
  • GPU disabled on Pavilion (MX550 4GB VRAM can’t fit E4B at ~5GB) — see Pavilion (AI PC) Configuration
  • Accepted as-is pending Mac Studio M1 Max 64GB arrival (purchased 2026-04-17)

Synology Hardware Reference (LonelyMountain DS923+)#

  • 3× Seagate ST4000VN006-3CW104 (4TB NAS drives), all healthy, volume ~29% used
  • UPS: CPS LX1500GU3, 100% charge, ~145 min runtime
  • Clients at startup (ACL for UPS): 192.168.1.69 (proxmox)

Known Issues / Open Questions#

  1. Synology SID expires — production pipeline needs periodic re-auth or persistent SID refresh
  2. Gateway (UCG Express, Olorín) CPU temperature consistently ~71°C — persistent signal, confirmed across multiple runs
  3. Pipeline runtime ~13 min for hourly job — will improve with Mac Studio arrival
  4. Static files in dashboard Docker image are baked at build time — CSS/HTML changes require rebuild (could add volume mount to fix)
  5. prometheus_snapshots: data is stored but no diff/delta logic is currently implemented — Prometheus uses rate() queries so raw values are less useful than for UniFi

Node Exporter Deployment, n8n, PostgreSQL, Pavilion (AI PC) Configuration, AI Infrastructure Overview, Homelab Dashboard

Sources#

  • ingested/chats/184-Local test proxy for MBTA tracker.md
  • ingested/chats/2026-04-29-servarr-diagnosis.md
  • ingested/chats/190-Proxmox OS and Storage Separation Guide.md
  • ingested/chats/185-Cloudflare Wrangler JSONC Configuration Guide.md
  • Homelab AI - 2026-04-14 · ingested/chats/Homelab AI - 2026-04-14
  • Homelab AI - 2026-04-15 · ingested/chats/Homelab-AI---2026-04-15.md
  • Homelab AI - 2026-04-16 · ingested/chats/Homelab-AI---2026-04-16.md
  • Homelab AI - 2026-04-17 · ingested/chats/2026-04-17-31-Homelab AI.json
  • Homelab AI - 2026-04-18 · ingested/chats/Homelab-AI---2026-04-18.md
  • Homelab AI - 2026-04-19 · ingested/chats/Homelab-AI---2026-04-19.md