AI-Driven Monitoring Pipeline#
What Was Established#
The monitoring pipeline is fully operational and running hourly. It collects rich structured data from four sources (Prometheus — 7 metrics, Uptime Kuma, UniFi, Synology), runs 4 parallel Ollama summarization calls, synthesises a final status report, and writes everything to Postgres. Hourly snapshots of raw UniFi and Prometheus data are stored in dedicated tables for delta computation. End-to-end runtime is ~13 minutes using `gemma4:e4b` CPU-only on the Pavilion — accepted as-is pending the Mac Studio.
Key Decisions#
- Architecture: Schedule Trigger → data collection (Prometheus: 7 parallel HTTP requests + merge; Uptime Kuma; UniFi; Synology: 4 HTTP requests + merge) → Code extractors → Synology 3-way merge → Postgres snapshot reads (unifi_snapshots + prometheus_snapshots) → Merge (Combine, by Position) with UniFi/Prometheus Code nodes → 4 prompt builders (with diff logic for UniFi) → 4 Ollama calls → Split Code nodes → Merge → Final Prompt Builder → Final Ollama call → Reshape → Postgres (homelab_analysis + unifi_snapshots write + prometheus_snapshots write)
- Model: `gemma4:e4b` for all 5 Ollama calls (4 section + 1 synthesis). Was `gemma3:27b`, swapped to E4B for speed. The 13 min runtime is unchanged because the bottleneck is large prompt tokens, not model size.
- Split Code node architecture: Each section Ollama call returns two structured sections. A Split Code node after each Ollama node divides the response: `### Summary` goes directly to Postgres as `summary_*`; `### Additional Information and Interesting Statistics for Next Ingestion` (context bullets) is passed to the Final Prompt Builder. The final Ollama call receives only the 4 context sections — not the summaries. This offloads summarization from the final call and keeps section summaries clean for the dashboard.
- Single quotes in AI output: Must be escaped before INSERT into Postgres. Use a `replace(/'/g, "''")` pass in the reshape Code node.
- Dashboard: Node.js + Express deployed at `status.nbkelley.com`, running on the proxy VM (192.168.1.222), port 3002. See Homelab Dashboard.
n8n Workflow Structure#
1. Prometheus (7 parallel HTTP requests + tagging nodes + Merge)#
Each metric is a separate HTTP Request node, each followed by a tagging Code node (`return [{ json: { metric: '<name>', data: $input.first().json.data.result } }]`), all wired into a Merge (Append) node, then a single combining Code node.
| Tag | Prometheus Query |
|---|---|
| `up` | `up` |
| `cpu` | `100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)` |
| `memory` | `100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))` |
| `disk` | disk usage query (% used) |
| `load` | load average |
| `net_rx` | network receive bytes/s |
| `net_tx` | network transmit bytes/s |
Base URL for all: `http://192.168.1.167:9090/api/v1/query?query=<encoded_query>`
Combining Code node identifies each result set by the metric tag field:
```javascript
const items = $input.all();
const get = (metric) => items.find(i => i.json.metric === metric)?.json.data || [];
const upData = get('up'); const cpuData = get('cpu'); // ... etc
// Build hosts array joining all metrics by instance
```
Output: `{source: 'prometheus', hosts: [{job, host, vlan, instance, status, cpu_percent, memory_used_pct, disk_used_pct, load1, net_rx_bps, net_tx_bps}]}`
Note: a `netrx` vs `net_rx` tag mismatch caused a bug — ensure all tagging nodes use exact names matching the `get()` calls.
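The join-by-instance step that the snippet above only comments out can be sketched concretely. This is a hedged sketch, not the production node: it assumes each tagged item holds Prometheus instant-query results of the form `{ metric: { instance, job }, value: [timestamp, "number"] }` (the standard Prometheus HTTP API shape); everything beyond the tag names in the table is illustrative.

```javascript
// Sketch of the combining node's join-by-instance logic (assumptions noted above).
function combineByInstance(items) {
  const get = (metric) =>
    items.find((i) => i.json.metric === metric)?.json.data || [];
  const tags = ['up', 'cpu', 'memory', 'disk', 'load', 'net_rx', 'net_tx'];
  const hosts = {};
  for (const tag of tags) {
    for (const r of get(tag)) {
      const inst = r.metric.instance;
      hosts[inst] ??= { instance: inst, job: r.metric.job };
      hosts[inst][tag] = parseFloat(r.value[1]); // Prometheus returns values as strings
    }
  }
  return { source: 'prometheus', hosts: Object.values(hosts) };
}

// Example with two tagged inputs for one host:
const sample = [
  { json: { metric: 'up',  data: [{ metric: { instance: 'host-a:9100', job: 'node' }, value: [0, '1'] }] } },
  { json: { metric: 'cpu', data: [{ metric: { instance: 'host-a:9100', job: 'node' }, value: [0, '12.5'] }] } },
];
combineByInstance(sample);
// → { source: 'prometheus', hosts: [{ instance: 'host-a:9100', job: 'node', up: 1, cpu: 12.5 }] }
```

A tag typo (like the `netrx` bug above) simply yields an empty array from `get()`, silently dropping that metric, which is why exact tag names matter.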
2. Uptime Kuma (metrics endpoint)#
- URL: `http://192.168.1.58:3001/metrics`
- Auth: Basic Auth — username: empty, password: API key (created in Uptime Kuma → Settings → API Keys)
- Returns Prometheus-format text with named monitors
- Response is at `$input.first().json.data` as a raw text string
Parsing Code node:
```javascript
const text = $input.first().json.data;
const lines = text.split('\n');
const statusLines = lines.filter(l => l.startsWith('monitor_status{'));
const uptimeLines = lines.filter(l => l.startsWith('monitor_uptime{'));
// Parse name, status, uptime_24h from prometheus text format
```
Output: `{source: 'uptime_kuma', monitors: [{id, name, current_status, uptime_24h_percent, last_ping_ms}]}`
17 monitors confirmed: Hinterflix, help.hinterflix.com, nbkelley.com, Ollama (192.168.2.192:11434), Open WebUI (192.168.2.192:3000), Prometheus (192.168.1.167:9090), Cloudflared LXC (192.168.1.95), and others.
Previous heartbeat API (`/api/status-page/heartbeat/homelab`) abandoned — returned IDs only, no monitor names.
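The commented-out parsing step can be made concrete. A hedged sketch of extracting the metric name, labels, and value from a single exposition line; the `monitor_name` label is an assumption based on Uptime Kuma's `/metrics` output:

```javascript
// Parse one Prometheus text-format line, e.g.
//   monitor_status{monitor_name="Hinterflix",monitor_type="http"} 1
// Returns { metric, labels, value } or null for non-metric lines.
function parseMetricLine(line) {
  const m = line.match(/^(\w+)\{([^}]*)\}\s+([\d.]+)$/);
  if (!m) return null; // comments, HELP/TYPE lines, blank lines
  const labels = {};
  for (const pair of m[2].split(',')) {
    const [, key, val] = pair.match(/(\w+)="([^"]*)"/) || [];
    if (key) labels[key] = val;
  }
  return { metric: m[1], labels, value: parseFloat(m[3]) };
}

parseMetricLine('monitor_status{monitor_name="Hinterflix",monitor_type="http"} 1');
// → { metric: 'monitor_status', labels: { monitor_name: 'Hinterflix', monitor_type: 'http' }, value: 1 }
```

Running this over the `statusLines` and `uptimeLines` arrays and joining on `monitor_name` yields the `monitors` output shape above.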
3. UniFi (expanded extraction)#
- URL: `https://192.168.1.1/proxy/network/api/s/default/stat/device`
- Auth: `X-API-KEY` header
- Other endpoints (`/stat/sta`, `/stat/site`) require session cookie auth — not worth adding
Expanded Code extractor pulls:
```javascript
const data = $input.first().json.data;
const gateway = data.find(d => d.type === 'udm');
const ap = data.find(d => d.type === 'uap');
const switches = data.filter(d => d.type === 'usw');
const gw = {
  name: gateway.name, // Olorín
  temperatures: gateway.temperatures, // CPU temp (was ~70.7°C)
  uptime_stats: { WAN: { availability, latency_average } },
  network_table: [ /* per-VLAN: client counts, rx/tx bytes, DHCP leases */ ],
  sys_stats: { loadavg, mem_total, mem_used, mem_buffer },
  wan1: { 'rx_bytes-r', 'tx_bytes-r' }, // WAN throughput
  num_sta: total_clients
};
const apData = { name, num_sta, tx_retries, channel, sys_stats, uptime };
const switchData = switches.map(s => ({ name, port_table: [/* per-port TX/RX */], sys_stats, uptime }));
```
Output: `{source: 'unifi', gateway: {...}, ap: {...}, switches: [...]}`
Persistent finding: Gateway CPU temperature consistently ~70.7°C in reports — flagged each run.
4. Synology (multi-step, unchanged from previous)#
- Auth: `http://192.168.1.137:5000/webapi/entry.cgi?api=SYNO.API.Auth&version=6&method=login&account=monitoring&passwd=PASSWORD&session=monitoring&format=sid`
- 4 data endpoints using `{{$json.data.sid}}`: Storage (load_info), HddMan (get), Utilization (get), UPS (get)
- Synology Code nodes merge into a combined object before the Synology prompt builder
Combining Node (all 6 sources → 1 object)#
```javascript
const items = $input.all();
const combined = {};
items.forEach(item => Object.assign(combined, item.json));
return [{ json: combined }];
```
Prompt Builders (4 nodes, one per source)#
Each prompt builder connects directly to its own source Code node (not the combining node).
Section field fix: each prompt builder must include `section` in its body output so the final merge can identify which response came from which source:
```javascript
const body = { model: 'gemma4:e4b', prompt, stream: false };
return [{ json: { body, section: 'prometheus' } }]; // section field required
```
Prometheus prompt — 2-3 sentences, note DOWN hosts and high resource usage:
```javascript
const d = $input.first().json;
const down = d.hosts.filter(h => h.status === 'DOWN');
const prompt = `Summarize the following homelab host metrics in 2-3 sentences...
${d.hosts.map(h => `- ${h.host} (${h.vlan}): ${h.status}, CPU ${h.cpu_percent}%, Mem ${h.memory_used_pct}%, Disk ${h.disk_used_pct}%`).join('\n')}`;
```
Uptime Kuma prompt — note monitors below 90% uptime, use monitor names.
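The Uptime Kuma builder can follow the same shape as the Prometheus one. A minimal sketch, assuming the `monitors` array from section 2; the prompt wording here is illustrative, not the production text:

```javascript
// Hypothetical Uptime Kuma prompt builder; mirrors the Prometheus builder above.
function buildUptimeKumaPrompt(d) {
  const flagged = d.monitors.filter((m) => m.uptime_24h_percent < 90);
  const lines = d.monitors
    .map((m) => `- ${m.name}: ${m.current_status}, 24h uptime ${m.uptime_24h_percent}%`)
    .join('\n');
  const prompt = `Summarize these Uptime Kuma monitors in 2-3 sentences. ` +
    `Refer to monitors by name; ${flagged.length} are below 90% uptime.\n${lines}`;
  const body = { model: 'gemma4:e4b', prompt, stream: false };
  return [{ json: { body, section: 'uptime_kuma' } }]; // section field required
}
```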
UniFi prompt — note gateway temp, WAN issues, TX retry rate, switch uptimes. Includes delta computation for WAN cumulative counters:
```javascript
const d = $input.first().json; // Combine merge puts all fields in one object
const gw = d.gateway;
const ap = d.ap;
const switches = d.switches;
const prev = d.wan_drops !== undefined ? {
  wan_drops: parseInt(d.wan_drops),
  wan_downtime_seconds: parseInt(d.wan_downtime_seconds),
  wan_rx_bytes: parseInt(d.wan_rx_bytes),
  wan_tx_bytes: parseInt(d.wan_tx_bytes)
} : null;
const dropsDelta = prev ? (gw.wan.drops - prev.wan_drops) : 'N/A (first run)';
const downtimeDelta = prev ? (gw.wan.downtime_seconds - prev.wan_downtime_seconds) : 'N/A';
const rxDeltaMB = prev ? ((gw.wan.rx_bytes_total - prev.wan_rx_bytes)/1024/1024).toFixed(2) : 'N/A';
const txDeltaMB = prev ? ((gw.wan.tx_bytes_total - prev.wan_tx_bytes)/1024/1024).toFixed(2) : 'N/A';
```
Rationale: the UniFi API reports WAN drop counts and byte totals as since-reboot cumulative values. Delta-per-hour is the meaningful signal.
Synology prompt — disks, volume %, CPU/mem, UPS status.
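The Synology builder follows the same pattern; a minimal sketch. The input field names (`disks`, `volume_used_pct`, `cpu_pct`, `mem_pct`, `ups`) are assumptions, since the doc only lists the topics the prompt covers:

```javascript
// Hypothetical Synology prompt builder sketch (field names are assumptions).
function buildSynologyPrompt(d) {
  const prompt = `You are a homelab monitoring assistant analyzing Synology NAS data.
Disks: ${d.disks.map((x) => `${x.model} (${x.health})`).join(', ')}
Volume used: ${d.volume_used_pct}%
CPU: ${d.cpu_pct}%, Memory: ${d.mem_pct}%
UPS: ${d.ups.status}, charge ${d.ups.charge_pct}%
Summarize in 2-3 sentences, flagging failing disks, volumes over 80%, or UPS on battery.`;
  const body = { model: 'gemma4:e4b', prompt, stream: false };
  return [{ json: { body, section: 'synology' } }]; // section field required
}
```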
Optimized Prompt Structure (all 4 section prompts)#
All 4 section prompt builders use a consistent two-section structure:
```javascript
const prompt = `You are a homelab monitoring assistant analyzing [source] data. Respond in exactly two sections:
### Summary
2-3 sentences on current status. Note any anomalies, threshold breaches, or items requiring attention.
### Additional Information and Interesting Statistics for Next Ingestion
Key metrics as concise bullet points. Include specific values (temperatures, percentages, byte counts).
[source-specific data here]`;
const body = { model: 'gemma4:e4b', prompt, stream: false };
return [{ json: { body, section: 'prometheus' } }]; // section name varies
```
Split Code Nodes (one after each section Ollama call)#
Each Split Code node divides the Ollama response on the delimiter:
```javascript
const response = $input.first().json.response;
const delimiter = '### Additional Information and Interesting Statistics for Next Ingestion';
const parts = response.split(delimiter);
return [{ json: {
  section: 'prometheus', // hard-coded per node
  summary: parts[0].replace('### Summary', '').trim(),
  context: parts[1] ? parts[1].trim() : ''
}}];
```
- `summary` → stored directly as `summary_prometheus` etc. in Postgres
- `context` → passed to the Final Prompt Builder for the synthesis call
Ollama HTTP Request (all 5 identical)#
- Method: POST
- URL: `http://192.168.2.192:11434/api/generate`
- Body: `model`, `prompt`, `stream=false` (from `$json.body.*`)
- Response field: `response`
Response Collection (Final Prompt Builder)#
Collects the 4 Split Code node outputs (not the raw Ollama outputs), stores the summaries, and builds the context prompt:
```javascript
const items = $input.all();
const summaries = items.map(item => ({ section: item.json.section, summary: item.json.summary }));
const contextBlocks = items.map(item => `## ${item.json.section}\n${item.json.context}`).join('\n\n');
const prompt = `You are a homelab monitoring assistant. Based on the following detailed metrics from each service, write a concise 2-3 sentence overall infrastructure status report. Focus on the most important issues and anomalies.\n\n${contextBlocks}`;
return [{ json: { summaries, prompt, body: { model: 'gemma4:e4b', prompt, stream: false } } }];
```
Key: the final Ollama call receives only context bullets, not the summaries. The summaries are preserved in `summaries[]` and picked up by the Reshape node.
Final Ollama call#
- Same endpoint, model `gemma4:e4b`
- Prompt: synthesise the 4 context blocks into an overall status paragraph
- Response: `overall_summary`
Reshape Code node (before Postgres)#
```javascript
const ollama = $input.first().json;
const prev = $('Final Prompt Builder').first().json;
const esc = s => s ? s.replace(/'/g, "''") : ''; // escape single quotes for SQL
return [{ json: {
  overall_summary: esc(ollama.response),
  summary_prometheus: esc(prev.summaries.find(s => s.section === 'prometheus')?.summary),
  summary_uptime_kuma: esc(prev.summaries.find(s => s.section === 'uptime_kuma')?.summary),
  summary_unifi: esc(prev.summaries.find(s => s.section === 'unifi')?.summary),
  summary_synology: esc(prev.summaries.find(s => s.section === 'synology')?.summary),
  raw_metrics: JSON.stringify(prev.summaries)
}}];
```
Postgres Write (homelab_analysis)#
- Node type: Postgres (Execute Query)
- Credential: configured under n8n Overview → Credentials
- INSERT into `homelab_analysis` using `$json.*` fields from the reshape node
Postgres Snapshot Writes#
After the reshape node, two additional Postgres Execute Query nodes write raw data for delta computation on the next run:
`unifi_snapshots` INSERT — stores gateway WAN counters and environment data:
```sql
INSERT INTO unifi_snapshots (
  wan_drops, wan_downtime_seconds, wan_rx_bytes, wan_tx_bytes,
  wan_latency_ms, wan_availability_pct,
  gateway_temp_c, gateway_cpu_pct, gateway_mem_pct, client_count,
  vlan_data, ap_radio_data, switch_data
) VALUES (
  {{ $node["UniFi Code Node"].json.gateway.wan.drops }},
  {{ $node["UniFi Code Node"].json.gateway.wan.downtime_seconds }},
  {{ $node["UniFi Code Node"].json.gateway.wan.rx_bytes_total }},
  {{ $node["UniFi Code Node"].json.gateway.wan.tx_bytes_total }},
  {{ $node["UniFi Code Node"].json.gateway.wan.latency_ms }},
  {{ $node["UniFi Code Node"].json.gateway.wan.availability_pct }},
  {{ $node["UniFi Code Node"].json.gateway.temp_c }},
  {{ $node["UniFi Code Node"].json.gateway.cpu_pct }},
  {{ $node["UniFi Code Node"].json.gateway.mem_pct }},
  {{ $node["UniFi Code Node"].json.gateway.num_sta }},
  '{{ JSON.stringify($node["UniFi Code Node"].json.gateway.network_table) }}',
  '{{ JSON.stringify($node["UniFi Code Node"].json.ap) }}',
  '{{ JSON.stringify($node["UniFi Code Node"].json.switches) }}'
)
```
`prometheus_snapshots` INSERT — stores full host metrics for potential delta use:
```sql
INSERT INTO prometheus_snapshots (hosts)
VALUES ('{{ JSON.stringify($node["Prometheus Code Node"].json.hosts) }}')
```
Postgres Snapshot Reads (before prompt builders)#
Before the prompt builders, two Postgres Execute Query nodes read the previous run’s snapshot:
Previous UniFi snapshot — `SELECT * FROM unifi_snapshots ORDER BY recorded_at DESC LIMIT 1`
Previous Prometheus snapshot — `SELECT * FROM prometheus_snapshots ORDER BY recorded_at DESC LIMIT 1`
Merge (Combine, by Position) — UniFi#
A Merge node (mode: Combine, by Position) sits between the UniFi Code node and the UniFi prompt builder. It combines:
- Input 1: UniFi Code node output (`gateway`, `ap`, `switches`)
- Input 2: Previous UniFi Postgres snapshot row (individual columns from `unifi_snapshots`)
With Combine/Position, the two objects merge into one — all fields from both inputs land in `$input.first().json`. The prompt builder detects whether previous-snapshot data exists by checking `d.wan_drops !== undefined`.
The same Merge (Combine) pattern is used for the Prometheus Code node + previous `prometheus_snapshots` row, though no diff logic is currently implemented for Prometheus (it uses rates, not cumulative counters).
Important: Using a direct connection from both the Code node and Postgres node into the prompt builder does not work — n8n only passes the first connected input. The Merge (Combine) node is required.
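The merge behaviour above can be illustrated with made-up sample values. This shows what Combine (by Position) hands the prompt builder and the same prev-detection probe described above:

```javascript
// Illustration of Combine (by Position): fields from both inputs land in one object.
const codeNodeOutput = { gateway: { wan: { drops: 12, rx_bytes_total: 2097152 } }, ap: {}, switches: [] };
const snapshotRow = { wan_drops: '10', wan_rx_bytes: '1048576' }; // Postgres columns arrive as strings
const merged = { ...codeNodeOutput, ...snapshotRow };

// Same check the UniFi prompt builder uses to detect a previous snapshot:
const hasPrev = merged.wan_drops !== undefined;
const dropsDelta = hasPrev ? merged.gateway.wan.drops - parseInt(merged.wan_drops) : 'N/A (first run)';
const rxDeltaMB = hasPrev ? (merged.gateway.wan.rx_bytes_total - parseInt(merged.wan_rx_bytes)) / 1024 / 1024 : 'N/A';
// dropsDelta → 2, rxDeltaMB → 1
```

On the very first run the snapshot table is empty, no row fields are merged in, and the builder falls back to the `'N/A (first run)'` branch.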
Timing & Performance#
- End-to-end: ~13 minutes per hourly run
- Model `gemma4:e4b` on Pavilion (CPU-only, i7, 12 cores)
- Bottleneck: prompt eval — 1600+ token prompts take ~111 s each to process
- Ollama uses 2 threads by default; setting `OLLAMA_NUM_THREADS=12` via override.conf had no effect; Modelfile `num_thread 12` produced 1.77 t/s (worse); trust Ollama defaults
- GPU disabled on Pavilion (MX550 4GB VRAM can’t fit E4B at ~5GB) — see Pavilion (AI PC) Configuration
- Accepted as-is pending Mac Studio M1 Max 64GB arrival (purchased 2026-04-17)
Synology Hardware Reference (LonelyMountain DS923+)#
- 3× Seagate ST4000VN006-3CW104 (4TB NAS drives), all healthy, volume ~29% used
- UPS: CPS LX1500GU3, 100% charge, ~145 min runtime
- Clients at startup (ACL for UPS): 192.168.1.69 (proxmox)
Known Issues / Open Questions#
- Synology SID expires — production pipeline needs periodic re-auth or persistent SID refresh
- Gateway (UCG Express, Olorín) CPU temperature consistently ~71°C — persistent signal, confirmed across multiple runs
- Pipeline runtime ~13 min for hourly job — will improve with Mac Studio arrival
- Static files in dashboard Docker image are baked at build time — CSS/HTML changes require rebuild (could add volume mount to fix)
- `prometheus_snapshots`: data is stored but no diff/delta logic is currently implemented — Prometheus uses `rate()` queries, so raw values are less useful than for UniFi
Related Pages#
Node Exporter Deployment, n8n, PostgreSQL, Pavilion (AI PC) Configuration, AI Infrastructure Overview, Homelab Dashboard
Sources#
- ingested/chats/184-Local test proxy for MBTA tracker.md
- ingested/chats/2026-04-29-servarr-diagnosis.md
- ingested/chats/190-Proxmox OS and Storage Separation Guide.md
- ingested/chats/185-Cloudflare Wrangler JSONC Configuration Guide.md
- Homelab AI - 2026-04-14 · ingested/chats/Homelab AI - 2026-04-14
- Homelab AI - 2026-04-15 · ingested/chats/Homelab-AI---2026-04-15.md
- Homelab AI - 2026-04-16 · ingested/chats/Homelab-AI---2026-04-16.md
- Homelab AI - 2026-04-17 · ingested/chats/2026-04-17-31-Homelab AI.json
- Homelab AI - 2026-04-18 · ingested/chats/Homelab-AI---2026-04-18.md
- Homelab AI - 2026-04-19 · ingested/chats/Homelab-AI---2026-04-19.md