Wiki Pipeline Scripts#

What Was Established#

Eight Python scripts in /opt/wiki/homelab/scripts/ implement the full wiki pipeline: file conversion, document ingestion, conversation crystallization (standard, DeepSeek, and Claude formats), shared LLM infrastructure, wiki health-checking, and knowledge-graph integration. All scripts were ported from the work wiki pipeline (itself developed 2026-04-21 → 2026-04-26) with homelab-specific infrastructure baked in.

crystallize.py (Claude format) uses a two-step LLM approach: gemma4:e2b cleans, qwen3.6:35b crystallizes. crystallize_deepseek.py skips gemma — JSON parsing is handled deterministically in Python (load_conversation + _clean_text), so only qwen is needed.
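
The deterministic path is easy to sketch. The names load_conversation and _clean_text come from the scripts themselves; the bodies below are illustrative assumptions, not the actual implementation:

import json
import re

def _clean_text(text: str) -> str:
    # Illustrative: strip control characters and collapse whitespace
    # before the text reaches the qwen crystallization prompt.
    text = re.sub(r"[\x00-\x08\x0b-\x1f]", "", text)
    return re.sub(r"[ \t]+", " ", text).strip()

def load_conversation(path: str) -> list[dict]:
    # Illustrative: parse a DeepSeek JSON export into role/content turns,
    # replacing the LLM cleaning step crystallize.py delegates to gemma4:e2b.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [
        {"role": m["role"], "content": _clean_text(m["content"])}
        for m in data.get("messages", [])
        if m.get("content")
    ]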

Wiki System - Architecture#

What Was Established#

The wiki system is designed around the LLM wiki pattern (Karpathy): raw sources (chat transcripts, notes, docs) are crystallized into structured markdown pages, embedded into pgvector, and retrieved semantically by agents in future sessions. A dedicated LXC (nk-wiki) will host the wiki VM, separating wiki infrastructure from other services.

Multi-Wiki Namespace Design#

Three wikis are planned, each with its own namespace in pgvector.
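
A minimal sketch of what namespace-scoped retrieval could look like, assuming a pages table with namespace and embedding vector(768) columns (table and column names are assumptions) and nomic-embed-text served by Ollama on the Pavilion:

import psycopg2
import requests

def search(query: str, namespace: str, k: int = 5) -> list[str]:
    # Embed the query with nomic-embed-text (768-dim) via Ollama.
    emb = requests.post(
        "http://192.168.2.192:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": query},
    ).json()["embedding"]
    vec = "[" + ",".join(map(str, emb)) + "]"
    with psycopg2.connect(dbname="wiki") as conn, conn.cursor() as cur:
        # The namespace filter keeps each wiki's pages isolated.
        cur.execute(
            """SELECT title FROM pages
               WHERE namespace = %s
               ORDER BY embedding <=> %s::vector
               LIMIT %s""",
            (namespace, vec, k),
        )
        return [row[0] for row in cur.fetchall()]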

AI Infrastructure Overview#

What Was Established#

The homelab is transitioning into a multi-node agentic architecture, utilizing a mix of existing laptops, desktops, and a future Mac Studio to handle different tiers of LLM workloads (Batch vs. Interactive).

Key Decisions#

Nodes are specialized by their hardware capabilities (VRAM and CPU/RAM) to optimize for cost and performance:

  • Inference Node (Batch/Heavy + Embeddings): HP Pavilion 15t-e300 — hostname nk-celebrimbor, IP 192.168.2.192. Intel i7, 32GB RAM, NVIDIA MX550 (2GB VRAM, CUDA disabled). Runs gemma4:e4b for monitoring pipeline synthesis (~15-18 t/s, CPU-only) and nomic-embed-text for wiki semantic embeddings (768-dim, via Ollama on port 11434).
  • Orchestrator Node: Thinkpad T480. Intel i5/i7 8th Gen, 32GB RAM. Running headless Ubuntu. Hosts n8n and lightweight models (Gemma 4 E4B) for routing and decision-making.
  • Interactive Node (Potential): ROG Zephyrus (GU501). Intel i7, NVIDIA GTX 1080 Max-Q (8GB VRAM). Ideal for 7B/8B models requiring high tokens-per-second for real-time chat.
  • Primary Reasoning Node (Deployed 2026-04-24): Mac Studio M1 Max, 64GB Unified Memory — hostname Legolas, IP 192.168.1.45. Handles all wiki pipeline LLM calls: gemma4:e2b (text cleaning), qwen3.6:35b-a3b-coding-nvfp4 (JSON crystallization), minicpm-v:8b (PDF OCR/vision). Fast interactive inference — 31B models at ~25+ t/s vs Pavilion’s ~15 t/s CPU-only. See Mac Studio.
  • Parallelism Nodes: Various i5 8th Gen desktops. 32GB RAM, no GPU. Used for distributed pipeline stages or additional lightweight model instances.

Current Configuration#

  • Legolas (Mac Studio): Ollama at 192.168.1.45:11434. Running gemma4:e2b, qwen3.6:35b-a3b-coding-nvfp4, minicpm-v:8b for wiki pipeline. Deployed 2026-04-24.
  • nk-celebrimbor (Pavilion): headless Ubuntu, Ollama CPU-only (CUDA disabled — MX550 2GB VRAM too small). Running gemma4:e4b at ~15-18 t/s for hourly monitoring pipeline; nomic-embed-text for wiki embeddings.
  • T480: planned orchestrator role not yet active.
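
A quick way to confirm each node's Ollama endpoint and loaded models (endpoints from the list above; /api/tags is Ollama's model-listing route):

import requests

# Ollama endpoints per node, from Current Configuration above.
NODES = {
    "Legolas": "http://192.168.1.45:11434",
    "nk-celebrimbor": "http://192.168.2.192:11434",
}

for name, base in NODES.items():
    try:
        models = requests.get(f"{base}/api/tags", timeout=5).json()["models"]
        print(f"{name}: {[m['name'] for m in models]}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")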

Mac Studio#

What Was Established#

Mac Studio M1 Max 64GB purchased 2026-04-17 for $2,299 (used/refurbished market). Deployed 2026-04-24 as the primary AI inference node for the homelab, replacing the Pavilion as the fast interactive reasoning machine.

Hardware#

Detail             Value
Model              Mac Studio (2022, M1 Max)
Hostname           Legolas
IP                 192.168.1.45
Memory             64GB Unified Memory
Memory bandwidth   ~400 GB/s
Purchase price     $2,299 (used)
Status             Deployed, operational 2026-04-24

Why This Hardware#

64GB unified memory is the key spec for LLM inference on Apple Silicon. Memory bandwidth determines tokens/second — the M1 Max at ~400 GB/s delivers ~25+ t/s for 27-31B models vs the Pavilion’s ~15 t/s CPU-only for E4B. At the time of purchase, 64GB Mac Studio/Mac Mini new from Apple was backordered to late June. The used M1 Max at $2,299 was judged a reasonable buy given scarcity.
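
The back-of-envelope behind that claim: generating one token reads roughly the entire weight set once, so bandwidth divided by quantized model size bounds tokens/second. Assuming 4 bits per parameter (as in the nvfp4 quantization named elsewhere on this wiki):

# Rough throughput ceiling: tokens/s ≈ bandwidth / bytes read per token,
# where bytes per token ≈ quantized model size (every weight read once).
params = 31e9                      # 31B-class model
model_bytes = params * 4 / 8       # 4-bit quant ≈ 15.5 GB
bandwidth = 400e9                  # M1 Max, ~400 GB/s
print(f"~{bandwidth / model_bytes:.0f} t/s ceiling")   # ≈ 26 t/s

Consistent with the observed ~25+ t/s.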

Pavilion (AI PC) Configuration#

What Was Established#

The Pavilion's Ethernet interface (enx6c1f7197a66, a MAC-derived name typical of USB adapters) occasionally fails to bring the link up automatically on boot.

Current Configuration#

Netplan Configuration#

Ensure /etc/netplan/01-netcfg.yaml is correctly configured with the active interface name and permissions are set to 600.

network:
  version: 2
  ethernets:
    enx6c1f7197a66:
      dhcp4: true

Apply with:

sudo chmod 600 /etc/netplan/01-netcfg.yaml
sudo netplan apply

Boot-time Interface Fix#

If the interface remains DOWN after reboot, use a systemd service to force the link up.
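
A minimal unit sketch (the unit name and path are illustrative; the interface name comes from the netplan config above):

# /etc/systemd/system/force-link-up.service
[Unit]
Description=Force link up on enx6c1f7197a66
After=network-pre.target
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ip link set enx6c1f7197a66 up

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable force-link-up.service.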

Local Model Training & Fine-Tuning Guide#

What Was Established#

Guide for fine-tuning local LLMs (DeepSeek) using Hugging Face transformers, with emphasis on VRAM-efficient techniques for single-GPU setups.

Key Decisions#

  • Framework: Hugging Face transformers + Trainer API for fine-tuning
  • Model: deepseek-ai/deepseek-llm-7b (example model)
  • Efficiency: LoRA (Low-Rank Adaptation) + 4-bit quantization via bitsandbytes to fit large models on consumer GPUs

Setup#

pip install torch transformers datasets accelerate peft bitsandbytes

Verify GPU: nvidia-smi — need CUDA 11.8+.
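
A minimal loading sketch for the LoRA + 4-bit combination (the hub id suffix and all hyperparameters here are illustrative, not settled choices):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "deepseek-ai/deepseek-llm-7b-base"   # example model from Key Decisions

# 4-bit quantization via bitsandbytes keeps the 7B base model within
# consumer-GPU VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# LoRA trains small low-rank adapters instead of the full weight matrices.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically <1% of parameters are trainable

The resulting model then goes to the Trainer API as usual.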

AI-Driven Monitoring Pipeline#

What Was Established#

The monitoring pipeline is fully operational and running hourly. It collects rich structured data from four sources (Prometheus — 7 metrics, Uptime Kuma, UniFi, Synology), runs four parallel Ollama summarization calls, synthesizes a final status report, and writes everything to Postgres. Hourly snapshots of raw UniFi and Prometheus data are stored in dedicated tables for delta computation. End-to-end runtime is ~13 minutes using gemma4:e4b CPU-only on the Pavilion — accepted as-is pending the Mac Studio.
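
The fan-out step is simple to sketch (prompts and function names here are illustrative; in production this runs inside the pipeline's own scripts):

import concurrent.futures
import requests

OLLAMA = "http://192.168.2.192:11434/api/generate"   # Pavilion, CPU-only

def summarize(source: str, data: str) -> str:
    # One summarization call per source; the four calls run in parallel.
    r = requests.post(OLLAMA, json={
        "model": "gemma4:e4b",
        "prompt": f"Summarize this {source} monitoring data:\n{data}",
        "stream": False,
    }, timeout=900)
    return r.json()["response"]

raw = {"prometheus": "...", "uptime_kuma": "...", "unifi": "...", "synology": "..."}
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    summaries = dict(zip(raw, pool.map(summarize, raw.keys(), raw.values())))
# A final synthesis call then turns `summaries` into the status report.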

Book Discovery Pipeline#

What Was Established#

A multi-agent, multi-node pipeline designed to identify high-prestige, upcoming literary works by analyzing critical reviews before they hit the mainstream.

Key Decisions#

  • Architecture: Two-tier agent system. Lightweight models (E4B) on the Orchestrator (T480) handle routing/filtering; heavier models (26B) on the Inference node (Pavilion) handle deep analysis.
  • Tech Stack: n8n (Orchestration), PostgreSQL (Data Storage), Hugo (Static Site Generation), Python/JS (Custom Logic).
  • Data Sources: RSS feeds (Literary Hub, etc.), Web Scraping (for indie blogs), and Goodreads API/Scraping for popularity comparison.

Current Configuration#

The Pipeline Chain#

  1. Ingestion: n8n fetches RSS feeds and scrapes blogs.
  2. Filtering (E4B): Classifies content (Review vs. News) and discards non-reviews (see the classifier sketch after this list).
  3. Extraction (26B): Extracts title, author, publisher, and critical language.
  4. Scoring (26B): Analyzes “prestige signals” (e.g., phrases like “formally ambitious”) and compares against Goodreads popularity.
  5. Aggregation: Aggregates data into PostgreSQL.
  6. Publication: n8n generates a Markdown file and commits it to a Hugo repository.
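
As a sketch, step 2's classifier call could look like this (the prompt, decision rule, and T480 address are assumptions; in production the step runs inside n8n):

import requests

def is_review(article_text: str) -> bool:
    # gemma4:e4b on the T480 answers a one-word classification prompt.
    r = requests.post("http://<T480_IP>:11434/api/generate", json={
        "model": "gemma4:e4b",
        "prompt": "Answer REVIEW or NEWS only. Classify this article:\n\n" + article_text,
        "stream": False,
    }, timeout=120)
    return "REVIEW" in r.json()["response"].upper()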

Data Schema (PostgreSQL)#

  • sources: RSS metadata.
  • articles: Raw ingested content.
  • books: Normalized book records.
  • reviews: Links articles to books + extracted critical language.
  • prestige_scores: Historical scoring for trend tracking.
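
A plausible DDL shape for these tables (every name beyond the five tables listed above is an assumption):

CREATE TABLE sources (
    id        SERIAL PRIMARY KEY,
    name      TEXT,
    feed_url  TEXT NOT NULL              -- RSS metadata
);

CREATE TABLE articles (
    id          SERIAL PRIMARY KEY,
    source_id   INT REFERENCES sources(id),
    raw_content TEXT,                    -- raw ingested content
    fetched_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE books (
    id        SERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    author    TEXT,
    publisher TEXT
);

CREATE TABLE reviews (
    article_id        INT REFERENCES articles(id),
    book_id           INT REFERENCES books(id),
    critical_language TEXT,              -- extracted prestige phrases
    PRIMARY KEY (article_id, book_id)
);

CREATE TABLE prestige_scores (
    book_id   INT REFERENCES books(id),
    score     NUMERIC,
    scored_at TIMESTAMPTZ DEFAULT now()  -- enables trend tracking
);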

Open Questions#

  • How to effectively scrape Substack/paywalled content without high costs.
  • Determining the optimal frequency for the pipeline run (Weekly vs. Bi-weekly).

Ollama Configuration#

What Was Established#

Ollama is used as the primary model backend across the fleet. The configuration focuses on making the API accessible to other nodes (like the T480 orchestrator) and running models like Gemma 4.

Key Decisions#

To allow remote access from other machines in the homelab (e.g., from a Docker container or another PC), the Ollama service must be configured to listen on all network interfaces, not just localhost.
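
The usual approach on a systemd host is an environment override (OLLAMA_HOST=0.0.0.0 binds all interfaces). Create the override with sudo systemctl edit ollama and add:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama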

Open WebUI Deployment#

What Was Established#

Open WebUI is deployed via Docker to provide a ChatGPT-like interface for interacting with local Ollama instances. It is configured to connect to the host’s Ollama API.

Key Decisions#

Because the WebUI runs inside a Docker container, it cannot reach the host machine’s localhost:11434 directly. OLLAMA_BASE_URL must point to the host’s actual LAN IP or use the host.docker.internal gateway.

Current Configuration#

Docker Deployment#

docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://<YOUR_HOST_IP>:11434 \
  ghcr.io/open-webui/open-webui:main

Note: Replace <YOUR_HOST_IP> with the actual IP of the machine (e.g., 192.168.172.168) to ensure the container can route to the Ollama service.