Pavilion (AI PC) Configuration#
What Was Established#
The Pavilion machine uses a USB-attached Ethernet interface (enx6c1f7197a66) that occasionally fails to bring the link up automatically on boot.
Current Configuration#
Netplan Configuration#
Ensure /etc/netplan/01-netcfg.yaml is correctly configured with the active interface name and permissions are set to 600.
```yaml
network:
  version: 2
  ethernets:
    enx6c1f7197a66:
      dhcp4: true
```

Apply with:

```shell
sudo chmod 600 /etc/netplan/01-netcfg.yaml
sudo netplan apply
```

Boot-time Interface Fix#
If the interface remains DOWN after reboot, use a systemd service to force the link up.
Create /etc/systemd/system/eth-up.service:
```ini
[Unit]
Description=Bring up ethernet on boot
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ip link set enx6c1f7197a66 up
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable the service:

```shell
sudo systemctl enable eth-up.service
```

Ollama Configuration#
| Detail | Value |
|---|---|
| Version | 0.20.6 |
| Endpoint | http://192.168.2.192:11434 |
| Mode | CPU-only (CUDA disabled) |
| Active model | gemma4:e4b |
| Inference speed | ~15-18 t/s (E4B) |
Why CUDA is disabled#
The MX550 has 4GB VRAM, and gemma4:e4b at Q4 is ~5GB. Ollama tries to fit the full model on the GPU, panics, and crashes rather than falling back to CPU, so CUDA must be disabled entirely.
/etc/systemd/system/ollama.service.d/override.conf:
```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="CUDA_VISIBLE_DEVICES="
```

Thread count#
Ollama reports NumThreads:2 internally but actual performance is ~15-18 t/s. Attempts to force 12 threads via OLLAMA_NUM_THREADS=12 (env var) had no effect. Modelfile num_thread 12 produced 1.77 t/s (much worse). Trust Ollama’s defaults — don’t tune threads.
Models available#
- `gemma4:e4b` — primary model for the monitoring pipeline (all 5 Ollama calls)
- `nomic-embed-text` — wiki semantic embeddings (768-dim vectors); called by the wiki pipeline on wiki-llm via `http://192.168.2.192:11434/api/embeddings`
- `gemma4:26b` / `gemma3:27b` — previous models, not currently active
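As a minimal sketch, the wiki pipeline's embedding call might look like the following, assuming Ollama's standard `/api/embeddings` request shape (`{"model": ..., "prompt": ...}` returning an `"embedding"` array); the host and model name come from the table above, and the function names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.2.192:11434/api/embeddings"

def build_request(text: str, model: str = "nomic-embed-text") -> dict:
    """Payload for Ollama's embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text: str) -> list[float]:
    """POST the text to Ollama and return the 768-dim embedding vector."""
    data = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```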
nomic-embed-text payload ceiling#
nomic-embed-text returns HTTP 500 for inputs above ~11KB. The wiki pipeline works around this by splitting page content into 5000-char chunks, embedding each chunk independently, and averaging the resulting vectors into a single 768-dim embedding. Pages with more than 8 chunks sample beginning + middle + end before averaging.
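The chunk-and-average workaround can be sketched as follows. This is an illustration, not the pipeline's actual code: `embed_fn` stands in for the HTTP call to `/api/embeddings`, and the exact beginning/middle/end selection for pages over 8 chunks is an assumption.

```python
from typing import Callable

CHUNK_SIZE = 5000   # chars per chunk, per the workaround above
MAX_CHUNKS = 8      # beyond this, sample beginning + middle + end

def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split page content into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def average(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors (768-dim in practice)."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

def embed_page(text: str, embed_fn: Callable[[str], list[float]]) -> list[float]:
    """Embed each chunk independently, then average into one vector."""
    chunks = chunk(text)
    if len(chunks) > MAX_CHUNKS:
        # sample beginning + middle + end (exact selection is an assumption)
        mid = len(chunks) // 2
        chunks = chunks[:3] + [chunks[mid]] + chunks[-3:]
    return average([embed_fn(c) for c in chunks])
```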
Custom thread-count models (gemma4:e4b-12t, gemma4:e4b-4t) were created during thread-count experiments and then removed — they performed worse than defaults.
Known issue: SSH from home network#
SSH from the Mac to 192.168.2.192 fails when the Mac is on home WiFi (Mithrandir VLAN). SSH works fine via VPN from external networks and from the Proxmox shell, so this is likely an inter-VLAN routing issue on the Mac side. Workaround: use the VPN or SSH via a Proxmox jump host.
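The jump-host workaround can be made persistent with an `~/.ssh/config` entry on the Mac; the host alias and usernames below are placeholders:

```
# ~/.ssh/config — hop through Proxmox to reach the Pavilion
Host pavilion
    HostName 192.168.2.192
    User <pavilion-user>
    ProxyJump <user>@<proxmox-host>
```

After this, `ssh pavilion` routes through the Proxmox host automatically.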
Related Pages#
Node Exporter Deployment, AI-Driven Monitoring Pipeline, AI Infrastructure Overview, Mac Studio
Sources#
ingested/chats/Homelab-AI---2026-04-14.md (Homelab AI - 2026-04-14)