Pavilion (AI PC) Configuration#
What Was Established#
The Pavilion machine uses a USB-attached Ethernet interface (enx6c1f7197a66) that occasionally fails to bring the link up automatically on boot.
Current Configuration#
Netplan Configuration#
Ensure /etc/netplan/01-netcfg.yaml is correctly configured with the active interface name and permissions are set to 600.
```yaml
network:
  version: 2
  ethernets:
    enx6c1f7197a66:
      dhcp4: true
```

Apply with:

```shell
sudo chmod 600 /etc/netplan/01-netcfg.yaml
sudo netplan apply
```

Boot-time Interface Fix#
If the interface remains DOWN after reboot, use a systemd service to force the link up.
Create /etc/systemd/system/eth-up.service:
```ini
[Unit]
Description=Bring up ethernet on boot
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ip link set enx6c1f7197a66 up
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable the service:

```shell
sudo systemctl enable eth-up.service
```

Ollama Configuration#
| Detail | Value |
|---|---|
| Version | 0.20.6 |
| Endpoint | http://192.168.2.192:11434 |
| Mode | CPU-only (CUDA disabled) |
| Active model | gemma4:e4b |
| Inference speed | ~15-18 t/s (E4B) |
Why CUDA is disabled#
The MX550 has 4GB VRAM, and gemma4:e4b at Q4 is ~5GB. Ollama tries to fit the full model on the GPU, panics, and crashes rather than falling back to CPU, so CUDA must be disabled entirely.
/etc/systemd/system/ollama.service.d/override.conf:
```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="CUDA_VISIBLE_DEVICES="
```

Thread count#
Ollama reports NumThreads:2 internally but actual performance is ~15-18 t/s. Attempts to force 12 threads via OLLAMA_NUM_THREADS=12 (env var) had no effect. Modelfile num_thread 12 produced 1.77 t/s (much worse). Trust Ollama’s defaults — don’t tune threads.
Models available#
- `gemma4:e4b` — primary model for the monitoring pipeline (all 5 Ollama calls)
- `nomic-embed-text` — wiki semantic embeddings (768-dim vectors); called by the wiki pipeline on wiki-llm via `http://192.168.2.192:11434/api/embeddings`
- `gemma4:26b` / `gemma3:27b` — previous models, not currently active
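As a minimal sketch, the wiki pipeline's embedding call might look like the following, assuming Ollama's standard `/api/embeddings` request shape (`{"model": ..., "prompt": ...}` returning an `"embedding"` array); the host and model name come from the table above, and the function names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.2.192:11434/api/embeddings"

def build_request(text: str, model: str = "nomic-embed-text") -> dict:
    """Payload for Ollama's embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text: str) -> list[float]:
    """POST the text to Ollama and return the 768-dim embedding vector."""
    data = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```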
nomic-embed-text payload ceiling#
nomic-embed-text returns HTTP 500 for inputs above ~11KB. The wiki pipeline works around this by splitting page content into 5000-char chunks, embedding each chunk independently, and averaging the resulting vectors into a single 768-dim embedding. Pages with more than 8 chunks sample beginning + middle + end before averaging.
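The chunk-and-average workaround can be sketched as follows. This is an illustration, not the pipeline's actual code: `embed_fn` stands in for the HTTP call to `/api/embeddings`, and the exact beginning/middle/end selection for pages over 8 chunks is an assumption.

```python
from typing import Callable

CHUNK_SIZE = 5000   # chars per chunk, per the workaround above
MAX_CHUNKS = 8      # beyond this, sample beginning + middle + end

def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split page content into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def average(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors (768-dim in practice)."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

def embed_page(text: str, embed_fn: Callable[[str], list[float]]) -> list[float]:
    """Embed each chunk independently, then average into one vector."""
    chunks = chunk(text)
    if len(chunks) > MAX_CHUNKS:
        # sample beginning + middle + end (exact selection is an assumption)
        mid = len(chunks) // 2
        chunks = chunks[:3] + [chunks[mid]] + chunks[-3:]
    return average([embed_fn(c) for c in chunks])
```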
Custom thread-count models (gemma4:e4b-12t, gemma4:e4b-4t) were created during thread-count experiments and then removed — they performed worse than defaults.
Known issue: SSH from home network#
SSH from the Mac to 192.168.2.192 fails when the Mac is on home WiFi (Mithrandir VLAN). SSH works fine via VPN from external networks and from the Proxmox shell, so this is likely an inter-VLAN routing issue on the Mac side. Workaround: use the VPN or SSH via a Proxmox jump host.
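The jump-host workaround can be made persistent with an `~/.ssh/config` entry on the Mac; the host alias and usernames below are placeholders:

```
# ~/.ssh/config — hop through Proxmox to reach the Pavilion
Host pavilion
    HostName 192.168.2.192
    User <pavilion-user>
    ProxyJump <user>@<proxmox-host>
```

After this, `ssh pavilion` routes through the Proxmox host automatically.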
Related Pages#
Node Exporter Deployment, AI-Driven Monitoring Pipeline, AI Infrastructure Overview, Mac Studio
Sources#
ingested/chats/Homelab-AI---2026-04-14.md (Homelab AI - 2026-04-14)