Mac Studio#
What Was Established#
Mac Studio M1 Max 64GB purchased 2026-04-17 for $2,299 (used/refurbished market). Deployed 2026-04-24 as the primary AI inference node for the homelab, replacing the Pavilion as the fast interactive reasoning machine.
Hardware#
| Detail | Value |
|---|---|
| Model | Mac Studio (2022, M1 Max) |
| Hostname | Legolas |
| IP | 192.168.1.45 |
| Memory | 64GB Unified Memory |
| Memory bandwidth | ~400 GB/s |
| Purchase price | $2,299 (used) |
| Status | Deployed, operational 2026-04-24 |
Why This Hardware#
64GB of unified memory is the key spec for LLM inference on Apple Silicon, and memory bandwidth largely determines tokens/second: the M1 Max at ~400 GB/s delivers ~25+ t/s on 27-31B models, versus the Pavilion's ~15 t/s CPU-only with E4B. At the time of purchase, a new 64GB Mac Studio or Mac Mini from Apple was backordered to late June, so the used M1 Max at $2,299 was judged a reasonable buy given the scarcity.
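The ~25+ t/s figure is consistent with a simple bandwidth-bound estimate: each decoded token requires reading roughly the full weight set, so tokens/s is capped at about bandwidth divided by model size. A minimal sketch (the helper name and the 4-bit ≈ 0.5 bytes/param figure are illustrative assumptions, not measured values):

```python
def est_tokens_per_sec(bandwidth_gb_s: float, params_b: float,
                       bytes_per_param: float) -> float:
    """Rough decode-speed ceiling for memory-bandwidth-bound inference:
    tokens/s ≈ bandwidth / bytes read per token (≈ model weight size)."""
    model_gb = params_b * bytes_per_param
    return bandwidth_gb_s / model_gb

# M1 Max (~400 GB/s) with a 27B model at 4-bit (~0.5 bytes/param):
# 400 / 13.5 ≈ 30 t/s ceiling — real throughput lands below this,
# which matches the observed ~25+ t/s.
print(est_tokens_per_sec(400, 27, 0.5))
```

This is only an upper bound; KV-cache reads, prompt processing, and scheduler overhead all pull real numbers below it.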
Role#
- Primary LLM inference: all wiki pipeline LLM calls route to Legolas via Ollama at port 11434
- Multiple simultaneous models: 64GB allows multiple models resident at once (no reload wait)
- Monitoring pipeline: will take over final synthesis call from Pavilion (not yet migrated)
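Routing a pipeline LLM call to Legolas amounts to a POST against Ollama's `/api/generate` endpoint on port 11434. A minimal stdlib-only sketch (`build_payload` and `generate` are hypothetical helper names, not functions from the actual pipeline scripts):

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.45:11434/api/generate"  # Legolas

def build_payload(model: str, prompt: str) -> dict:
    """Non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, timeout: int = 120) -> str:
    """POST a prompt to Legolas and return the completed response text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With 64GB of unified memory, repeated calls against different models in the table below do not force a reload, so the same helper can serve all three pipeline tiers.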
Models (active as of 2026-04-26)#
| Model | Use |
|---|---|
| gemma4:e2b | Text cleaning / Markdown cleanup (step 1 of wiki pipeline) |
| qwen3.6:35b-a3b-coding-nvfp4 | JSON crystallization + wiki page generation (step 2) |
| minicpm-v:8b | PDF visual pages + image OCR (multimodal) |
Workflow Split#
- Legolas (Mac Studio): real-time inference for wiki pipeline, all three model tiers
- Pavilion (nk-celebrimbor): long-context batch jobs, nomic-embed-text embeddings, scheduled Gemma pipeline
MLX Notes#
As of April 2026, MLX support for Gemma 4 31B is rough — mlx-community 4-bit models fail to load, LM Studio MLX backend doesn’t support Gemma 4, chat templates need manual handling. Ollama is the safer path initially.
Known Issues#
qwen3.6:35b Ollama freeze (macOS)#
Symptom: heartbeat stops; `gemma4:e2b` responds fine; `curl http://192.168.1.45:11434/api/generate` with `qwen3.6:35b` returns nothing (silent hang, not an error). `ollama ps` shows the model as loaded.
Cause: Ollama process gets into a bad state on macOS — model appears loaded but accepts no new requests.
Fix (macOS — NOT systemctl, Legolas is not Linux):

```shell
pkill -f ollama
# or via brew:
brew services restart ollama
```

After restart, verify:

```shell
curl http://192.168.1.45:11434/api/generate \
  -d '{"model":"qwen3.6:35b-a3b-coding-nvfp4","prompt":"Reply OK","stream":false}'
```

Then clear stale lock files on wiki-llm before relaunching:

```shell
rm -f /opt/wiki/work/raw/**/*.ingest.lock
```

10s settle delay after consecutive vision calls#
After consecutive `minicpm-v:8b` requests, `gemma4:e2b`'s stream closes without `done=true`. Root cause not diagnosed. Workaround: 10s sleep between the final `minicpm-v` call and the gemma call in `convert_to_md.py`.
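The workaround can be sketched as a small wrapper around the two calls (names and structure are hypothetical; `convert_to_md.py`'s actual code may differ):

```python
import time

SETTLE_SECONDS = 10  # empirical workaround; root cause undiagnosed

def call_with_settle(vision_call, text_call, settle: float = SETTLE_SECONDS):
    """Run the final minicpm-v call, pause, then run the gemma call.

    The pause works around the observed stream-close-without-done bug
    when a gemma4:e2b request immediately follows vision requests."""
    vision_result = vision_call()
    time.sleep(settle)  # let Ollama settle after consecutive vision requests
    return vision_result, text_call()
```

A fixed sleep is crude but cheap relative to pipeline runtime; a retry-on-truncated-stream loop would be the more robust fix once the root cause is understood.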
Related Pages#
AI Infrastructure Overview, Pavilion (AI PC) Configuration, AI-Driven Monitoring Pipeline, Wiki Pipeline Scripts
Sources#
Homelab AI - 2026-04-17 · ingested/chats/Homelab-AI---2026-04-17.md
Work wiki session crystallization 2026-04-26 · wiki/ai/wiki-pipeline-scripts.md