Book Discovery Pipeline#
What Was Established#
A multi-agent, multi-node pipeline designed to identify high-prestige, upcoming literary works by analyzing critical reviews before they hit the mainstream.
Key Decisions#
- Architecture: Two-tier agent system. Lightweight models (E4B) on the Orchestrator (T480) handle routing/filtering; heavier models (26B) on the Inference node (Pavilion) handle deep analysis.
- Tech Stack: n8n (Orchestration), PostgreSQL (Data Storage), Hugo (Static Site Generation), Python/JS (Custom Logic).
- Data Sources: RSS feeds (Literary Hub, etc.), Web Scraping (for indie blogs), and Goodreads API/Scraping for popularity comparison.
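The two-tier routing decision can be sketched as a small dispatcher. Hostnames, ports, and model tags below are illustrative placeholders, not the actual node configuration:

```python
# Sketch of the two-tier dispatch: lightweight stages run on the
# Orchestrator (T480), heavy stages on the Inference node (Pavilion).
# Hostnames and model tags are placeholder assumptions.

LIGHT_STAGES = {"routing", "filtering"}
HEAVY_STAGES = {"extraction", "scoring"}

NODES = {
    "orchestrator": {"host": "t480.local:11434", "model": "light-e4b"},
    "inference": {"host": "pavilion.local:11434", "model": "heavy-26b"},
}

def route(stage: str) -> dict:
    """Return the node config responsible for a pipeline stage."""
    if stage in LIGHT_STAGES:
        return NODES["orchestrator"]
    if stage in HEAVY_STAGES:
        return NODES["inference"]
    raise ValueError(f"unknown stage: {stage}")
```

Keeping the stage-to-node mapping in one place means n8n workflows only need to ask `route("filtering")` rather than hard-coding endpoints per node.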
Current Configuration#
The Pipeline Chain#
- Ingestion: n8n fetches RSS feeds and scrapes blogs.
- Filtering (E4B): Classifies content (Review vs. News). Discards non-reviews.
- Extraction (26B): Extracts title, author, publisher, and critical language.
- Scoring (26B): Analyzes “prestige signals” (e.g., phrases like “formally ambitious”) and compares against Goodreads popularity.
- Aggregation: Consolidates extracted and scored data into PostgreSQL.
- Publication: n8n generates a Markdown file and commits it to a Hugo repository.
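The scoring step above can be approximated as a heuristic: count prestige phrases, then discount books that are already popular. The phrase list and weighting here are illustrative assumptions; in the pipeline this judgement is delegated to the 26B model:

```python
# Heuristic sketch of the "prestige signal" scoring step.
# The phrase list and the popularity penalty are illustrative
# assumptions, not the production prompt or formula.

PRESTIGE_PHRASES = [
    "formally ambitious",
    "formally inventive",
    "a major achievement",
]

def prestige_score(review_text: str, goodreads_ratings: int) -> float:
    """Count prestige phrases, then damp the score for already-popular
    books so the pipeline surfaces acclaimed-but-obscure titles."""
    text = review_text.lower()
    signal = sum(text.count(phrase) for phrase in PRESTIGE_PHRASES)
    # Popularity penalty: heavily rated books are already mainstream.
    obscurity = 1.0 / (1.0 + goodreads_ratings / 1000.0)
    return signal * obscurity
```

For example, a review containing one signal phrase scores 1.0 for an unrated book but only 0.5 once the book has 1,000 Goodreads ratings.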
Data Schema (PostgreSQL)#
- sources: RSS metadata.
- articles: Raw ingested content.
- books: Normalized book records.
- reviews: Links articles to books + extracted critical language.
- prestige_scores: Historical scoring for trend tracking.
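A DDL sketch of the five tables follows. The table names come from the schema above; the column names are assumptions inferred from the descriptions. It is written in portable SQL and exercised here with sqlite3 for a dependency-free check, though the deployment target is PostgreSQL:

```python
import sqlite3

# DDL sketch: table names from the schema, columns are illustrative.
DDL = """
CREATE TABLE sources (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    feed_url TEXT            -- RSS metadata
);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES sources(id),
    url TEXT,
    raw_content TEXT         -- raw ingested content
);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT,
    publisher TEXT           -- normalized book records
);
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY,
    article_id INTEGER REFERENCES articles(id),
    book_id INTEGER REFERENCES books(id),
    critical_language TEXT   -- extracted critical phrases
);
CREATE TABLE prestige_scores (
    id INTEGER PRIMARY KEY,
    book_id INTEGER REFERENCES books(id),
    scored_at TEXT,
    score REAL               -- historical scoring for trend tracking
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

Keeping `prestige_scores` as an append-only table (one row per scoring run) is what enables the trend tracking noted above.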
Open Questions#
- How to effectively scrape Substack/paywalled content without high costs.
- Determining the optimal frequency for the pipeline run (Weekly vs. Bi-weekly).
Related Pages#
Open WebUI Deployment, Ollama Configuration, AI-Driven Monitoring Pipeline
Sources#
Homelab AI - 2026-04-13 · ingested/chats/Homelab AI - 2026-04-13