
Book Discovery Pipeline#

What Was Established#

A multi-agent, multi-node pipeline that identifies high-prestige, upcoming literary works by analyzing critical reviews before they reach mainstream attention.

Key Decisions#

  • Architecture: Two-tier agent system. Lightweight models (E4B) on the Orchestrator (T480) handle routing/filtering; heavier models (26B) on the Inference node (Pavilion) handle deep analysis.
  • Tech Stack: n8n (Orchestration), PostgreSQL (Data Storage), Hugo (Static Site Generation), Python/JS (Custom Logic).
  • Data Sources: RSS feeds (Literary Hub, etc.), Web Scraping (for indie blogs), and Goodreads API/Scraping for popularity comparison.
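The two-tier split above amounts to a routing table: cheap classification stays on the Orchestrator, heavy extraction and scoring go to the Inference node. A minimal sketch, assuming each node exposes an Ollama-style HTTP endpoint; the hostnames and model tags are placeholders, not the actual homelab configuration:

```python
# Map pipeline tasks to the node/model tier responsible for them.
# Hostnames ("t480", "pavilion") and model tags ("e4b", "26b") are
# illustrative placeholders for the Orchestrator and Inference nodes.
TIERS = {
    "filter":  {"host": "http://t480:11434",     "model": "e4b"},  # lightweight
    "extract": {"host": "http://pavilion:11434", "model": "26b"},  # heavy
    "score":   {"host": "http://pavilion:11434", "model": "26b"},  # heavy
}

def route(task: str) -> dict:
    """Return the endpoint/model pair for a given pipeline task."""
    try:
        return TIERS[task]
    except KeyError:
        raise ValueError(f"unknown pipeline task: {task!r}")
```

Keeping the routing table in one place means adding a third tier later (e.g. a mid-size model) is a one-line change rather than an n8n workflow rewrite.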

Current Configuration#

The Pipeline Chain#

  1. Ingestion: n8n fetches RSS feeds and scrapes blogs.
  2. Filtering (E4B): Classifies content (Review vs. News). Discards non-reviews.
  3. Extraction (26B): Extracts title, author, publisher, and critical language.
  4. Scoring (26B): Analyzes “prestige signals” (e.g., phrases like “formally ambitious”) and compares them against Goodreads popularity.
  5. Aggregation: Writes normalized records into PostgreSQL.
  6. Publication: n8n generates a Markdown file and commits it to a Hugo repository.
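The scoring step (4) is the core of the pipeline: high critical prestige combined with low Goodreads popularity is what flags a book as undiscovered. A hedged sketch of that logic in Python; the phrase list, weights, and thresholds are illustrative assumptions, not the 26B model's actual behavior:

```python
# Illustrative "prestige signal" scoring. In the real pipeline a 26B
# model does this analysis; here a weighted phrase list stands in for it.
PRESTIGE_PHRASES = {
    "formally ambitious": 3,   # example phrase from the pipeline notes
    "masterful": 2,            # remaining phrases/weights are assumptions
    "luminous prose": 2,
    "tour de force": 2,
    "stunning debut": 1,
}

def prestige_score(review_text: str) -> int:
    """Sum the weights of prestige phrases found in a review (case-insensitive)."""
    text = review_text.lower()
    return sum(w for phrase, w in PRESTIGE_PHRASES.items() if phrase in text)

def is_undiscovered(score: int, goodreads_ratings: int,
                    min_score: int = 3, popularity_cap: int = 500) -> bool:
    """Flag books with strong critical language but little Goodreads traction."""
    return score >= min_score and goodreads_ratings < popularity_cap
```

The comparison against Goodreads popularity is what separates "prestigious" from "already famous": a rapturous review of a bestseller should not surface in the results.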

Data Schema (PostgreSQL)#

  • sources: RSS metadata.
  • articles: Raw ingested content.
  • books: Normalized book records.
  • reviews: Links articles to books + extracted critical language.
  • prestige_scores: Historical scoring for trend tracking.
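The five tables above could be expressed in PostgreSQL roughly as follows; the column names and types are a sketch inferred from their descriptions, not the deployed definitions:

```sql
-- Illustrative DDL; columns and constraints are assumptions.
CREATE TABLE sources (
    id       SERIAL PRIMARY KEY,
    name     TEXT NOT NULL,
    feed_url TEXT UNIQUE NOT NULL
);

CREATE TABLE articles (
    id         SERIAL PRIMARY KEY,
    source_id  INTEGER REFERENCES sources(id),
    url        TEXT UNIQUE NOT NULL,
    raw_body   TEXT,
    fetched_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE books (
    id        SERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    author    TEXT NOT NULL,
    publisher TEXT,
    UNIQUE (title, author)
);

CREATE TABLE reviews (
    id                SERIAL PRIMARY KEY,
    article_id        INTEGER REFERENCES articles(id),
    book_id           INTEGER REFERENCES books(id),
    critical_language TEXT[]
);

CREATE TABLE prestige_scores (
    book_id   INTEGER REFERENCES books(id),
    score     NUMERIC NOT NULL,
    scored_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (book_id, scored_at)
);
```

The composite key on `prestige_scores` (book plus timestamp) is what enables the trend tracking mentioned above: each pipeline run appends a new row per book rather than overwriting the previous score.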

Open Questions#

  • How to scrape Substack and other paywalled content cost-effectively.
  • The optimal run frequency for the pipeline (weekly vs. bi-weekly).
