Book Discovery Pipeline

Mon, 13 Apr 2026 00:00:00 +0000

What Was Established

A multi-agent, multi-node pipeline designed to identify high-prestige, upcoming literary works by analyzing critical reviews before the books reach a mainstream audience.

Key Decisions

Architecture: Two-tier agent system. Lightweight models (E4B) on the Orchestrator (T480) handle routing/filtering; heavier models (26B) on the Inference node (Pavilion) handle deep analysis.
Tech Stack: n8n (Orchestration), PostgreSQL (Data Storage), Hugo (Static Site Generation), Python/JS (Custom Logic).
Data Sources: RSS feeds (Literary Hub, etc.), Web Scraping (for indie blogs), and Goodreads API/Scraping for popularity comparison.
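
A minimal sketch of how the two-tier split might be expressed as a lookup table, assuming each pipeline task is pinned to exactly one tier. The node and model labels come from the notes above; the task names (`route`, `filter`, `extract`, `score`) and the `ModelTier` type are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    node: str   # host label from the notes ("T480" / "Pavilion")
    model: str  # model label from the notes ("E4B" / "26B")


LIGHT = ModelTier(node="T480", model="E4B")
HEAVY = ModelTier(node="Pavilion", model="26B")

# Hypothetical task names; the tier each maps to follows the notes above.
TASK_TIERS = {
    "route": LIGHT,
    "filter": LIGHT,
    "extract": HEAVY,
    "score": HEAVY,
}


def tier_for(task: str) -> ModelTier:
    """Return the tier for a task; unknown tasks raise instead of guessing."""
    try:
        return TASK_TIERS[task]
    except KeyError:
        raise ValueError(f"no tier configured for task {task!r}") from None
```

Keeping the mapping in one place means a task can be moved between nodes without touching the code that calls the models.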

Current Configuration

The Pipeline Chain

Ingestion: n8n fetches RSS feeds and scrapes blogs.
Filtering (E4B): Classifies content (Review vs. News). Discards non-reviews.
Extraction (26B): Extracts title, author, publisher, and critical language.
Scoring (26B): Analyzes “prestige signals” (e.g., phrases like “formally ambitious”) and compares against Goodreads popularity.
Aggregation: Aggregates data into PostgreSQL.
Publication: n8n generates a Markdown file and commits it to a Hugo repository.
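
The filtering, extraction, scoring, and publication steps above can be sketched as plain Python functions. This is an illustration under stated assumptions, not the actual n8n workflow. The model calls are passed in as callables (`is_review`, `extract`, `ratings_for`, all hypothetical), and the phrase-counting heuristic is a stand-in for the 26B model's scoring. Only "formally ambitious" comes from the notes; the other phrases and the popularity discount are invented for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Article:
    url: str
    text: str


@dataclass
class ScoredBook:
    title: str
    author: str
    prestige: float


# Only "formally ambitious" is from the notes; the rest are placeholders.
PRESTIGE_PHRASES = ("formally ambitious", "genre-defying", "masterful")


def prestige_score(critical_language: str, goodreads_ratings: int) -> float:
    """Toy heuristic: count signal phrases, then discount by popularity."""
    text = critical_language.lower()
    hits = sum(text.count(phrase) for phrase in PRESTIGE_PHRASES)
    # A book with many ratings is already popular, so it scores lower.
    return hits / (1 + goodreads_ratings / 1000)


def run_chain(
    article: Article,
    is_review: Callable[[str], bool],      # stand-in for the E4B filter
    extract: Callable[[str], dict],        # stand-in for the 26B extractor
    ratings_for: Callable[[str], int],     # stand-in for the Goodreads lookup
) -> Optional[ScoredBook]:
    """Filter, extract, and score one article; non-reviews return None."""
    if not is_review(article.text):
        return None
    fields = extract(article.text)
    score = prestige_score(
        fields["critical_language"], ratings_for(fields["title"])
    )
    return ScoredBook(fields["title"], fields["author"], score)


def render_post(book: ScoredBook, date: str) -> str:
    """Render a Hugo content file with YAML front matter."""
    return (
        "---\n"
        f"title: {json.dumps(book.title)}\n"  # JSON quoting keeps the YAML valid
        f"date: {date}\n"
        "---\n\n"
        f"{book.title} by {book.author}: prestige score {book.prestige:.2f}\n"
    )
```

Passing the model calls in as arguments keeps the chain testable without either node running, since each callable can be swapped for a stub.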

Data Schema (PostgreSQL)

sources: RSS metadata.
articles: Raw ingested content.
books: Normalized book records.
reviews: Links articles to books + extracted critical language.
prestige_scores: Historical scoring for trend tracking.
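
One possible shape for these five tables is sketched below. The table names come from the list above; every column, key, and constraint is an assumption made for illustration. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in. A real PostgreSQL schema would likely use types such as `BIGSERIAL` and `TIMESTAMPTZ` instead.

```python
import sqlite3

# Column names and constraints are assumptions; only the table names are
# taken from the notes.
SCHEMA = """
CREATE TABLE sources (
    id INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL UNIQUE,
    name TEXT
);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES sources(id),
    url TEXT NOT NULL UNIQUE,
    raw_content TEXT NOT NULL
);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT NOT NULL,
    publisher TEXT,
    UNIQUE (title, author)
);
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES articles(id),
    book_id INTEGER NOT NULL REFERENCES books(id),
    critical_language TEXT
);
CREATE TABLE prestige_scores (
    id INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(id),
    score REAL NOT NULL,
    scored_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all five tables on the given connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
```

Storing one `prestige_scores` row per scoring run, rather than overwriting a column on `books`, is what would make the trend tracking mentioned above possible.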

Open Questions

How to effectively scrape Substack/paywalled content without high costs.
Determining the optimal frequency for the pipeline run (Weekly vs. Bi-weekly).

Related Pages

Open WebUI Deployment, Ollama Configuration, AI-Driven Monitoring Pipeline

Agents on homelab
