x-newsletter/CLAUDE.md
ksalk 3c31f41122 Add CLAUDE.md documentation for AI-assisted development
Documents project architecture, data flow, configuration requirements, and common modification patterns for future Claude Code sessions.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-12 12:05:52 +00:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.

The application supports two data sources:

  • XScraper (primary): Playwright-based browser scraper that directly loads X.com profiles
  • NitterRssFetcher (fallback): Uses RSS feeds from Nitter instances for public X data

Architecture

The pipeline follows a clear sequential flow:

XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
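
That flow can be sketched as follows. This is a minimal illustration of the orchestration idea only; the type names and function signatures are invented, not the project's actual interfaces.

```typescript
// Hypothetical sketch of NewsletterPipeline's sequential flow with
// per-stage error collection; all names here are illustrative.
interface Tweet { id: string; author: string; text: string }
interface TopicGroup { topic: string; tweets: Tweet[] }
interface StageError { stage: "rss" | "process" | "ai" | "email"; message: string }

async function runPipeline(
  fetchTweets: () => Promise<Tweet[]>,
  groupByTopic: (tweets: Tweet[]) => TopicGroup[],
  summarize: (groups: TopicGroup[]) => Promise<string>,
  sendEmail: (html: string) => Promise<void>,
): Promise<{ success: boolean; errors: StageError[] }> {
  const errors: StageError[] = [];
  let tweets: Tweet[] = [];
  try {
    tweets = await fetchTweets();
  } catch (e) {
    errors.push({ stage: "rss", message: String(e) });
  }
  const groups = groupByTopic(tweets);
  let html: string;
  try {
    html = await summarize(groups);
  } catch (e) {
    // AI failure is non-fatal: fall back to a basic count summary
    errors.push({ stage: "ai", message: String(e) });
    html = `<p>${groups.length} topics, ${tweets.length} tweets</p>`;
  }
  try {
    await sendEmail(html);
  } catch (e) {
    errors.push({ stage: "email", message: String(e) });
    return { success: false, errors };
  }
  return { success: true, errors };
}
```

Note how an AI-stage failure degrades the output but still lets the run succeed, while an email-stage failure marks the run unsuccessful, matching the error-handling strategy described below.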

Key Components

Data Flow:

  • NewsletterPipeline (src/core/): Orchestrates the entire pipeline, handling errors at each stage
  • XScraper (src/services/scraper/): Uses Playwright to scrape X.com profiles for tweets
  • TweetProcessor (src/core/): Filters out retweets and replies (unless enabled via config) and groups the remaining tweets by topic
  • SummaryGenerator (src/services/ai/): Generates AI summaries per topic and daily insights using OpenRouter API
  • EmailService (src/services/email/): Sends newsletters via Brevo SMTP with HTML templates
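
The filter-and-group step in TweetProcessor can be sketched like this. The field and option names are assumptions for illustration, not the project's real interfaces.

```typescript
// Illustrative filter + group-by-topic step; RawTweet fields and the
// options object are invented for this sketch.
interface RawTweet { text: string; isRetweet: boolean; isReply: boolean; author: string }

function processTweets(
  tweets: RawTweet[],
  accountTopics: Record<string, string>, // author handle -> topic id
  opts = { includeRetweets: false, includeReplies: false },
): Map<string, RawTweet[]> {
  const groups = new Map<string, RawTweet[]>();
  for (const t of tweets) {
    if (t.isRetweet && !opts.includeRetweets) continue; // filtered out by default
    if (t.isReply && !opts.includeReplies) continue;
    const topic = accountTopics[t.author] ?? "general_tech";
    const bucket = groups.get(topic) ?? [];
    bucket.push(t);
    groups.set(topic, bucket);
  }
  return groups;
}
```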

Configuration & Data:

  • src/config/: Environment-based configuration using Zod validation
  • src/config/accounts.ts: List of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
  • src/config/topics.ts: Topic definitions matching account categories
  • src/types/: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results
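
The real config layer validates with Zod; the dependency-free sketch below shows the same fail-fast idea using the environment variable names documented in the Configuration section. The AppConfig shape is invented for illustration.

```typescript
// Sketch of fail-fast env validation (the project uses Zod for this);
// the AppConfig field names are assumptions.
interface AppConfig {
  openRouterApiKey: string;
  brevoSmtpUser: string;
  brevoSmtpKey: string;
  emailRecipients: string[];
  cronSchedule: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const v = env[name];
    if (!v) throw new Error(`Missing required env var: ${name}`); // startup failure
    return v;
  };
  return {
    openRouterApiKey: required("OPENROUTER_API_KEY"),
    brevoSmtpUser: required("BREVO_SMTP_USER"),
    brevoSmtpKey: required("BREVO_SMTP_KEY"),
    emailRecipients: required("EMAIL_RECIPIENTS").split(",").map((s) => s.trim()),
    cronSchedule: env.CRON_SCHEDULE ?? "0 7 * * *", // documented default
  };
}
```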

Supporting Services:

  • CronScheduler (src/services/scheduler/): Schedules daily newsletter runs using node-cron
  • OpenRouterClient (src/services/ai/): Wraps OpenRouter API for LLM calls
  • Logger (src/utils/): Pino-based structured logging
  • RetryUtil (src/utils/): Exponential backoff retry logic for failed requests
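
A hedged sketch of the RetryUtil idea, using the 500 ms base delay mentioned below; the real utility's signature may differ.

```typescript
// Exponential-backoff retry sketch: delay doubles per attempt
// (500 ms, 1 s, 2 s, ...); signature is illustrative.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastError; // all attempts exhausted
}
```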

Topic Categories

Three topics defined in src/config/topics.ts:

  • ai_ml: AI/Machine Learning (tracked by ~14 accounts)
  • swe: Software Engineering (tracked by ~13 accounts)
  • general_tech: General Tech/Startups (tracked by ~12 accounts)
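
A hypothetical shape for these definitions; the real src/config/topics.ts may structure them differently, but the three topic IDs come from the list above.

```typescript
// Illustrative topic definitions; only the three IDs are taken from
// the documentation, the rest of the shape is an assumption.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface Topic { id: TopicId; displayName: string }

const topics: Topic[] = [
  { id: "ai_ml", displayName: "AI / Machine Learning" },
  { id: "swe", displayName: "Software Engineering" },
  { id: "general_tech", displayName: "General Tech / Startups" },
];

// Quick lookup by ID, e.g. when grouping tweets
const topicById = new Map(topics.map((t) => [t.id, t] as const));
```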

Development Commands

# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately (once)
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev

Configuration

All configuration via environment variables (see .env.example):

Critical (will cause startup failure if missing):

  • OPENROUTER_API_KEY: API key for AI summaries
  • BREVO_SMTP_USER & BREVO_SMTP_KEY: Email credentials
  • EMAIL_RECIPIENTS: Comma-separated list of recipients

Important:

  • NITTER_INSTANCES: Fallback RSS sources (comma-separated URLs)
  • CRON_SCHEDULE: Cron expression for the daily run (default: 0 7 * * *, i.e. 7:00 AM daily)
  • CRON_TIMEZONE: Timezone for schedule (default: Europe/Warsaw)
  • OPENROUTER_MODEL: LLM model choice (supports Claude, GPT-4, Gemini)

Feature Flags:

  • ENABLE_AI_SUMMARIES: Toggle AI generation (gracefully falls back to basic summaries)
  • INCLUDE_RETWEETS / INCLUDE_REPLIES: Filter tweet types (default: false for both)
  • DRY_RUN: Skip actual email sending (useful for testing)
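
Since environment variables arrive as strings, the flags above need boolean parsing; here is one plausible helper (the accepted spellings are an assumption, not the project's actual parser).

```typescript
// Illustrative boolean-flag parsing for env vars; which truthy
// spellings are accepted is an assumption.
function envFlag(value: string | undefined, defaultValue = false): boolean {
  if (value === undefined || value === "") return defaultValue;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

// e.g. envFlag(process.env.INCLUDE_RETWEETS)        -> false unless set truthy
// e.g. envFlag(process.env.ENABLE_AI_SUMMARIES, true) -> defaults to enabled
```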

Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:

  • Stages: rss, process, ai, email
  • Strategy: Partial failures don't stop the pipeline (e.g., one account failing doesn't block others)
  • Fallbacks:
    • If AI summary fails, uses basic tweet count summary
    • If email sending fails, the error is logged and the run is marked unsuccessful
    • A source returning no tweets doesn't fail the run as long as other sources succeed
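
One common way to get this per-source isolation is Promise.allSettled; the sketch below shows the pattern (the names, and the use of allSettled itself, are assumptions about the implementation).

```typescript
// Partial-failure isolation sketch: one account failing is recorded
// but doesn't block the others. Names are invented for illustration.
interface FetchResult { account: string; tweets: string[] }

async function fetchAllAccounts(
  accounts: string[],
  fetchOne: (account: string) => Promise<string[]>,
): Promise<{ results: FetchResult[]; failures: string[] }> {
  const settled = await Promise.allSettled(accounts.map((a) => fetchOne(a)));
  const results: FetchResult[] = [];
  const failures: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") {
      results.push({ account: accounts[i], tweets: s.value });
    } else {
      failures.push(accounts[i]); // recorded, but the run continues
    }
  });
  return { results, failures };
}
```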

Testing the Pipeline

# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty

Code Style & Standards

  • TypeScript: strict: true - no implicit any, full type safety
  • Imports: ESM modules; relative imports use explicit .js extensions
  • Async: Heavy use of async/await with proper error handling
  • Logging: Structured logs via Pino (use logger.info/warn/error with objects)
  • Validation: Zod schemas in config validation

Key Implementation Notes

Playwright Scraper:

  • Runs headless with sandbox disabled for production environments
  • Includes respectful delays between account scrapes (1-3 seconds random)
  • Handles missing tweet selectors gracefully
  • Extracts the tweet ID from the URL, along with the content, timestamp, and links
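
The "respectful delay" between profile scrapes might be implemented with a small helper like this; the 1-3 second range is taken from the note above, the function names are invented.

```typescript
// Random inter-scrape delay in the documented 1-3 s range;
// helper names are illustrative.
function randomDelayMs(minMs = 1000, maxMs = 3000): number {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

function pause(ms: number): Promise<void> {
  return new Promise((r) => setTimeout(r, ms));
}

// usage between accounts: await pause(randomDelayMs());
```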

Retry Logic:

  • RSS fetch retries across multiple Nitter instances on failure
  • Exponential backoff (500ms base delay, configurable max attempts)
  • Automatically rotates Nitter instances on rate limits (429)
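
The instance-rotation part can be sketched as below; the real fetcher also applies exponential backoff between attempts, which this illustration omits, and the function name is invented.

```typescript
// Failover across Nitter instances: try each in turn, surfacing the
// last error only if all fail. Name and signature are illustrative.
async function fetchWithRotation<T>(
  instances: string[],
  fetchFrom: (instance: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const instance of instances) {
    try {
      return await fetchFrom(instance);
    } catch (e) {
      lastError = e; // e.g. HTTP 429 -> rotate to the next instance
    }
  }
  throw lastError;
}
```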

AI Integration:

  • Uses OpenRouter (provider-agnostic LLM API)
  • Builds structured prompts including tweet context and topic info
  • Parses JSON responses for highlights, summary, and trends
  • Falls back to basic summaries if generation fails
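
The parse-with-fallback step might look like this. The field names (highlights, summary, trends) mirror the list above; the rest of the shape and the fallback wording are assumptions.

```typescript
// Parse the LLM's JSON reply, falling back to a basic count summary
// on malformed output; TopicSummary shape is illustrative.
interface TopicSummary { highlights: string[]; summary: string; trends: string[] }

function parseSummaryResponse(raw: string, tweetCount: number): TopicSummary {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.summary === "string" && Array.isArray(parsed.highlights)) {
      return {
        highlights: parsed.highlights,
        summary: parsed.summary,
        trends: Array.isArray(parsed.trends) ? parsed.trends : [],
      };
    }
  } catch {
    // malformed JSON: fall through to the basic summary
  }
  return { highlights: [], summary: `${tweetCount} tweets collected today.`, trends: [] };
}
```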

Email:

  • HTML template with styled sections per topic
  • Includes highlights with author info and links
  • Newsletter metadata (date, stats)
  • All templates in src/services/email/templates.ts
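
A minimal sketch of a template-literal section renderer in the spirit of templates.ts; the markup, class-free layout, and function name are invented for illustration.

```typescript
// Render one styled topic section from highlights using template
// literals; Highlight fields and the HTML layout are assumptions.
interface Highlight { author: string; text: string; url: string }

function renderTopicSection(title: string, highlights: Highlight[]): string {
  const items = highlights
    .map((h) => `<li><strong>@${h.author}</strong>: ${h.text} <a href="${h.url}">link</a></li>`)
    .join("\n");
  return `<section>
  <h2>${title}</h2>
  <ul>
${items}
  </ul>
</section>`;
}
```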

Common Modifications

Adding new accounts: Edit src/config/accounts.ts (add to appropriate category)

Changing topics: Edit src/config/topics.ts and update TopicId type in src/types/index.ts

Modifying email template: Edit src/services/email/templates.ts - uses template literals with styled HTML

Changing AI model: Set OPENROUTER_MODEL env var (any OpenRouter-supported model works)

Disabling AI: Set ENABLE_AI_SUMMARIES=false - system continues with basic summaries