x-newsletter/CLAUDE.md
ksalk 3c31f41122 Add CLAUDE.md documentation for AI-assisted development
Documents project architecture, data flow, configuration requirements, and common modification patterns for future Claude Code sessions.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-12 12:05:52 +00:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.

The application supports two data sources:

  • XScraper (primary): Playwright-based browser scraper that directly loads X.com profiles
  • NitterRssFetcher (fallback): Uses RSS feeds from Nitter instances for public X data

Architecture

The pipeline follows a clear sequential flow:

XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
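
That flow can be sketched as follows. This is a minimal illustration of the orchestration idea only; the type names and function signatures are invented, not the project's actual interfaces.

```typescript
// Hypothetical sketch of NewsletterPipeline's sequential flow with
// per-stage error collection; all names here are illustrative.
interface Tweet { id: string; author: string; text: string }
interface TopicGroup { topic: string; tweets: Tweet[] }
interface StageError { stage: "rss" | "process" | "ai" | "email"; message: string }

async function runPipeline(
  fetchTweets: () => Promise<Tweet[]>,
  groupByTopic: (tweets: Tweet[]) => TopicGroup[],
  summarize: (groups: TopicGroup[]) => Promise<string>,
  sendEmail: (html: string) => Promise<void>,
): Promise<{ success: boolean; errors: StageError[] }> {
  const errors: StageError[] = [];
  let tweets: Tweet[] = [];
  try {
    tweets = await fetchTweets();
  } catch (e) {
    errors.push({ stage: "rss", message: String(e) });
  }
  const groups = groupByTopic(tweets);
  let html: string;
  try {
    html = await summarize(groups);
  } catch (e) {
    // AI failure is non-fatal: fall back to a basic count summary
    errors.push({ stage: "ai", message: String(e) });
    html = `<p>${groups.length} topics, ${tweets.length} tweets</p>`;
  }
  try {
    await sendEmail(html);
  } catch (e) {
    errors.push({ stage: "email", message: String(e) });
    return { success: false, errors };
  }
  return { success: true, errors };
}
```

Note how an AI-stage failure degrades the output but still lets the run succeed, while an email-stage failure marks the run unsuccessful, matching the error-handling strategy described below.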

Key Components

Data Flow:

  • NewsletterPipeline (src/core/): Orchestrates the entire pipeline, handling errors at each stage
  • XScraper (src/services/scraper/): Uses Playwright to scrape X.com profiles for tweets
  • TweetProcessor (src/core/): Filters out retweets and replies (unless enabled via config) and groups the remaining tweets by topic
  • SummaryGenerator (src/services/ai/): Generates AI summaries per topic and daily insights using OpenRouter API
  • EmailService (src/services/email/): Sends newsletters via Brevo SMTP with HTML templates
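
The filter-and-group step in TweetProcessor can be sketched like this. The field and option names are assumptions for illustration, not the project's real interfaces.

```typescript
// Illustrative filter + group-by-topic step; RawTweet fields and the
// options object are invented for this sketch.
interface RawTweet { text: string; isRetweet: boolean; isReply: boolean; author: string }

function processTweets(
  tweets: RawTweet[],
  accountTopics: Record<string, string>, // author handle -> topic id
  opts = { includeRetweets: false, includeReplies: false },
): Map<string, RawTweet[]> {
  const groups = new Map<string, RawTweet[]>();
  for (const t of tweets) {
    if (t.isRetweet && !opts.includeRetweets) continue; // filtered out by default
    if (t.isReply && !opts.includeReplies) continue;
    const topic = accountTopics[t.author] ?? "general_tech";
    const bucket = groups.get(topic) ?? [];
    bucket.push(t);
    groups.set(topic, bucket);
  }
  return groups;
}
```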

Configuration & Data:

  • src/config/: Environment-based configuration using Zod validation
  • src/config/accounts.ts: List of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
  • src/config/topics.ts: Topic definitions matching account categories
  • src/types/: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results
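
The real config layer validates with Zod; the dependency-free sketch below shows the same fail-fast idea using the environment variable names documented in the Configuration section. The AppConfig shape is invented for illustration.

```typescript
// Sketch of fail-fast env validation (the project uses Zod for this);
// the AppConfig field names are assumptions.
interface AppConfig {
  openRouterApiKey: string;
  brevoSmtpUser: string;
  brevoSmtpKey: string;
  emailRecipients: string[];
  cronSchedule: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const v = env[name];
    if (!v) throw new Error(`Missing required env var: ${name}`); // startup failure
    return v;
  };
  return {
    openRouterApiKey: required("OPENROUTER_API_KEY"),
    brevoSmtpUser: required("BREVO_SMTP_USER"),
    brevoSmtpKey: required("BREVO_SMTP_KEY"),
    emailRecipients: required("EMAIL_RECIPIENTS").split(",").map((s) => s.trim()),
    cronSchedule: env.CRON_SCHEDULE ?? "0 7 * * *", // documented default
  };
}
```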

Supporting Services:

  • CronScheduler (src/services/scheduler/): Schedules daily newsletter runs using node-cron
  • OpenRouterClient (src/services/ai/): Wraps OpenRouter API for LLM calls
  • Logger (src/utils/): Pino-based structured logging
  • RetryUtil (src/utils/): Exponential backoff retry logic for failed requests
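
A hedged sketch of the RetryUtil idea, using the 500 ms base delay mentioned below; the real utility's signature may differ.

```typescript
// Exponential-backoff retry sketch: delay doubles per attempt
// (500 ms, 1 s, 2 s, ...); signature is illustrative.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastError; // all attempts exhausted
}
```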

Topic Categories

Three topics defined in src/config/topics.ts:

  • ai_ml: AI/Machine Learning (tracked by ~14 accounts)
  • swe: Software Engineering (tracked by ~13 accounts)
  • general_tech: General Tech/Startups (tracked by ~12 accounts)
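
A hypothetical shape for these definitions; the real src/config/topics.ts may structure them differently, but the three topic IDs come from the list above.

```typescript
// Illustrative topic definitions; only the three IDs are taken from
// the documentation, the rest of the shape is an assumption.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface Topic { id: TopicId; displayName: string }

const topics: Topic[] = [
  { id: "ai_ml", displayName: "AI / Machine Learning" },
  { id: "swe", displayName: "Software Engineering" },
  { id: "general_tech", displayName: "General Tech / Startups" },
];

// Quick lookup by ID, e.g. when grouping tweets
const topicById = new Map(topics.map((t) => [t.id, t] as const));
```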

Development Commands

# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately (once)
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev

Configuration

All configuration via environment variables (see .env.example):

Critical (will cause startup failure if missing):

  • OPENROUTER_API_KEY: API key for AI summaries
  • BREVO_SMTP_USER & BREVO_SMTP_KEY: Email credentials
  • EMAIL_RECIPIENTS: Comma-separated list of recipients

Important:

  • NITTER_INSTANCES: Fallback RSS sources (comma-separated URLs)
  • CRON_SCHEDULE: Cron expression for the daily run (default: 0 7 * * *, i.e. 7:00 AM daily)
  • CRON_TIMEZONE: Timezone for schedule (default: Europe/Warsaw)
  • OPENROUTER_MODEL: LLM model choice (supports Claude, GPT-4, Gemini)

Feature Flags:

  • ENABLE_AI_SUMMARIES: Toggle AI generation (gracefully falls back to basic summaries)
  • INCLUDE_RETWEETS / INCLUDE_REPLIES: Filter tweet types (default: false for both)
  • DRY_RUN: Skip actual email sending (useful for testing)
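
Since environment variables arrive as strings, the flags above need boolean parsing; here is one plausible helper (the accepted spellings are an assumption, not the project's actual parser).

```typescript
// Illustrative boolean-flag parsing for env vars; which truthy
// spellings are accepted is an assumption.
function envFlag(value: string | undefined, defaultValue = false): boolean {
  if (value === undefined || value === "") return defaultValue;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

// e.g. envFlag(process.env.INCLUDE_RETWEETS)        -> false unless set truthy
// e.g. envFlag(process.env.ENABLE_AI_SUMMARIES, true) -> defaults to enabled
```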

Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:

  • Stages: rss, process, ai, email
  • Strategy: Partial failures don't stop the pipeline (e.g., one account failing doesn't block others)
  • Fallbacks:
    • If AI summary fails, uses basic tweet count summary
    • If email sending fails, the error is logged and the run is marked unsuccessful
    • A source returning no tweets doesn't fail the run as long as other sources succeed
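
One common way to get this per-source isolation is Promise.allSettled; the sketch below shows the pattern (the names, and the use of allSettled itself, are assumptions about the implementation).

```typescript
// Partial-failure isolation sketch: one account failing is recorded
// but doesn't block the others. Names are invented for illustration.
interface FetchResult { account: string; tweets: string[] }

async function fetchAllAccounts(
  accounts: string[],
  fetchOne: (account: string) => Promise<string[]>,
): Promise<{ results: FetchResult[]; failures: string[] }> {
  const settled = await Promise.allSettled(accounts.map((a) => fetchOne(a)));
  const results: FetchResult[] = [];
  const failures: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") {
      results.push({ account: accounts[i], tweets: s.value });
    } else {
      failures.push(accounts[i]); // recorded, but the run continues
    }
  });
  return { results, failures };
}
```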

Testing the Pipeline

# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty

Code Style & Standards

  • TypeScript: strict: true - no implicit any, full type safety
  • Imports: ESM modules; relative imports use explicit .js extensions
  • Async: Heavy use of async/await with proper error handling
  • Logging: Structured logs via Pino (use logger.info/warn/error with objects)
  • Validation: Zod schemas in config validation

Key Implementation Notes

Playwright Scraper:

  • Runs headless with sandbox disabled for production environments
  • Includes respectful delays between account scrapes (1-3 seconds random)
  • Handles missing tweet selectors gracefully
  • Extracts the tweet ID from the URL, along with the content, timestamp, and links
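
The "respectful delay" between profile scrapes might be implemented with a small helper like this; the 1-3 second range is taken from the note above, the function names are invented.

```typescript
// Random inter-scrape delay in the documented 1-3 s range;
// helper names are illustrative.
function randomDelayMs(minMs = 1000, maxMs = 3000): number {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

function pause(ms: number): Promise<void> {
  return new Promise((r) => setTimeout(r, ms));
}

// usage between accounts: await pause(randomDelayMs());
```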

Retry Logic:

  • RSS fetch retries across multiple Nitter instances on failure
  • Exponential backoff (500ms base delay, configurable max attempts)
  • Automatically rotates Nitter instances on rate limits (429)
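
The instance-rotation part can be sketched as below; the real fetcher also applies exponential backoff between attempts, which this illustration omits, and the function name is invented.

```typescript
// Failover across Nitter instances: try each in turn, surfacing the
// last error only if all fail. Name and signature are illustrative.
async function fetchWithRotation<T>(
  instances: string[],
  fetchFrom: (instance: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const instance of instances) {
    try {
      return await fetchFrom(instance);
    } catch (e) {
      lastError = e; // e.g. HTTP 429 -> rotate to the next instance
    }
  }
  throw lastError;
}
```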

AI Integration:

  • Uses OpenRouter (provider-agnostic LLM API)
  • Builds structured prompts including tweet context and topic info
  • Parses JSON responses for highlights, summary, and trends
  • Falls back to basic summaries if generation fails
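
The parse-with-fallback step might look like this. The field names (highlights, summary, trends) mirror the list above; the rest of the shape and the fallback wording are assumptions.

```typescript
// Parse the LLM's JSON reply, falling back to a basic count summary
// on malformed output; TopicSummary shape is illustrative.
interface TopicSummary { highlights: string[]; summary: string; trends: string[] }

function parseSummaryResponse(raw: string, tweetCount: number): TopicSummary {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.summary === "string" && Array.isArray(parsed.highlights)) {
      return {
        highlights: parsed.highlights,
        summary: parsed.summary,
        trends: Array.isArray(parsed.trends) ? parsed.trends : [],
      };
    }
  } catch {
    // malformed JSON: fall through to the basic summary
  }
  return { highlights: [], summary: `${tweetCount} tweets collected today.`, trends: [] };
}
```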

Email:

  • HTML template with styled sections per topic
  • Includes highlights with author info and links
  • Newsletter metadata (date, stats)
  • All templates in src/services/email/templates.ts
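
A minimal sketch of a template-literal section renderer in the spirit of templates.ts; the markup, class-free layout, and function name are invented for illustration.

```typescript
// Render one styled topic section from highlights using template
// literals; Highlight fields and the HTML layout are assumptions.
interface Highlight { author: string; text: string; url: string }

function renderTopicSection(title: string, highlights: Highlight[]): string {
  const items = highlights
    .map((h) => `<li><strong>@${h.author}</strong>: ${h.text} <a href="${h.url}">link</a></li>`)
    .join("\n");
  return `<section>
  <h2>${title}</h2>
  <ul>
${items}
  </ul>
</section>`;
}
```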

Common Modifications

Adding new accounts: Edit src/config/accounts.ts (add to appropriate category)

Changing topics: Edit src/config/topics.ts and update TopicId type in src/types/index.ts

Modifying email template: Edit src/services/email/templates.ts - uses template literals with styled HTML

Changing AI model: Set OPENROUTER_MODEL env var (any OpenRouter-supported model works)

Disabling AI: Set ENABLE_AI_SUMMARIES=false - system continues with basic summaries