# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.
The application supports two data sources:

- **XScraper** (primary): Playwright-based browser scraper that loads X.com profiles directly
- **NitterRssFetcher** (fallback): uses RSS feeds from Nitter instances for public X data
## Architecture
The pipeline follows a clear sequential flow:
XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
### Key Components
**Data Flow:**

- `NewsletterPipeline` (`src/core/`): orchestrates the entire pipeline, handling errors at each stage
- `XScraper` (`src/services/scraper/`): uses Playwright to scrape X.com profiles for tweets
- `TweetProcessor` (`src/core/`): filters tweets (retweets/replies) and groups them by topic
- `SummaryGenerator` (`src/services/ai/`): generates AI summaries per topic and daily insights using the OpenRouter API
- `EmailService` (`src/services/email/`): sends newsletters via Brevo SMTP with HTML templates
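The stage sequence above can be sketched as follows. This is an illustrative sketch only: the stage names follow the Error Handling section of this document, but the function signatures are assumptions, not the actual `NewsletterPipeline` interfaces.

```typescript
// Illustrative pipeline orchestration; the real NewsletterPipeline in
// src/core/ may differ. Each stage's failure is collected, not thrown.
interface Tweet { id: string; author: string; text: string }

type Stage = "rss" | "process" | "ai" | "email";
interface PipelineError { stage: Stage; message: string }

async function runPipeline(
  fetchTweets: () => Promise<Tweet[]>,                       // XScraper or NitterRssFetcher
  groupByTopic: (tweets: Tweet[]) => Map<string, Tweet[]>,   // TweetProcessor
  summarize: (topic: string, tweets: Tweet[]) => Promise<string>, // SummaryGenerator
  send: (html: string) => Promise<void>,                     // EmailService
): Promise<{ errors: PipelineError[] }> {
  const errors: PipelineError[] = [];
  let tweets: Tweet[] = [];
  try {
    tweets = await fetchTweets();
  } catch (e) {
    errors.push({ stage: "rss", message: String(e) });
  }
  const grouped = groupByTopic(tweets);
  const sections: string[] = [];
  for (const [topic, group] of grouped) {
    try {
      sections.push(await summarize(topic, group));
    } catch (e) {
      errors.push({ stage: "ai", message: String(e) });
    }
  }
  try {
    await send(sections.join("\n"));
  } catch (e) {
    errors.push({ stage: "email", message: String(e) });
  }
  return { errors }; // errors end up in the newsletter, per the Error Handling section
}
```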
**Configuration & Data:**

- `src/config/`: environment-based configuration using Zod validation
- `src/config/accounts.ts`: list of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
- `src/config/topics.ts`: topic definitions matching account categories
- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results
**Supporting Services:**

- `CronScheduler` (`src/services/scheduler/`): schedules daily newsletter runs using node-cron
- `OpenRouterClient` (`src/services/ai/`): wraps the OpenRouter API for LLM calls
- `Logger` (`src/utils/`): Pino-based structured logging
- `RetryUtil` (`src/utils/`): exponential backoff retry logic for failed requests
## Topic Categories

Three topics are defined in `src/config/topics.ts`:

- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
- `swe`: Software Engineering (tracked by ~13 accounts)
- `general_tech`: General Tech/Startups (tracked by ~12 accounts)
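The topic definitions might be shaped like this. The `Topic` field names are hypothetical; only the three IDs and their labels come from this document.

```typescript
// Hypothetical shape of src/config/topics.ts; the real file may differ.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface Topic {
  id: TopicId;
  label: string;
}

const topics: Record<TopicId, Topic> = {
  ai_ml: { id: "ai_ml", label: "AI/Machine Learning" },
  swe: { id: "swe", label: "Software Engineering" },
  general_tech: { id: "general_tech", label: "General Tech/Startups" },
};
```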
## Development Commands

```bash
# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately (once); the extra -- makes npm forward the flag
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev
```
## Configuration

All configuration is via environment variables (see `.env.example`):

**Critical (will cause startup failure if missing):**

- `OPENROUTER_API_KEY`: API key for AI summaries
- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: email credentials
- `EMAIL_RECIPIENTS`: comma-separated list of recipients

**Important:**

- `NITTER_INSTANCES`: fallback RSS sources (comma-separated URLs)
- `CRON_SCHEDULE`: cron expression for the daily run (default: `0 7 * * *` = 7 AM)
- `CRON_TIMEZONE`: timezone for the schedule (default: `Europe/Warsaw`)
- `OPENROUTER_MODEL`: LLM model choice (supports Claude, GPT-4, Gemini)

**Feature Flags:**

- `ENABLE_AI_SUMMARIES`: toggle AI generation (gracefully falls back to basic summaries)
- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: filter tweet types (default: `false` for both)
- `DRY_RUN`: skip actual email sending (useful for testing)
## Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:

- Stages: `rss`, `process`, `ai`, `email`
- Strategy: partial failures don't stop the pipeline (e.g., one account failing doesn't block the others)
- Fallbacks:
  - If an AI summary fails, a basic tweet-count summary is used
  - If email sending fails, the error is logged and the run is marked unsuccessful
  - Missing tweets from one source don't fail the run if other sources succeed
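The "one account failing doesn't block others" strategy can be sketched with `Promise.allSettled`. The `fetchAccount` function here is a hypothetical stand-in for whatever per-account fetch the scraper actually performs.

```typescript
interface Tweet { id: string; author: string; text: string }

// Hypothetical per-account fetcher; stands in for XScraper/NitterRssFetcher.
type AccountFetcher = (handle: string) => Promise<Tweet[]>;

async function fetchAll(handles: string[], fetchAccount: AccountFetcher) {
  // allSettled never rejects: each account succeeds or fails independently.
  const results = await Promise.allSettled(handles.map(fetchAccount));
  const tweets: Tweet[] = [];
  const errors: { handle: string; message: string }[] = [];
  results.forEach((r, i) => {
    if (r.status === "fulfilled") tweets.push(...r.value);
    else errors.push({ handle: handles[i], message: String(r.reason) });
  });
  return { tweets, errors }; // errors are reported in the newsletter, not thrown
}
```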
## Testing the Pipeline

```bash
# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution; the extra -- forwards the flag to the script
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty
```
## Code Style & Standards

- **TypeScript**: `strict: true` (no implicit `any`, full type safety)
- **Imports**: ESM modules with `.js` extensions in import paths
- **Async**: heavy use of async/await with proper error handling
- **Logging**: structured logs via Pino (use `logger.info/warn/error` with objects)
- **Validation**: Zod schemas for config validation
## Key Implementation Notes

**Playwright Scraper:**

- Runs headless with the sandbox disabled for production environments
- Includes respectful delays between account scrapes (1-3 seconds, randomized)
- Handles missing tweet selectors gracefully
- Extracts the tweet ID from the URL, along with content, timestamp, and links
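Two of the behaviors above can be sketched as small helpers. The URL pattern is an assumption about X.com status URLs, not code taken from `XScraper`.

```typescript
// X status URLs typically look like https://x.com/<handle>/status/<numeric id>.
// Illustrative helper; the real extraction lives in XScraper.
function extractTweetId(url: string): string | null {
  const match = url.match(/\/status\/(\d+)/);
  return match ? match[1] : null;
}

// Respectful randomized delay between account scrapes: 1-3 seconds.
function randomDelayMs(): number {
  return 1000 + Math.floor(Math.random() * 2000);
}
```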
**Retry Logic:**

- RSS fetches retry across multiple Nitter instances on failure
- Exponential backoff (500 ms base delay, configurable max attempts)
- Automatically rotates Nitter instances on rate limits (HTTP 429)
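The backoff-plus-rotation pattern above can be sketched like this. It is a minimal illustration with the 500 ms base delay from this document; the real `RetryUtil` in `src/utils/` may be structured differently.

```typescript
// Exponential backoff sketch: delay doubles each attempt (500ms, 1s, 2s, ...).
async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      // Passing the attempt index lets the caller rotate Nitter instances.
      return await fn(attempt);
    } catch (e) {
      lastError = e;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Instance rotation: a different base URL per attempt, wrapping around.
function pickInstance(instances: string[], attempt: number): string {
  return instances[attempt % instances.length];
}
```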
**AI Integration:**

- Uses OpenRouter (a provider-agnostic LLM API)
- Builds structured prompts including tweet context and topic info
- Parses JSON responses for highlights, summary, and trends
- Falls back to basic summaries if generation fails
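The "parse JSON, fall back to a basic summary" behavior can be sketched as below. The field names (`highlights`, `summary`, `trends`) follow this document; the exact validation in `src/services/ai/` may differ.

```typescript
// Illustrative response parsing with the basic-summary fallback.
interface TopicSummary { highlights: string[]; summary: string; trends: string[] }

function parseSummary(raw: string, tweetCount: number): TopicSummary {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.summary === "string" && Array.isArray(parsed.highlights)) {
      return {
        highlights: parsed.highlights,
        summary: parsed.summary,
        trends: Array.isArray(parsed.trends) ? parsed.trends : [],
      };
    }
  } catch {
    // Malformed JSON: fall through to the basic fallback below.
  }
  // Basic tweet-count summary, used whenever AI generation fails.
  return { highlights: [], summary: `${tweetCount} tweets collected today.`, trends: [] };
}
```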
**Email:**

- HTML template with styled sections per topic
- Includes highlights with author info and links
- Newsletter metadata (date, stats)
- All templates live in `src/services/email/templates.ts`
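A per-topic section built from template literals might look like this. It is a minimal sketch in the spirit of `src/services/email/templates.ts`; the real templates and their styling differ.

```typescript
// Illustrative per-topic HTML section; not the actual template code.
interface Highlight { author: string; text: string; url: string }

function topicSection(title: string, summary: string, highlights: Highlight[]): string {
  const items = highlights
    .map((h) => `<li><strong>@${h.author}</strong>: ${h.text} <a href="${h.url}">link</a></li>`)
    .join("\n");
  return `
<section>
  <h2>${title}</h2>
  <p>${summary}</p>
  <ul>${items}</ul>
</section>`;
}
```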
## Common Modifications

- **Adding new accounts**: edit `src/config/accounts.ts` (add to the appropriate category)
- **Changing topics**: edit `src/config/topics.ts` and update the `TopicId` type in `src/types/index.ts`
- **Modifying the email template**: edit `src/services/email/templates.ts` (template literals with styled HTML)
- **Changing the AI model**: set the `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)
- **Disabling AI**: set `ENABLE_AI_SUMMARIES=false`; the system continues with basic summaries
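As an example of the first modification above, adding an account might look like the following. The entry shape is a hypothetical assumption; check `src/config/accounts.ts` for the actual structure before editing.

```typescript
// Hypothetical shape of entries in src/config/accounts.ts.
interface Account { handle: string; category: "ai_ml" | "swe" | "general_tech" }

const accounts: Account[] = [
  { handle: "existinghandle", category: "ai_ml" },
  // Adding a new account: append it under the appropriate category.
  { handle: "newhandle", category: "swe" },
];
```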