# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.

The application supports two data sources:

- **XScraper** (primary): Playwright-based browser scraper that loads X.com profiles directly
- **NitterRssFetcher** (fallback): uses RSS feeds from Nitter instances for public X data

## Architecture

The pipeline follows a sequential flow:

```
XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
```

### Key Components

**Data Flow:**

- `NewsletterPipeline` (src/core/): orchestrates the entire pipeline, handling errors at each stage
- `XScraper` (src/services/scraper/): uses Playwright to scrape X.com profiles for tweets
- `TweetProcessor` (src/core/): filters tweets (retweets/replies) and groups them by topic
- `SummaryGenerator` (src/services/ai/): generates per-topic AI summaries and daily insights using the OpenRouter API
- `EmailService` (src/services/email/): sends newsletters via Brevo SMTP with HTML templates

**Configuration & Data:**

- `src/config/`: environment-based configuration validated with Zod
- `src/config/accounts.ts`: list of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
- `src/config/topics.ts`: topic definitions matching the account categories
- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results

**Supporting Services:**

- `CronScheduler` (src/services/scheduler/): schedules daily newsletter runs using node-cron
- `OpenRouterClient` (src/services/ai/): wraps the OpenRouter API for LLM calls
- `Logger` (src/utils/): Pino-based structured logging
- `RetryUtil` (src/utils/): exponential-backoff retry logic for failed requests

### Topic Categories

Three topics are defined in `src/config/topics.ts`:

- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
- `swe`: Software Engineering (tracked by ~13 accounts)
- `general_tech`: General Tech/Startups (tracked by ~12 accounts)

## Development Commands

```bash
# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately, once (the `--` is needed so npm forwards the flag)
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev
```

## Configuration

All configuration is via environment variables (see `.env.example`).

**Critical (missing values cause startup failure):**

- `OPENROUTER_API_KEY`: API key for AI summaries
- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: email credentials
- `EMAIL_RECIPIENTS`: comma-separated list of recipients

**Important:**

- `NITTER_INSTANCES`: fallback RSS sources (comma-separated URLs)
- `CRON_SCHEDULE`: cron expression for the daily run (default: `0 7 * * *` = 7 AM)
- `CRON_TIMEZONE`: timezone for the schedule (default: `Europe/Warsaw`)
- `OPENROUTER_MODEL`: LLM model choice (supports Claude, GPT-4, Gemini)

**Feature Flags:**

- `ENABLE_AI_SUMMARIES`: toggle AI generation (gracefully falls back to basic summaries)
- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: filter tweet types (default: false for both)
- `DRY_RUN`: skip actual email sending (useful for testing)

## Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:

- **Stages**: rss, process, ai, email
- **Strategy**: partial failures don't stop the pipeline (e.g., one account failing doesn't block the others)
- **Fallbacks**:
  - If an AI summary fails, a basic tweet-count summary is used instead
  - If the email fails, the error is logged and the run is marked unsuccessful
  - Missing tweets from one source don't fail the run if other sources succeed

## Testing the Pipeline

```bash
# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty
```
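Several of the fallbacks above rely on the exponential-backoff helper described as `RetryUtil` (src/utils/). A minimal sketch of what such a helper might look like, assuming a hypothetical `withRetry` name and the documented 500ms base delay (the real signature in `src/utils/` may differ):

```typescript
// Hypothetical sketch of a RetryUtil-style helper; the function name,
// parameters, and defaults are assumptions, not the repository's real API.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      // Exponential backoff: 500ms, 1000ms, 2000ms, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A caller such as the RSS fetcher could wrap each Nitter request in `withRetry(() => fetchFeed(url))`, letting one instance fail a few times before the pipeline records the error and moves on.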
## Code Style & Standards

- **TypeScript**: `strict: true` (no implicit `any`, full type safety)
- **Imports**: ESM modules with `.js` extensions in import paths
- **Async**: heavy use of async/await with proper error handling
- **Logging**: structured logs via Pino (use `logger.info/warn/error` with objects)
- **Validation**: Zod schemas for config validation

## Key Implementation Notes

**Playwright Scraper:**

- Runs headless with the sandbox disabled for production environments
- Includes respectful delays between account scrapes (1-3 seconds, randomized)
- Handles missing tweet selectors gracefully
- Extracts the tweet ID from the URL, plus content, timestamp, and links

**Retry Logic:**

- RSS fetches retry across multiple Nitter instances on failure
- Exponential backoff (500ms base delay, configurable max attempts)
- Automatically rotates Nitter instances on rate limits (429)

**AI Integration:**

- Uses OpenRouter (a provider-agnostic LLM API)
- Builds structured prompts including tweet context and topic info
- Parses JSON responses for highlights, summary, and trends
- Falls back to basic summaries if generation fails

**Email:**

- HTML template with styled sections per topic
- Includes highlights with author info and links
- Newsletter metadata (date, stats)
- All templates live in `src/services/email/templates.ts`

## Common Modifications

**Adding new accounts:** edit `src/config/accounts.ts` (add to the appropriate category)

**Changing topics:** edit `src/config/topics.ts` and update the `TopicId` type in `src/types/index.ts`

**Modifying the email template:** edit `src/services/email/templates.ts` (uses template literals with styled HTML)

**Changing the AI model:** set the `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)

**Disabling AI:** set `ENABLE_AI_SUMMARIES=false` and the system continues with basic summaries
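To make the "adding new accounts" modification concrete, here is a sketch of what an entry in `src/config/accounts.ts` might look like. The `TopicId` values match the three documented topics, but the interface name, field names, and example handles are illustrative assumptions, not the repository's actual types:

```typescript
// Hypothetical shape of src/config/accounts.ts — the interface and
// field names are assumptions; only the TopicId values come from the docs.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface TrackedAccount {
  handle: string; // X username without the leading @
  topic: TopicId; // must match a topic defined in src/config/topics.ts
}

const accounts: TrackedAccount[] = [
  { handle: "karpathy", topic: "ai_ml" },
  { handle: "dan_abramov", topic: "swe" },
  // Adding a new account: append it under the appropriate category
  { handle: "paulg", topic: "general_tech" },
];

export default accounts;
```

If a new topic is needed rather than a new account, the `TopicId` union above illustrates why `src/types/index.ts` must be updated in the same change: the compiler rejects any account entry whose `topic` is not in the union.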