From 3c31f41122f6e890f60a8ce4b0a28abe6bedefed Mon Sep 17 00:00:00 2001
From: ksalk
Date: Mon, 12 Jan 2026 12:05:52 +0000
Subject: [PATCH] Add CLAUDE.md documentation for AI-assisted development

Documents project architecture, data flow, configuration requirements,
and common modification patterns for future Claude Code sessions.

Co-Authored-By: Claude Haiku 4.5
---
 CLAUDE.md | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e42bac9
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,157 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.
+
+The application supports two data sources:
+- **XScraper** (primary): Playwright-based browser scraper that directly loads X.com profiles
+- **NitterRssFetcher** (fallback): Uses RSS feeds from Nitter instances for public X data
+
+## Architecture
+
+The pipeline follows a clear sequential flow:
+
+```
+XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
+```
+
+### Key Components
+
+**Data Flow:**
+- `NewsletterPipeline` (src/core/): Orchestrates the entire pipeline, handling errors at each stage
+- `XScraper` (src/services/scraper/): Uses Playwright to scrape X.com profiles for tweets
+- `TweetProcessor` (src/core/): Filters tweets (retweets/replies) and groups them by topic
+- `SummaryGenerator` (src/services/ai/): Generates AI summaries per topic and daily insights using OpenRouter API
+- `EmailService` (src/services/email/): Sends newsletters via Brevo SMTP with HTML templates
+
+**Configuration & Data:**
+- `src/config/`: Environment-based configuration using Zod validation
+- `src/config/accounts.ts`: List of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
+- `src/config/topics.ts`: Topic definitions matching account categories
+- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results
+
+**Supporting Services:**
+- `CronScheduler` (src/services/scheduler/): Schedules daily newsletter runs using node-cron
+- `OpenRouterClient` (src/services/ai/): Wraps OpenRouter API for LLM calls
+- `Logger` (src/utils/): Pino-based structured logging
+- `RetryUtil` (src/utils/): Exponential backoff retry logic for failed requests
+
+### Topic Categories
+
+Three topics are defined in `src/config/topics.ts`:
+- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
+- `swe`: Software Engineering (tracked by ~13 accounts)
+- `general_tech`: General Tech/Startups (tracked by ~12 accounts)
+
+## Development Commands
+
+```bash
+# Install dependencies
+npm install
+
+# Build (TypeScript compilation)
+npm run build
+
+# Run scheduled service (waits for cron schedule)
+npm start
+
+# Run pipeline immediately (once); the bare "--" forwards the flag to the script
+npm run dev -- --run-now
+
+# Test run without sending email
+npm run dry-run
+
+# Development with tsx (faster iteration)
+npm run dev
+```
+
+## Configuration
+
+All configuration is done via environment variables (see `.env.example`):
+
+**Critical (will cause startup failure if missing):**
+- `OPENROUTER_API_KEY`: API key for AI summaries
+- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: Email credentials
+- `EMAIL_RECIPIENTS`: Comma-separated list of recipients
+
+**Important:**
+- `NITTER_INSTANCES`: Fallback RSS sources (comma-separated URLs)
+- `CRON_SCHEDULE`: Cron expression for daily run (default: `0 7 * * *` = 7 AM)
+- `CRON_TIMEZONE`: Timezone for schedule (default: `Europe/Warsaw`)
+- `OPENROUTER_MODEL`: LLM model choice (supports Claude, GPT-4, Gemini)
+
+**Feature Flags:**
+- `ENABLE_AI_SUMMARIES`: Toggle AI generation (gracefully falls back to basic summaries)
+- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: Include these tweet types (default: `false` for both, so they are filtered out)
+- `DRY_RUN`: Skip actual email sending (useful for testing)
+
+## Error Handling
+
+The pipeline collects errors at each stage and includes them in the newsletter:
+- **Stage**: rss, process, ai, email
+- **Strategy**: Partial failures don't stop the pipeline (e.g., one account failing doesn't block others)
+- **Fallbacks**:
+  - If AI summary fails, uses basic tweet count summary
+  - If email fails, logs the error and marks the run as unsuccessful
+  - Missing tweets from a source don't fail the run if others succeed
+
+## Testing the Pipeline
+
+```bash
+# Dry run (validates all steps without sending email)
+npm run dry-run
+
+# Test immediate execution
+npm run dev -- --run-now
+
+# Check logs (Pino pretty-printed)
+npm run dev | npx pino-pretty
+```
+
+## Code Style & Standards
+
+- **TypeScript**: `strict: true` - no implicit any, full type safety
+- **Imports**: ESM modules with `.js` extensions in imports
+- **Async**: Heavy use of async/await with proper error handling
+- **Logging**: Structured logs via Pino (use logger.info/warn/error with objects)
+- **Validation**: Zod schemas in config validation
+
+## Key Implementation Notes
+
+**Playwright Scraper:**
+- Runs headless with sandbox disabled for production environments
+- Includes respectful delays between account scrapes (1-3 seconds random)
+- Handles missing tweet selectors gracefully
+- Extracts tweet ID from URL, content, timestamp, and links
+
+**Retry Logic:**
+- RSS fetch retries across multiple Nitter instances on failure
+- Exponential backoff (500ms base delay, configurable max attempts)
+- Automatically rotates Nitter instances on rate limits (429)
+
+**AI Integration:**
+- Uses OpenRouter (provider-agnostic LLM API)
+- Builds structured prompts including tweet context and topic info
+- Parses JSON responses for highlights, summary, and trends
+- Falls back to basic summaries if generation fails
+
+**Email:**
+- HTML template with styled sections per topic
+- Includes highlights with author info and links
+- Newsletter metadata (date, stats)
+- All templates live in `src/services/email/templates.ts`
+
+## Common Modifications
+
+**Adding new accounts:** Edit `src/config/accounts.ts` (add to appropriate category)
+
+**Changing topics:** Edit `src/config/topics.ts` and update `TopicId` type in `src/types/index.ts`
+
+**Modifying email template:** Edit `src/services/email/templates.ts` - uses template literals with styled HTML
+
+**Changing AI model:** Set `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)
+
+**Disabling AI:** Set `ENABLE_AI_SUMMARIES=false` - system continues with basic summaries
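
The `ENABLE_AI_SUMMARIES` fallback described above could look roughly like the following TypeScript sketch. It is an illustration under stated assumptions, not the repository's code: the `Tweet` and `TopicSummary` shapes, the `AiSummarizer` type, and the function names `basicSummary` and `summarizeTopic` are all hypothetical, and the real `SummaryGenerator` may be structured differently.

```typescript
// Hypothetical shapes for illustration; the real interfaces live in src/types/.
interface Tweet {
  author: string;
  text: string;
}

interface TopicSummary {
  topicId: string;
  summary: string;
  aiGenerated: boolean;
}

// Hypothetical signature standing in for whatever produces the AI summary.
type AiSummarizer = (topicId: string, tweets: Tweet[]) => Promise<string>;

// Basic fallback: a plain tweet-count summary that needs no LLM call.
function basicSummary(topicId: string, tweets: Tweet[]): TopicSummary {
  const authors = new Set(tweets.map((t) => t.author));
  return {
    topicId,
    summary: `${tweets.length} tweet(s) from ${authors.size} account(s)`,
    aiGenerated: false,
  };
}

// Call the AI summarizer only when enabled; any failure degrades to the
// basic summary instead of failing the whole run.
async function summarizeTopic(
  topicId: string,
  tweets: Tweet[],
  aiEnabled: boolean,
  summarize: AiSummarizer,
): Promise<TopicSummary> {
  if (!aiEnabled) {
    return basicSummary(topicId, tweets);
  }
  try {
    const summary = await summarize(topicId, tweets);
    return { topicId, summary, aiGenerated: true };
  } catch {
    return basicSummary(topicId, tweets);
  }
}
```

Keeping the fallback as a pure function means the disabled path and the failure path share one code path, which is one way to get the "system continues with basic summaries" behavior without duplicating logic.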