# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.

The application supports two data sources:

- **XScraper** (primary): Playwright-based browser scraper that loads X.com profiles directly
- **NitterRssFetcher** (fallback): uses RSS feeds from Nitter instances for public X data

## Architecture

The pipeline follows a sequential flow:

```
XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
```

### Key Components

**Data Flow:**

- `NewsletterPipeline` (src/core/): orchestrates the entire pipeline, handling errors at each stage
- `XScraper` (src/services/scraper/): uses Playwright to scrape X.com profiles for tweets
- `TweetProcessor` (src/core/): filters tweets (retweets/replies) and groups them by topic
- `SummaryGenerator` (src/services/ai/): generates per-topic AI summaries and daily insights using the OpenRouter API
- `EmailService` (src/services/email/): sends newsletters via Brevo SMTP with HTML templates

**Configuration & Data:**

- `src/config/`: environment-based configuration validated with Zod
- `src/config/accounts.ts`: list of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
- `src/config/topics.ts`: topic definitions matching the account categories
- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results

**Supporting Services:**

- `CronScheduler` (src/services/scheduler/): schedules daily newsletter runs using node-cron
- `OpenRouterClient` (src/services/ai/): wraps the OpenRouter API for LLM calls
- `Logger` (src/utils/): Pino-based structured logging
- `RetryUtil` (src/utils/): exponential-backoff retry logic for failed requests

### Topic Categories

Three topics are defined in `src/config/topics.ts`:

- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
- `swe`: Software Engineering (tracked by ~13 accounts)
- `general_tech`: General Tech/Startups (tracked by ~12 accounts)

## Development Commands

```bash
# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately, once (the `--` is needed so npm forwards the flag)
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev
```

## Configuration

All configuration is via environment variables (see `.env.example`).

**Critical (missing values cause startup failure):**

- `OPENROUTER_API_KEY`: API key for AI summaries
- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: email credentials
- `EMAIL_RECIPIENTS`: comma-separated list of recipients

**Important:**

- `NITTER_INSTANCES`: fallback RSS sources (comma-separated URLs)
- `CRON_SCHEDULE`: cron expression for the daily run (default: `0 7 * * *` = 7 AM)
- `CRON_TIMEZONE`: timezone for the schedule (default: `Europe/Warsaw`)
- `OPENROUTER_MODEL`: LLM model choice (supports Claude, GPT-4, Gemini)

**Feature Flags:**

- `ENABLE_AI_SUMMARIES`: toggle AI generation (gracefully falls back to basic summaries)
- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: filter tweet types (default: false for both)
- `DRY_RUN`: skip actual email sending (useful for testing)

## Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:

- **Stages**: rss, process, ai, email
- **Strategy**: partial failures don't stop the pipeline (e.g., one account failing doesn't block the others)
- **Fallbacks**:
  - If an AI summary fails, a basic tweet-count summary is used instead
  - If the email fails, the error is logged and the run is marked unsuccessful
  - Missing tweets from one source don't fail the run if other sources succeed

## Testing the Pipeline

```bash
# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty
```
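Several of the fallbacks above rely on the exponential-backoff helper described as `RetryUtil` (src/utils/). A minimal sketch of what such a helper might look like, assuming a hypothetical `withRetry` name and the documented 500ms base delay (the real signature in `src/utils/` may differ):

```typescript
// Hypothetical sketch of a RetryUtil-style helper; the function name,
// parameters, and defaults are assumptions, not the repository's real API.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      // Exponential backoff: 500ms, 1000ms, 2000ms, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A caller such as the RSS fetcher could wrap each Nitter request in `withRetry(() => fetchFeed(url))`, letting one instance fail a few times before the pipeline records the error and moves on.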
## Code Style & Standards

- **TypeScript**: `strict: true` (no implicit `any`, full type safety)
- **Imports**: ESM modules with `.js` extensions in import paths
- **Async**: heavy use of async/await with proper error handling
- **Logging**: structured logs via Pino (use `logger.info/warn/error` with objects)
- **Validation**: Zod schemas for config validation

## Key Implementation Notes

**Playwright Scraper:**

- Runs headless with the sandbox disabled for production environments
- Includes respectful delays between account scrapes (1-3 seconds, randomized)
- Handles missing tweet selectors gracefully
- Extracts the tweet ID from the URL, plus content, timestamp, and links

**Retry Logic:**

- RSS fetches retry across multiple Nitter instances on failure
- Exponential backoff (500ms base delay, configurable max attempts)
- Automatically rotates Nitter instances on rate limits (429)

**AI Integration:**

- Uses OpenRouter (a provider-agnostic LLM API)
- Builds structured prompts including tweet context and topic info
- Parses JSON responses for highlights, summary, and trends
- Falls back to basic summaries if generation fails

**Email:**

- HTML template with styled sections per topic
- Includes highlights with author info and links
- Newsletter metadata (date, stats)
- All templates live in `src/services/email/templates.ts`

## Common Modifications

**Adding new accounts:** edit `src/config/accounts.ts` (add to the appropriate category)

**Changing topics:** edit `src/config/topics.ts` and update the `TopicId` type in `src/types/index.ts`

**Modifying the email template:** edit `src/services/email/templates.ts` (uses template literals with styled HTML)

**Changing the AI model:** set the `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)

**Disabling AI:** set `ENABLE_AI_SUMMARIES=false` and the system continues with basic summaries
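To make the "adding new accounts" modification concrete, here is a sketch of what an entry in `src/config/accounts.ts` might look like. The `TopicId` values match the three documented topics, but the interface name, field names, and example handles are illustrative assumptions, not the repository's actual types:

```typescript
// Hypothetical shape of src/config/accounts.ts — the interface and
// field names are assumptions; only the TopicId values come from the docs.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface TrackedAccount {
  handle: string; // X username without the leading @
  topic: TopicId; // must match a topic defined in src/config/topics.ts
}

const accounts: TrackedAccount[] = [
  { handle: "karpathy", topic: "ai_ml" },
  { handle: "dan_abramov", topic: "swe" },
  // Adding a new account: append it under the appropriate category
  { handle: "paulg", topic: "general_tech" },
];

export default accounts;
```

If a new topic is needed rather than a new account, the `TopicId` union above illustrates why `src/types/index.ts` must be updated in the same change: the compiler rejects any account entry whose `topic` is not in the union.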