Add CLAUDE.md documentation for AI-assisted development

Documents project architecture, data flow, configuration requirements, and common modification patterns for future Claude Code sessions.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-12 12:05:52 +00:00
parent fabfc2b520
commit 3c31f41122

CLAUDE.md Normal file

@@ -0,0 +1,157 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.
The application supports two data sources:
- **XScraper** (primary): Playwright-based browser scraper that directly loads X.com profiles
- **NitterRssFetcher** (fallback): Uses RSS feeds from Nitter instances for public X data
## Architecture
The pipeline follows a clear sequential flow:
```
XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
```
### Key Components
**Data Flow:**
- `NewsletterPipeline` (src/core/): Orchestrates the entire pipeline, handling errors at each stage
- `XScraper` (src/services/scraper/): Uses Playwright to scrape X.com profiles for tweets
- `TweetProcessor` (src/core/): Filters tweets (retweets/replies) and groups them by topic
- `SummaryGenerator` (src/services/ai/): Generates AI summaries per topic and daily insights using OpenRouter API
- `EmailService` (src/services/email/): Sends newsletters via Brevo SMTP with HTML templates
**Configuration & Data:**
- `src/config/`: Environment-based configuration using Zod validation
- `src/config/accounts.ts`: List of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
- `src/config/topics.ts`: Topic definitions matching account categories
- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results
**Supporting Services:**
- `CronScheduler` (src/services/scheduler/): Schedules daily newsletter runs using node-cron
- `OpenRouterClient` (src/services/ai/): Wraps OpenRouter API for LLM calls
- `Logger` (src/utils/): Pino-based structured logging
- `RetryUtil` (src/utils/): Exponential backoff retry logic for failed requests
### Topic Categories
Three topics are defined in `src/config/topics.ts`:
- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
- `swe`: Software Engineering (tracked by ~13 accounts)
- `general_tech`: General Tech/Startups (tracked by ~12 accounts)
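The filtering and grouping that `TweetProcessor` performs could look roughly like the sketch below. The `Tweet` shape and the `filterTweets` and `groupByTopic` helpers are hypothetical names chosen for illustration; the real interfaces live in `src/types/` and may differ.

```typescript
// Hypothetical shapes for illustration; the real interfaces live in src/types/.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface Tweet {
  id: string;
  author: string;
  topic: TopicId;
  text: string;
  isRetweet: boolean;
  isReply: boolean;
}

interface FilterOptions {
  includeRetweets: boolean; // mirrors INCLUDE_RETWEETS
  includeReplies: boolean; // mirrors INCLUDE_REPLIES
}

// Drop retweets and replies unless the corresponding flag is enabled.
function filterTweets(tweets: Tweet[], opts: FilterOptions): Tweet[] {
  return tweets.filter(
    (t) =>
      (opts.includeRetweets || !t.isRetweet) &&
      (opts.includeReplies || !t.isReply),
  );
}

// Bucket tweets by topic, always returning an entry for every topic.
function groupByTopic(tweets: Tweet[]): Record<TopicId, Tweet[]> {
  const groups: Record<TopicId, Tweet[]> = {
    ai_ml: [],
    swe: [],
    general_tech: [],
  };
  for (const tweet of tweets) {
    groups[tweet.topic].push(tweet);
  }
  return groups;
}
```

Returning an empty array for topics without tweets keeps downstream stages free of undefined checks.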
## Development Commands
```bash
# Install dependencies
npm install
# Build (TypeScript compilation)
npm run build
# Run scheduled service (waits for cron schedule)
npm start
# Run pipeline immediately (once)
npm run dev -- --run-now
# Test run without sending email
npm run dry-run
# Development with tsx (faster iteration)
npm run dev
```
## Configuration
All configuration via environment variables (see `.env.example`):
**Critical (will cause startup failure if missing):**
- `OPENROUTER_API_KEY`: API key for AI summaries
- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: Email credentials
- `EMAIL_RECIPIENTS`: Comma-separated list of recipients
**Important:**
- `NITTER_INSTANCES`: Fallback RSS sources (comma-separated URLs)
- `CRON_SCHEDULE`: Cron expression for daily run (default: `0 7 * * *` = 7 AM)
- `CRON_TIMEZONE`: Timezone for schedule (default: `Europe/Warsaw`)
- `OPENROUTER_MODEL`: Model used for summaries (for example Claude, GPT-4, or Gemini)
**Feature Flags:**
- `ENABLE_AI_SUMMARIES`: Toggle AI generation (gracefully falls back to basic summaries)
- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: Filter tweet types (default: false for both)
- `DRY_RUN`: Skip actual email sending (useful for testing)
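To make the startup behaviour concrete, here is a dependency-free sketch of how these variables might be parsed. The project itself validates with Zod in `src/config/`; this plain TypeScript version is a stand-in so the example runs without packages. `parseConfig`, `AppConfig`, and the helpers are hypothetical names, and the `true` default for `ENABLE_AI_SUMMARIES` and `false` default for `DRY_RUN` are assumptions.

```typescript
// Dependency-free stand-in for the Zod schema; all names here are hypothetical.
interface AppConfig {
  openRouterApiKey: string;
  brevoSmtpUser: string;
  brevoSmtpKey: string;
  emailRecipients: string[];
  nitterInstances: string[];
  cronSchedule: string;
  cronTimezone: string;
  enableAiSummaries: boolean;
  includeRetweets: boolean;
  includeReplies: boolean;
  dryRun: boolean;
}

type Env = Record<string, string | undefined>;

// Critical variables: throw at startup when missing or blank.
function required(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Split a comma-separated value into trimmed, non-empty entries.
function list(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Parse a boolean flag, using the fallback when the variable is unset.
function flag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function parseConfig(env: Env): AppConfig {
  const emailRecipients = list(required(env, "EMAIL_RECIPIENTS"));
  if (emailRecipients.length === 0) {
    throw new Error("EMAIL_RECIPIENTS must contain at least one address");
  }
  return {
    openRouterApiKey: required(env, "OPENROUTER_API_KEY"),
    brevoSmtpUser: required(env, "BREVO_SMTP_USER"),
    brevoSmtpKey: required(env, "BREVO_SMTP_KEY"),
    emailRecipients,
    nitterInstances: list(env["NITTER_INSTANCES"]),
    cronSchedule: env["CRON_SCHEDULE"]?.trim() || "0 7 * * *",
    cronTimezone: env["CRON_TIMEZONE"]?.trim() || "Europe/Warsaw",
    enableAiSummaries: flag(env["ENABLE_AI_SUMMARIES"], true), // assumed default
    includeRetweets: flag(env["INCLUDE_RETWEETS"], false),
    includeReplies: flag(env["INCLUDE_REPLIES"], false),
    dryRun: flag(env["DRY_RUN"], false), // assumed default
  };
}
```

In a real entry point this would be called once as `parseConfig(process.env)`, so a missing critical variable stops the service before any scraping starts.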
## Error Handling
The pipeline collects errors at each stage and includes them in the newsletter:
- **Stage**: `rss`, `process`, `ai`, `email`
- **Strategy**: Partial failures don't stop the pipeline (e.g., one account failing doesn't block others)
- **Fallbacks**:
  - If AI summary generation fails, a basic tweet-count summary is used
  - If email sending fails, the error is logged and the run is marked unsuccessful
  - Missing tweets from one source don't fail the run if other sources succeed
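The collect-and-continue strategy above can be sketched as follows. `StageError`, `fetchAll`, and `isRunSuccessful` are hypothetical names, tweets are reduced to plain strings, and the success rule (only an `email` stage error fails the run) is an assumption drawn from the fallbacks listed here rather than from the actual `NewsletterPipeline` code.

```typescript
// Hypothetical types; the stage names match the list in this section.
type Stage = "rss" | "process" | "ai" | "email";

interface StageError {
  stage: Stage;
  message: string;
}

interface FetchResult {
  tweets: string[];
  errors: StageError[];
}

// Fetch every account; a failing account is recorded and skipped, not rethrown.
async function fetchAll(
  accounts: string[],
  fetchOne: (account: string) => Promise<string[]>,
): Promise<FetchResult> {
  const tweets: string[] = [];
  const errors: StageError[] = [];
  for (const account of accounts) {
    try {
      tweets.push(...(await fetchOne(account)));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push({ stage: "rss", message: `${account}: ${message}` });
    }
  }
  return { tweets, errors };
}

// Assumed rule: only a failure in the email stage marks the run unsuccessful.
function isRunSuccessful(errors: StageError[]): boolean {
  return !errors.some((e) => e.stage === "email");
}
```

Keeping errors as data rather than exceptions is what lets later stages still run and report what went wrong.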
## Testing the Pipeline
```bash
# Dry run (validates all steps without sending email)
npm run dry-run
# Test immediate execution
npm run dev -- --run-now
# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty
```
## Code Style & Standards
- **TypeScript**: `strict: true` - no implicit any, full type safety
- **Imports**: ESM modules with `.js` extensions in imports
- **Async**: Heavy use of async/await with proper error handling
- **Logging**: Structured logs via Pino (use logger.info/warn/error with objects)
- **Validation**: Zod schemas in config validation
## Key Implementation Notes
**Playwright Scraper:**
- Runs headless with sandbox disabled for production environments
- Includes respectful delays between account scrapes (1-3 seconds random)
- Handles missing tweet selectors gracefully
- Extracts tweet ID from URL, content, timestamp, and links
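Two of the scraper details above, the ID extraction and the randomized delay, are simple enough to sketch in isolation. `extractTweetId` and `randomDelayMs` are hypothetical helper names; the sketch assumes tweet URLs contain a `/status/<digits>` segment, as public X status links do.

```typescript
// Pull the numeric status ID out of a tweet URL, or return null if absent.
function extractTweetId(url: string): string | null {
  const match = /\/status\/(\d+)/.exec(url);
  return match?.[1] ?? null;
}

// Pick a whole-millisecond delay in [minMs, maxMs]; `random` is injectable for tests.
function randomDelayMs(
  minMs = 1000,
  maxMs = 3000,
  random: () => number = Math.random,
): number {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}
```

A scraper loop would await a timer for `randomDelayMs()` milliseconds between accounts.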
**Retry Logic:**
- RSS fetch retries across multiple Nitter instances on failure
- Exponential backoff (500ms base delay, configurable max attempts)
- Automatically rotates Nitter instances on rate limits (429)
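A minimal sketch of how backoff and instance rotation might combine is shown below. It is not the actual `RetryUtil` API: `backoffDelayMs`, `fetchWithRotation`, and `RetryOptions` are hypothetical, the 30-second cap is an assumption, and for brevity the sketch rotates on any error rather than only on HTTP 429.

```typescript
// Delay before the retry that follows attempt `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Try each instance in round-robin order, backing off between failed attempts.
async function fetchWithRotation<T>(
  instances: string[],
  fetchFrom: (instance: string) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  if (instances.length === 0) throw new Error("No instances configured");
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown = new Error("No attempts were made");
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const instance = instances[attempt % instances.length] as string;
    try {
      return await fetchFrom(instance);
    } catch (err) {
      lastError = err;
      // Skip the wait after the final attempt.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, opts.baseDelayMs));
      }
    }
  }
  throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
```

With the 500 ms base, successive waits are 500 ms, 1 s, 2 s, and so on until the cap.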
**AI Integration:**
- Uses OpenRouter (provider-agnostic LLM API)
- Builds structured prompts including tweet context and topic info
- Parses JSON responses for highlights, summary, and trends
- Falls back to basic summaries if generation fails
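The parse-then-fall-back behaviour might look like the sketch below. The `TopicSummary` shape is an assumption based on the three fields named above, with highlights simplified to plain strings; `parseSummary` and `basicSummary` are hypothetical names, not functions confirmed to exist in `SummaryGenerator`.

```typescript
// Assumed response shape, based on the fields listed above.
interface TopicSummary {
  highlights: string[];
  summary: string;
  trends: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Fallback used when the model output is missing or unusable.
function basicSummary(topic: string, tweetCount: number): TopicSummary {
  return {
    highlights: [],
    summary: `${tweetCount} tweets collected for ${topic}.`,
    trends: [],
  };
}

// Validate the model's JSON; any parse or shape problem yields the basic summary.
function parseSummary(raw: string, topic: string, tweetCount: number): TopicSummary {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data === "object" && data !== null) {
      const { highlights, summary, trends } = data as Record<string, unknown>;
      if (isStringArray(highlights) && typeof summary === "string" && isStringArray(trends)) {
        return { highlights, summary, trends };
      }
    }
  } catch {
    // Malformed JSON falls through to the basic summary below.
  }
  return basicSummary(topic, tweetCount);
}
```

Validating the shape, not just the JSON syntax, matters because a model can return well-formed JSON with missing or mistyped fields.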
**Email:**
- HTML template with styled sections per topic
- Includes highlights with author info and links
- Newsletter metadata (date, stats)
- All templates live in `src/services/email/templates.ts`
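As an illustration of the template-literal approach, a per-topic section could be rendered along these lines. The `Highlight` and `TopicSection` shapes, `escapeHtml`, and `renderSection` are hypothetical and not taken from `templates.ts`; the inline style is a placeholder.

```typescript
// Hypothetical shapes for illustration only.
interface Highlight {
  author: string;
  text: string;
  url: string;
}

interface TopicSection {
  title: string;
  highlights: Highlight[];
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one topic as a styled section with a list of linked highlights.
function renderSection(section: TopicSection): string {
  const items = section.highlights
    .map(
      (h) =>
        `<li><strong>@${escapeHtml(h.author)}</strong>: ${escapeHtml(h.text)} ` +
        `<a href="${escapeHtml(h.url)}">link</a></li>`,
    )
    .join("\n");
  return [
    `<section style="margin-bottom:24px">`,
    `<h2>${escapeHtml(section.title)}</h2>`,
    `<ul>`,
    items,
    `</ul>`,
    `</section>`,
  ].join("\n");
}
```

Escaping matters here because tweet text is untrusted input that ends up inside HTML sent to recipients.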
## Common Modifications
**Adding new accounts:** Edit `src/config/accounts.ts` (add to appropriate category)
**Changing topics:** Edit `src/config/topics.ts` and update `TopicId` type in `src/types/index.ts`
**Modifying email template:** Edit `src/services/email/templates.ts` - uses template literals with styled HTML
**Changing AI model:** Set `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)
**Disabling AI:** Set `ENABLE_AI_SUMMARIES=false` - system continues with basic summaries