Add CLAUDE.md documentation for AI-assisted development
Documents project architecture, data flow, configuration requirements, and common modification patterns for future Claude Code sessions.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
157
CLAUDE.md
Normal file
@@ -0,0 +1,157 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.

The application supports two data sources:
- **XScraper** (primary): Playwright-based browser scraper that directly loads X.com profiles
- **NitterRssFetcher** (fallback): Uses RSS feeds from Nitter instances for public X data

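The primary/fallback relationship between the two sources could look roughly like the sketch below. The `TweetSource` interface, the `fetchTweets` method name, and the rule that an empty result also triggers the fallback are assumptions made for illustration; the real classes may expose a different API.

```typescript
// Hypothetical tweet shape; the real interfaces live in src/types/.
interface Tweet {
  id: string;
  author: string;
  text: string;
}

// Hypothetical common interface for XScraper and NitterRssFetcher.
interface TweetSource {
  name: string;
  fetchTweets(account: string): Promise<Tweet[]>;
}

// Try the primary source first; use the fallback when it throws or returns nothing.
async function fetchWithFallback(
  primary: TweetSource,
  fallback: TweetSource,
  account: string,
): Promise<{ source: string; tweets: Tweet[] }> {
  try {
    const tweets = await primary.fetchTweets(account);
    if (tweets.length > 0) {
      return { source: primary.name, tweets };
    }
  } catch {
    // The primary error is swallowed here; a real pipeline would log it.
  }
  const tweets = await fallback.fetchTweets(account);
  return { source: fallback.name, tweets };
}
```

Returning the name of the source that produced the tweets makes it easy to log which path was taken for each account.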
## Architecture

The pipeline follows a clear sequential flow:

```
XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
```

### Key Components

**Data Flow:**
- `NewsletterPipeline` (src/core/): Orchestrates the entire pipeline, handling errors at each stage
- `XScraper` (src/services/scraper/): Uses Playwright to scrape X.com profiles for tweets
- `TweetProcessor` (src/core/): Filters tweets (retweets/replies) and groups them by topic
- `SummaryGenerator` (src/services/ai/): Generates AI summaries per topic and daily insights using OpenRouter API
- `EmailService` (src/services/email/): Sends newsletters via Brevo SMTP with HTML templates

**Configuration & Data:**
- `src/config/`: Environment-based configuration using Zod validation
- `src/config/accounts.ts`: List of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
- `src/config/topics.ts`: Topic definitions matching account categories
- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results

**Supporting Services:**
- `CronScheduler` (src/services/scheduler/): Schedules daily newsletter runs using node-cron
- `OpenRouterClient` (src/services/ai/): Wraps OpenRouter API for LLM calls
- `Logger` (src/utils/): Pino-based structured logging
- `RetryUtil` (src/utils/): Exponential backoff retry logic for failed requests

### Topic Categories

Three topics are defined in `src/config/topics.ts`:
- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
- `swe`: Software Engineering (tracked by ~13 accounts)
- `general_tech`: General Tech/Startups (tracked by ~12 accounts)

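As a rough illustration of how `TweetProcessor` might filter tweets and group them under these topics, consider the sketch below. The `RawTweet` fields (`isRetweet`, `isReply`), the `accountTopics` lookup, and the function name are hypothetical; only the three topic IDs and the default of excluding retweets and replies come from this document.

```typescript
type TopicId = "ai_ml" | "swe" | "general_tech";

// Hypothetical tweet shape; the real interfaces live in src/types/.
interface RawTweet {
  id: string;
  author: string;
  text: string;
  isRetweet: boolean;
  isReply: boolean;
}

interface FilterOptions {
  includeRetweets: boolean;
  includeReplies: boolean;
}

// Drop retweets/replies unless enabled, then bucket the rest by the author's topic.
function groupByTopic(
  tweets: RawTweet[],
  accountTopics: Record<string, TopicId>,
  options: FilterOptions = { includeRetweets: false, includeReplies: false },
): Record<TopicId, RawTweet[]> {
  const groups: Record<TopicId, RawTweet[]> = { ai_ml: [], swe: [], general_tech: [] };
  for (const tweet of tweets) {
    if (tweet.isRetweet && !options.includeRetweets) continue;
    if (tweet.isReply && !options.includeReplies) continue;
    const topic = accountTopics[tweet.author];
    if (topic === undefined) continue; // Unknown account: skip rather than guess a topic.
    groups[topic].push(tweet);
  }
  return groups;
}
```

Typing the result as `Record<TopicId, RawTweet[]>` means that adding a topic to the `TopicId` union without initializing its bucket becomes a compile error.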
## Development Commands

```bash
# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately (once); the extra `--` passes the flag through npm to the script
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev
```

## Configuration

All configuration is set via environment variables (see `.env.example`):

**Critical (will cause startup failure if missing):**
- `OPENROUTER_API_KEY`: API key for AI summaries
- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: Email credentials
- `EMAIL_RECIPIENTS`: Comma-separated list of recipients

**Important:**
- `NITTER_INSTANCES`: Fallback RSS sources (comma-separated URLs)
- `CRON_SCHEDULE`: Cron expression for the daily run (default: `0 7 * * *`, i.e. 07:00 every day)
- `CRON_TIMEZONE`: Timezone for schedule (default: `Europe/Warsaw`)
- `OPENROUTER_MODEL`: LLM model choice (supports Claude, GPT-4, Gemini)

**Feature Flags:**
- `ENABLE_AI_SUMMARIES`: Toggle AI generation (gracefully falls back to basic summaries)
- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: Filter tweet types (default: false for both)
- `DRY_RUN`: Skip actual email sending (useful for testing)

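The project validates its environment with Zod. The dependency-free sketch below only approximates that behaviour in plain TypeScript to show how the variables above might be interpreted. The `AppConfig` field names, the flag-parsing rule, and the defaults for `ENABLE_AI_SUMMARIES` (true) and `DRY_RUN` (false) are assumptions; the cron defaults and the retweet/reply defaults come from the list above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape; the real schema lives in src/config/.
interface AppConfig {
  openRouterApiKey: string;
  brevoSmtpUser: string;
  brevoSmtpKey: string;
  emailRecipients: string[];
  nitterInstances: string[];
  cronSchedule: string;
  cronTimezone: string;
  enableAiSummaries: boolean;
  includeRetweets: boolean;
  includeReplies: boolean;
  dryRun: boolean;
}

// Split "a, b,c" into ["a", "b", "c"], dropping empty entries.
function parseList(value: string | undefined): string[] {
  return (value ?? "").split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// Treat only the string "true" (case-insensitive) as enabled; this rule is an assumption.
function parseFlag(value: string | undefined, fallback: boolean): boolean {
  return value === undefined ? fallback : value.trim().toLowerCase() === "true";
}

// Fail fast when a critical variable is missing, then apply defaults.
function loadConfig(env: Env): AppConfig {
  const required = ["OPENROUTER_API_KEY", "BREVO_SMTP_USER", "BREVO_SMTP_KEY", "EMAIL_RECIPIENTS"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    openRouterApiKey: env["OPENROUTER_API_KEY"] as string,
    brevoSmtpUser: env["BREVO_SMTP_USER"] as string,
    brevoSmtpKey: env["BREVO_SMTP_KEY"] as string,
    emailRecipients: parseList(env["EMAIL_RECIPIENTS"]),
    nitterInstances: parseList(env["NITTER_INSTANCES"]),
    cronSchedule: env["CRON_SCHEDULE"] ?? "0 7 * * *",
    cronTimezone: env["CRON_TIMEZONE"] ?? "Europe/Warsaw",
    enableAiSummaries: parseFlag(env["ENABLE_AI_SUMMARIES"], true),
    includeRetweets: parseFlag(env["INCLUDE_RETWEETS"], false),
    includeReplies: parseFlag(env["INCLUDE_REPLIES"], false),
    dryRun: parseFlag(env["DRY_RUN"], false),
  };
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the loader easy to exercise in tests.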
## Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:
- **Stage**: one of `rss`, `process`, `ai`, `email`
- **Strategy**: Partial failures don't stop the pipeline (e.g., one account failing doesn't block others)
- **Fallbacks**:
  - If AI summary generation fails, a basic tweet-count summary is used
  - If email sending fails, the error is logged and the run is marked as unsuccessful
  - Missing tweets from one source don't fail the run if other sources succeed

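One way to picture this "collect errors, keep going" strategy is the sketch below. The `StageError` shape and the `runStage` and `fetchAll` helpers are hypothetical names chosen for illustration; only the four stage names come from this document.

```typescript
type Stage = "rss" | "process" | "ai" | "email";

// Hypothetical error record; the real type lives in src/types/.
interface StageError {
  stage: Stage;
  message: string;
}

// Run one unit of work; on failure record the error and return the fallback
// so later stages can still run.
async function runStage<T>(
  stage: Stage,
  errors: StageError[],
  work: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await work();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    errors.push({ stage, message });
    return fallback;
  }
}

// Fetch every account independently so one failure does not block the others.
async function fetchAll(
  accounts: string[],
  fetchOne: (account: string) => Promise<string[]>,
  errors: StageError[],
): Promise<string[]> {
  const all: string[] = [];
  for (const account of accounts) {
    const tweets = await runStage<string[]>("rss", errors, () => fetchOne(account), []);
    all.push(...tweets);
  }
  return all;
}
```

The same `runStage` wrapper could guard the AI stage with a basic summary as its fallback value, while the email stage would additionally flip the run's success flag when an error is recorded.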
## Testing the Pipeline

```bash
# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution (the extra `--` passes the flag through npm to the script)
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty
```

## Code Style & Standards

- **TypeScript**: `strict: true` - no implicit any, full type safety
- **Imports**: ESM modules with `.js` extensions in imports
- **Async**: Heavy use of async/await with proper error handling
- **Logging**: Structured logs via Pino (use logger.info/warn/error with objects)
- **Validation**: Zod schemas in config validation

## Key Implementation Notes

**Playwright Scraper:**
- Runs headless with sandbox disabled for production environments
- Includes respectful delays between account scrapes (1-3 seconds random)
- Handles missing tweet selectors gracefully
- Extracts tweet ID from URL, content, timestamp, and links

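Two of the scraper details above, extracting the tweet ID from a URL and picking a random 1-3 second delay, can be sketched as small pure helpers. The function names and the assumption that tweet permalinks contain a `/status/<digits>` segment are illustrative and not taken from the actual `XScraper` code.

```typescript
// Pull the numeric ID out of a tweet permalink that contains "/status/<digits>".
// Returns null when the URL has no such segment.
function extractTweetId(url: string): string | null {
  const match = /\/status\/(\d+)/.exec(url);
  return match?.[1] ?? null;
}

// Pick a whole-millisecond delay in [minMs, maxMs].
// `random` is injectable so the function can be tested deterministically.
function randomDelayMs(
  minMs: number = 1000,
  maxMs: number = 3000,
  random: () => number = Math.random,
): number {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Between two account scrapes the caller would then `await sleep(randomDelayMs())`.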
**Retry Logic:**
- RSS fetch retries across multiple Nitter instances on failure
- Exponential backoff (500ms base delay, configurable max attempts)
- Automatically rotates Nitter instances on rate limits (429)

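The retry behaviour described above might be approximated as follows. This is a sketch under stated assumptions: the doubling factor, the option names, and the round-robin rotation rule are illustrative choices, and only the 500 ms base delay and the configurable attempt count come from the notes above. The real `RetryUtil` may differ.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number; // 500 in the notes above
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Delay before the retry that follows attempt `attempt` (1-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Call `work` with the attempt number until it succeeds or attempts run out.
async function withRetry<T>(
  work: (attempt: number) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await work(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < options.maxAttempts) {
        await sleep(backoffDelayMs(attempt, options.baseDelayMs));
      }
    }
  }
  throw lastError;
}

// Round-robin rotation: attempt 1 uses the first instance, attempt 2 the next, and so on.
function instanceForAttempt(instances: string[], attempt: number): string {
  const instance = instances[(attempt - 1) % instances.length];
  if (instance === undefined) {
    throw new Error("No Nitter instances configured");
  }
  return instance;
}
```

Combining the two, a fetcher could call `withRetry((attempt) => fetchFeed(instanceForAttempt(instances, attempt)), ...)`, where `fetchFeed` is a hypothetical function that requests the RSS feed from one instance.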
**AI Integration:**
- Uses OpenRouter (provider-agnostic LLM API)
- Builds structured prompts including tweet context and topic info
- Parses JSON responses for highlights, summary, and trends
- Falls back to basic summaries if generation fails

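The parse-then-fall-back step could be sketched like this. The `TopicSummary` shape, the `aiGenerated` marker, and the wording of the basic summary are assumptions for illustration; the only details taken from the notes above are the three JSON fields (`highlights`, `summary`, `trends`) and the fallback to a basic summary.

```typescript
// Hypothetical summary shape; the real interfaces live in src/types/.
interface TopicSummary {
  highlights: string[];
  summary: string;
  trends: string[];
  aiGenerated: boolean;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Basic summary used when AI is disabled or the response cannot be used.
function basicSummary(topicName: string, tweetCount: number): TopicSummary {
  return {
    highlights: [],
    summary: `${tweetCount} tweets collected for ${topicName}.`,
    trends: [],
    aiGenerated: false,
  };
}

// Parse the model output; any malformed or incomplete response yields the basic summary.
function parseSummary(raw: string, topicName: string, tweetCount: number): TopicSummary {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data === "object" && data !== null) {
      const { highlights, summary, trends } = data as Record<string, unknown>;
      if (isStringArray(highlights) && typeof summary === "string" && isStringArray(trends)) {
        return { highlights, summary, trends, aiGenerated: true };
      }
    }
  } catch {
    // Invalid JSON: fall through to the basic summary.
  }
  return basicSummary(topicName, tweetCount);
}
```

Validating the field types before trusting the response keeps a partially correct model answer from reaching the email template.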
**Email:**
- HTML template with styled sections per topic
- Includes highlights with author info and links
- Newsletter metadata (date, stats)
- All templates in src/services/email/templates.ts

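For orientation, a template-literal section renderer in the spirit of `templates.ts` might look like the sketch below. The `Highlight` shape, the inline styles, and the HTML escaping step are assumptions; the actual templates may be structured differently.

```typescript
// Hypothetical highlight shape; the real interfaces live in src/types/.
interface Highlight {
  author: string;
  text: string;
  url: string;
}

// Escape characters that are significant in HTML so tweet text cannot break the markup.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one topic as a styled section using template literals.
function renderTopicSection(title: string, highlights: Highlight[]): string {
  const items = highlights
    .map(
      (h) =>
        `    <li><strong>@${escapeHtml(h.author)}</strong>: ${escapeHtml(h.text)} ` +
        `<a href="${escapeHtml(h.url)}">link</a></li>`,
    )
    .join("\n");
  return `<section style="margin-bottom:24px">
  <h2 style="font-size:18px">${escapeHtml(title)}</h2>
  <ul>
${items}
  </ul>
</section>`;
}
```

Inline `style` attributes are used here because many email clients ignore external or embedded stylesheets.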
## Common Modifications

**Adding new accounts:** Edit `src/config/accounts.ts` (add to appropriate category)

**Changing topics:** Edit `src/config/topics.ts` and update the `TopicId` type in `src/types/index.ts`

**Modifying email template:** Edit `src/services/email/templates.ts` - uses template literals with styled HTML

**Changing AI model:** Set `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)

**Disabling AI:** Set `ENABLE_AI_SUMMARIES=false` - system continues with basic summaries