Documents project architecture, data flow, configuration requirements, and common modification patterns for future Claude Code sessions. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

X-Newsletter is a Node.js application that generates daily tech newsletters by scraping posts from X (Twitter) accounts, processing them by topic, and sending HTML emails with AI-generated summaries.

The application supports two data sources:

- **XScraper** (primary): Playwright-based browser scraper that loads X.com profiles directly
- **NitterRssFetcher** (fallback): Uses RSS feeds from Nitter instances for public X data

## Architecture

The pipeline follows a sequential flow:

```
XScraper/NitterRssFetcher → TweetProcessor → SummaryGenerator → EmailService
```

### Key Components

**Data Flow:**

- `NewsletterPipeline` (src/core/): Orchestrates the entire pipeline, handling errors at each stage
- `XScraper` (src/services/scraper/): Uses Playwright to scrape X.com profiles for tweets
- `TweetProcessor` (src/core/): Filters tweets (retweets/replies) and groups them by topic
- `SummaryGenerator` (src/services/ai/): Generates AI summaries per topic and daily insights using the OpenRouter API
- `EmailService` (src/services/email/): Sends newsletters via Brevo SMTP with HTML templates

**Configuration & Data:**

- `src/config/`: Environment-based configuration using Zod validation
- `src/config/accounts.ts`: List of 50+ tech accounts organized by category (AI/ML, SWE, General Tech)
- `src/config/topics.ts`: Topic definitions matching account categories
- `src/types/`: TypeScript interfaces for tweets, summaries, newsletters, and pipeline results

**Supporting Services:**

- `CronScheduler` (src/services/scheduler/): Schedules daily newsletter runs using node-cron
- `OpenRouterClient` (src/services/ai/): Wraps the OpenRouter API for LLM calls
- `Logger` (src/utils/): Pino-based structured logging
- `RetryUtil` (src/utils/): Exponential backoff retry logic for failed requests

### Topic Categories

Three topics are defined in `src/config/topics.ts`:

- `ai_ml`: AI/Machine Learning (tracked by ~14 accounts)
- `swe`: Software Engineering (tracked by ~13 accounts)
- `general_tech`: General Tech/Startups (tracked by ~12 accounts)

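
The filtering and grouping that `TweetProcessor` performs can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Tweet` and `FilterOptions` shapes, the `accountTopics` lookup, and the `groupByTopic` function are all hypothetical names, and the real interfaces in `src/types/` may differ.

```typescript
// Hypothetical shapes for illustration; the real interfaces live in src/types/.
type TopicId = "ai_ml" | "swe" | "general_tech";

interface Tweet {
  id: string;
  author: string;
  text: string;
  isRetweet: boolean;
  isReply: boolean;
}

interface FilterOptions {
  includeRetweets: boolean;
  includeReplies: boolean;
}

// Assumed lookup from account handle to topic; accounts.ts may be structured differently.
const accountTopics: Record<string, TopicId> = {
  example_ai_account: "ai_ml",
  example_swe_account: "swe",
};

// Drops retweets/replies unless enabled, then buckets the rest by the author's topic.
function groupByTopic(
  tweets: Tweet[],
  options: FilterOptions,
): Map<TopicId, Tweet[]> {
  const groups = new Map<TopicId, Tweet[]>();
  for (const tweet of tweets) {
    if (tweet.isRetweet && !options.includeRetweets) continue;
    if (tweet.isReply && !options.includeReplies) continue;
    const topic = accountTopics[tweet.author];
    if (topic === undefined) continue; // account not mapped to any topic
    const bucket = groups.get(topic) ?? [];
    bucket.push(tweet);
    groups.set(topic, bucket);
  }
  return groups;
}
```

With both options set to `false` (the documented default for `INCLUDE_RETWEETS` and `INCLUDE_REPLIES`), only original posts from mapped accounts end up in a topic bucket.
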
## Development Commands

```bash
# Install dependencies
npm install

# Build (TypeScript compilation)
npm run build

# Run scheduled service (waits for cron schedule)
npm start

# Run pipeline immediately (once); the extra "--" forwards the flag to the script
npm run dev -- --run-now

# Test run without sending email
npm run dry-run

# Development with tsx (faster iteration)
npm run dev
```

## Configuration

All configuration is done via environment variables (see `.env.example`):

**Critical (will cause startup failure if missing):**

- `OPENROUTER_API_KEY`: API key for AI summaries
- `BREVO_SMTP_USER` & `BREVO_SMTP_KEY`: Email credentials
- `EMAIL_RECIPIENTS`: Comma-separated list of recipients

**Important:**

- `NITTER_INSTANCES`: Fallback RSS sources (comma-separated URLs)
- `CRON_SCHEDULE`: Cron expression for daily run (default: `0 7 * * *` = 7 AM)
- `CRON_TIMEZONE`: Timezone for schedule (default: `Europe/Warsaw`)
- `OPENROUTER_MODEL`: LLM model choice (supports Claude, GPT-4, Gemini)

**Feature Flags:**

- `ENABLE_AI_SUMMARIES`: Toggle AI generation (gracefully falls back to basic summaries)
- `INCLUDE_RETWEETS` / `INCLUDE_REPLIES`: Filter tweet types (default: false for both)
- `DRY_RUN`: Skip actual email sending (useful for testing)

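
To make the required/optional split above concrete, here is a dependency-free sketch of how these variables might be read. The repository itself validates configuration with Zod, so this is a plain-TypeScript stand-in rather than the actual code. The `AppConfig` shape and the `requireVar`, `parseFlag`, and `loadConfig` helpers are hypothetical, and the defaults for `ENABLE_AI_SUMMARIES` and `DRY_RUN` are assumptions (only the cron and tweet-filter defaults are documented).

```typescript
// Dependency-free sketch; the repository validates config with Zod instead.
interface AppConfig {
  openRouterApiKey: string;
  brevoSmtpUser: string;
  brevoSmtpKey: string;
  emailRecipients: string[];
  cronSchedule: string;
  cronTimezone: string;
  enableAiSummaries: boolean;
  includeRetweets: boolean;
  includeReplies: boolean;
  dryRun: boolean;
}

type Env = Record<string, string | undefined>;

// Critical variables: throw so that startup fails when one is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Feature flags: "true" or "1" enable, any other value disables, unset uses the fallback.
function parseFlag(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name]?.trim().toLowerCase();
  if (raw === undefined || raw === "") return fallback;
  return raw === "true" || raw === "1";
}

function loadConfig(env: Env): AppConfig {
  return {
    openRouterApiKey: requireVar(env, "OPENROUTER_API_KEY"),
    brevoSmtpUser: requireVar(env, "BREVO_SMTP_USER"),
    brevoSmtpKey: requireVar(env, "BREVO_SMTP_KEY"),
    emailRecipients: requireVar(env, "EMAIL_RECIPIENTS")
      .split(",")
      .map((address) => address.trim())
      .filter((address) => address.length > 0),
    cronSchedule: env["CRON_SCHEDULE"] ?? "0 7 * * *",
    cronTimezone: env["CRON_TIMEZONE"] ?? "Europe/Warsaw",
    enableAiSummaries: parseFlag(env, "ENABLE_AI_SUMMARIES", true), // default is an assumption
    includeRetweets: parseFlag(env, "INCLUDE_RETWEETS", false),
    includeReplies: parseFlag(env, "INCLUDE_REPLIES", false),
    dryRun: parseFlag(env, "DRY_RUN", false), // default is an assumption
  };
}
```

In a Node.js entry point this would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to exercise with a plain object.
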
## Error Handling

The pipeline collects errors at each stage and includes them in the newsletter:

- **Stages**: `rss`, `process`, `ai`, `email`
- **Strategy**: Partial failures don't stop the pipeline (e.g., one account failing doesn't block others)
- **Fallbacks**:
  - If AI summary generation fails, the pipeline uses a basic tweet-count summary
  - If email sending fails, the error is logged and the run is marked as unsuccessful
  - Missing tweets from one source don't fail the run if other sources succeed

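
The collect-and-continue strategy described above might look roughly like this. It is a sketch under assumptions: the `StageError` shape and the `runStage` and `fetchAll` helpers are hypothetical and only illustrate the idea of recording an error and substituting a fallback instead of aborting. Tweets are represented as plain strings to keep the example short.

```typescript
type Stage = "rss" | "process" | "ai" | "email";

// Hypothetical error record; the real pipeline result types live in src/types/.
interface StageError {
  stage: Stage;
  message: string;
}

// Runs one unit of work. On failure, records the error and returns the fallback
// so that the caller can keep going.
async function runStage<T>(
  stage: Stage,
  errors: StageError[],
  work: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await work();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    errors.push({ stage, message });
    return fallback;
  }
}

// One failing account contributes an error entry and an empty result,
// while the remaining accounts are still fetched.
async function fetchAll(
  accounts: string[],
  fetchOne: (account: string) => Promise<string[]>,
  errors: StageError[],
): Promise<string[]> {
  const results: string[] = [];
  for (const account of accounts) {
    const tweets = await runStage<string[]>("rss", errors, () => fetchOne(account), []);
    results.push(...tweets);
  }
  return results;
}
```

The same `runStage` wrapper could guard the AI step, with a basic tweet-count summary passed as the fallback value.
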
## Testing the Pipeline

```bash
# Dry run (validates all steps without sending email)
npm run dry-run

# Test immediate execution; the extra "--" forwards the flag to the script
npm run dev -- --run-now

# Check logs (Pino pretty-printed)
npm run dev | npx pino-pretty
```

## Code Style & Standards

- **TypeScript**: `strict: true` - no implicit any, full type safety
- **Imports**: ESM modules with `.js` extensions in imports
- **Async**: Heavy use of async/await with proper error handling
- **Logging**: Structured logs via Pino (use `logger.info`/`warn`/`error` with objects)
- **Validation**: Zod schemas in config validation

## Key Implementation Notes

**Playwright Scraper:**

- Runs headless with sandbox disabled for production environments
- Includes respectful delays between account scrapes (1-3 seconds random)
- Handles missing tweet selectors gracefully
- Extracts tweet ID from URL, content, timestamp, and links

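
Two of the scraper details above, extracting the tweet ID from a URL and pausing 1-3 seconds between accounts, can be illustrated without Playwright. The helper names `extractTweetId`, `randomDelayMs`, and `sleep` are hypothetical, and the `/status/<id>` URL pattern is an assumption about how tweet links look, not something confirmed by the scraper's source.

```typescript
// Pulls the numeric ID out of a status URL such as https://x.com/<user>/status/<id>.
// Returns null when the URL does not contain a status segment.
function extractTweetId(url: string): string | null {
  const match = /\/status\/(\d+)/.exec(url);
  return match?.[1] ?? null;
}

// Picks a whole number of milliseconds in [minMs, maxMs].
// The random source is injectable so the function can be tested deterministically.
function randomDelayMs(
  minMs = 1000,
  maxMs = 3000,
  random: () => number = Math.random,
): number {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Promise-based pause, intended for use between account scrapes.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Between two profile scrapes the loop would then `await sleep(randomDelayMs())` before loading the next account.
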
**Retry Logic:**

- RSS fetch retries across multiple Nitter instances on failure
- Exponential backoff (500ms base delay, configurable max attempts)
- Automatically rotates Nitter instances on rate limits (429)

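
The combination of exponential backoff and instance rotation can be sketched as below. This is an illustration under assumptions, not `RetryUtil` itself: `RetryOptions`, `backoffDelay`, and `fetchWithRotation` are hypothetical names, the delay doubles on each attempt, and for simplicity the sketch rotates to the next instance on any failure rather than only on HTTP 429.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number; // 500 ms according to the notes above
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delay after failed attempt number `attempt` (1-based): base, 2x base, 4x base, ...
function backoffDelay(baseDelayMs: number, attempt: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

async function fetchWithRotation<T>(
  instances: string[],
  fetchFrom: (instance: string) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  if (instances.length === 0) throw new Error("No instances configured");
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    // Rotation: each attempt targets the next instance, wrapping around the list.
    const instance = instances[(attempt - 1) % instances.length] as string;
    try {
      return await fetchFrom(instance);
    } catch (err) {
      lastError = err;
      // Back off before the next attempt, but not after the final one.
      if (attempt < options.maxAttempts) {
        await sleep(backoffDelay(options.baseDelayMs, attempt));
      }
    }
  }
  throw lastError;
}
```

With a 500 ms base, the waits between attempts are 500 ms, 1000 ms, 2000 ms, and so on until `maxAttempts` is reached, at which point the last error is rethrown.
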
**AI Integration:**

- Uses OpenRouter (provider-agnostic LLM API)
- Builds structured prompts including tweet context and topic info
- Parses JSON responses for highlights, summary, and trends
- Falls back to basic summaries if generation fails

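
Parsing the model's JSON reply with a fallback could look something like the following. The field names come from the list above, but their types are assumptions: here `highlights` and `trends` are treated as string arrays, whereas the real highlights may carry author info and links. `TopicSummary`, `basicSummary`, and `parseSummary` are hypothetical names.

```typescript
// Assumed response shape; the real type in src/types/ may be richer.
interface TopicSummary {
  highlights: string[];
  summary: string;
  trends: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Basic fallback built only from the tweet count.
function basicSummary(topicName: string, tweetCount: number): TopicSummary {
  return {
    highlights: [],
    summary: `${tweetCount} tweets collected for ${topicName}.`,
    trends: [],
  };
}

// Returns the parsed summary when the reply is valid JSON with the expected
// fields, and the basic summary otherwise.
function parseSummary(
  raw: string,
  topicName: string,
  tweetCount: number,
): TopicSummary {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data === "object" && data !== null) {
      const { highlights, summary, trends } = data as Record<string, unknown>;
      if (isStringArray(highlights) && typeof summary === "string" && isStringArray(trends)) {
        return { highlights, summary, trends };
      }
    }
  } catch {
    // Malformed JSON falls through to the basic summary below.
  }
  return basicSummary(topicName, tweetCount);
}
```

Validating the shape before use means a reply that is valid JSON but missing a field is handled the same way as a reply that is not JSON at all.
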
**Email:**

- HTML template with styled sections per topic
- Includes highlights with author info and links
- Newsletter metadata (date, stats)
- All templates in `src/services/email/templates.ts`

## Common Modifications

**Adding new accounts:** Edit `src/config/accounts.ts` (add to the appropriate category)

**Changing topics:** Edit `src/config/topics.ts` and update the `TopicId` type in `src/types/index.ts`

**Modifying email template:** Edit `src/services/email/templates.ts` - uses template literals with styled HTML

**Changing AI model:** Set the `OPENROUTER_MODEL` env var (any OpenRouter-supported model works)

**Disabling AI:** Set `ENABLE_AI_SUMMARIES=false` - the system continues with basic summaries