Implement Playwright-based X scraper with AI-powered newsletter generation
Major changes:
- Replace Nitter RSS with Playwright browser automation for direct X scraping
- Scrape all 37 configured tech accounts in parallel
- Add OpenRouter AI integration for topic-based summaries (xiaomi/mimo-v2-flash:free model)
- Update prompts for factual, emotion-free analysis with post links
- Add console output for newsletter preview in dry-run mode
- Update Dockerfile to Playwright v1.57.0 with necessary browser dependencies
- Implement WRAP workflow method for AI-assisted development guidance

Technical improvements:
- Fixed TypeScript compilation (unused parameter in XScraper)
- Newsletter pipeline successfully processes 37 accounts -> AI summaries -> HTML email
- Full end-to-end test validated: scraping, processing, AI generation, email template

Pipeline flow:
1. Scrape X profiles with Playwright (parallel, configurable timeout)
2. Filter tweets by time window and content type
3. Categorize into AI/ML, Software Engineering, Tech & Startups
4. Generate AI summaries for each topic
5. Create cross-topic daily insights
6. Render HTML newsletter with highlights and trending topics
7. Send via email (or print to console in dry-run mode)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
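Steps 2 and 3 of the pipeline flow (filter by time window and content type, then categorize into the three topics) could look roughly like the sketch below. The `Tweet` shape, the keyword lists, and the function names are assumptions made for illustration; the commit does not show the project's actual types or categorization rules.

```typescript
// Hypothetical tweet shape; the real project's types are not shown in this commit.
interface Tweet {
  author: string;
  text: string;
  url: string;
  postedAt: Date;
  isRetweet: boolean;
}

type Topic = "AI/ML" | "Software Engineering" | "Tech & Startups";

// Illustrative keyword lists (assumed, not taken from the repository).
// Order matters: the first topic with a matching keyword wins.
const TOPIC_KEYWORDS: Array<[Topic, string[]]> = [
  ["AI/ML", ["llm", "model", "neural", "machine learning"]],
  ["Software Engineering", ["typescript", "compiler", "refactor", "api"]],
];

// Step 2: keep original posts published within the last `windowHours` hours.
function filterTweets(tweets: Tweet[], now: Date, windowHours: number): Tweet[] {
  const cutoff = now.getTime() - windowHours * 60 * 60 * 1000;
  return tweets.filter((t) => !t.isRetweet && t.postedAt.getTime() >= cutoff);
}

// Step 3: naive substring match; anything unmatched falls back to "Tech & Startups".
function categorize(tweet: Tweet): Topic {
  const text = tweet.text.toLowerCase();
  for (const [topic, keywords] of TOPIC_KEYWORDS) {
    if (keywords.some((k) => text.includes(k))) return topic;
  }
  return "Tech & Startups";
}

// Group the filtered tweets so each topic can be summarized separately.
function groupByTopic(tweets: Tweet[]): Map<Topic, Tweet[]> {
  const groups = new Map<Topic, Tweet[]>();
  for (const t of tweets) {
    const topic = categorize(t);
    groups.set(topic, [...(groups.get(topic) ?? []), t]);
  }
  return groups;
}
```

Keeping both steps as pure functions over plain data means they can be tested without a browser or network access. Each tweet keeps its `url`, so the post links mentioned in the commit message remain available to the later summary step.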
@@ -1,7 +1,7 @@
 # =============================================================================
 # Build Stage
 # =============================================================================
-FROM node:20-alpine AS builder
+FROM node:20-bookworm AS builder

 WORKDIR /app

@@ -14,29 +14,23 @@ COPY tsconfig.json ./
 COPY src ./src
 RUN npm run build

 # Prune dev dependencies
 RUN npm prune --production

 # =============================================================================
-# Production Stage
+# Production Stage - Using Playwright base image
 # =============================================================================
-FROM node:20-alpine AS production
-
-# Security: run as non-root user
-RUN addgroup -g 1001 -S nodejs && \
-    adduser -S newsletter -u 1001
+FROM mcr.microsoft.com/playwright:v1.57.0-noble AS production

 WORKDIR /app

-# Copy built application
-COPY --from=builder --chown=newsletter:nodejs /app/node_modules ./node_modules
-COPY --from=builder --chown=newsletter:nodejs /app/dist ./dist
-COPY --from=builder --chown=newsletter:nodejs /app/package.json ./
-
-USER newsletter
+# Copy built application and dependencies
+COPY --from=builder /app/node_modules ./node_modules
+COPY --from=builder /app/dist ./dist
+COPY --from=builder /app/package.json ./

 # Set timezone (can be overridden via env)
 ENV TZ=Europe/Warsaw

+# Run as non-root user (pwuser is Playwright's default user)
+USER pwuser
+
 # Default command
 CMD ["node", "dist/index.js"]
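The commit message describes the scraper that runs in this image as working through the accounts "in parallel" with a "configurable timeout". A minimal sketch of that pattern follows, assuming a hypothetical `scrapeOne` callback that stands in for the per-profile Playwright work. The names `withTimeout`, `scrapeAll`, and `Outcome` are illustrative and not taken from the repository.

```typescript
// Result for one account: either the scraped value or the reason it failed.
type Outcome<T> =
  | { account: string; ok: true; value: T }
  | { account: string; ok: false; error: string };

// Reject if `work` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Start `scrapeOne` for every account at once; a failure or timeout for one
// account is recorded in its Outcome and never aborts the others.
async function scrapeAll<T>(
  accounts: string[],
  scrapeOne: (account: string) => Promise<T>, // hypothetical per-profile scraper
  timeoutMs: number,
): Promise<Outcome<T>[]> {
  return Promise.all(
    accounts.map(async (account): Promise<Outcome<T>> => {
      try {
        const value = await withTimeout(scrapeOne(account), timeoutMs);
        return { account, ok: true, value };
      } catch (e) {
        return { account, ok: false, error: e instanceof Error ? e.message : String(e) };
      }
    }),
  );
}
```

Because a timed-out scrape keeps running in the background, a real implementation would probably also close the corresponding browser page or context when the timer fires. With 37 accounts it may also be worth capping concurrency, which this sketch does not do.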