Technical approach:
- Python scraper for AI/tech sources
- Custom NLP pipeline using BERT embeddings to filter for AI-specific content
- Hierarchical clustering to group related stories
- ChatGPT API for generating cluster titles and short summaries
- Served as static HTML via Cloudflare Pages
- Lightweight analytics with GoatCounter and Umami (still evaluating both before settling on one)
- Experimental JSON-based search (considering a proper search backend if this scales)
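The post doesn't say exactly how the BERT embeddings drive the AI-content filter. One common approach, assumed here, is to average the embeddings of known AI stories into a "prototype" vector and keep any new story whose cosine similarity to it clears a cutoff. In this minimal sketch, tiny hand-made vectors stand in for real BERT output, and `is_ai_related`, the centroid rule, and the 0.5 threshold are all hypothetical:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def is_ai_related(story_vec: list[float], ai_centroid: list[float], threshold: float = 0.5) -> bool:
    """Keep a story if its embedding is close enough to the AI prototype (hypothetical rule)."""
    return cosine(story_vec, ai_centroid) >= threshold

# Toy 3-dimensional vectors stand in for real BERT embeddings (768-d for bert-base).
ai_examples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
ai_centroid = centroid(ai_examples)

print(is_ai_related([0.7, 0.3, 0.0], ai_centroid))  # True  (cosine ≈ 0.97)
print(is_ai_related([0.0, 0.1, 0.9], ai_centroid))  # False (cosine ≈ 0.08)
```

Raising the threshold trades false positives for false negatives, so it is worth tuning against a labeled sample rather than picking it by eye.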
The project started when I realized that, as a PM trying to track AI developments, I was wasting hours every day checking multiple sources. I built this over 3 months around work commitments.
Interesting challenges:
- Finding the right threshold for story similarity (still tuning this)
- Balancing comprehensive coverage with noise filtering
- Keeping the page lightweight while maintaining content density
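The linkage method and distance metric aren't specified in the post. This minimal sketch assumes average-linkage agglomerative clustering over cosine distance to show where a similarity threshold enters: every story starts in its own cluster, and merging stops once the closest pair of clusters is farther apart than `threshold`. A higher threshold yields fewer, broader clusters; a lower one splits related stories apart. Function names and the toy vectors are hypothetical:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def agglomerative_clusters(vectors: list[list[float]], threshold: float) -> list[list[int]]:
    """Average-linkage clustering that stops when the closest pair exceeds `threshold`.

    Returns clusters as sorted lists of story indices.
    """
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average pairwise distance between members of the two clusters
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # nothing left is similar enough to merge
        merged = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is still valid afterwards
        clusters[a] = merged
    return clusters

# Toy embeddings: stories 0/1 and 2/3 are near-duplicates, story 4 is unrelated.
stories = [[1, 0, 0], [0.95, 0.05, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1]]
print(agglomerative_clusters(stories, threshold=0.3))  # [[0, 1], [2, 3], [4]]
```

This naive version is roughly cubic in the number of stories; for real volumes, SciPy's `scipy.cluster.hierarchy.linkage(..., method="average", metric="cosine")` followed by `fcluster(Z, t, criterion="distance")` implements the same cut-at-a-distance idea.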
Would appreciate feedback on clustering accuracy, false positive/negative rates, and overall UX.
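One way to put numbers on false positive and negative rates, assuming a small hand-labeled sample of stories, is to compare the filter's keep/drop decisions with human judgments. `filter_error_rates` is a hypothetical helper for illustration, not part of the project:

```python
def filter_error_rates(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Compare filter decisions against hand labels.

    predicted[i] is True if the filter kept story i as AI-related;
    actual[i] is True if a human judged story i to be AI-related.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(not p and a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    return {
        # Share of non-AI stories that slipped through the filter
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of genuine AI stories the filter dropped
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        # Share of kept stories that really are AI-related
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

rates = filter_error_rates(
    predicted=[True, True, False, True, False, False],
    actual=[True, False, True, True, False, False],
)
print({k: round(v, 2) for k, v in rates.items()})
# {'false_positive_rate': 0.33, 'false_negative_rate': 0.33, 'precision': 0.67}
```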
Link: https://currentai.news