Built this during my PhD research after getting frustrated with traditional web scrapers breaking every time sites updated their HTML. DR Web Engine uses declarative JSON5 queries instead of imperative scraping code.
Key features:
- JSON5 query language (can still use XPath for field selectors)
- AI-powered element selection (describe what you want in natural language)
- Plugin architecture for extensibility
- Modern browser automation (Playwright-based)
- Handles dynamic content and JavaScript
The core insight: define WHAT data to extract in a structured format, not HOW to extract it step by step. That makes scrapers more maintainable and less brittle.
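To make the what-vs-how idea concrete, here's a rough sketch. It is not the engine's real schema or code: the query keys (`@items`, `@fields`), the sample page, and `run_query` are all made up for illustration, and it uses Python's stdlib ElementTree (limited XPath, well-formed markup only) instead of Playwright. The query is plain data; the interpreter is generic and knows nothing about the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical query shape, for illustration only (NOT DR Web Engine's actual schema).
QUERY = {
    "@items": ".//div[@class='product']",    # which elements form one record
    "@fields": {                              # what to pull out of each record
        "name": "./h2",
        "price": "./span[@class='price']",
    },
}

# Made-up, well-formed sample page so the stdlib XML parser can read it.
PAGE = """
<html><body>
  <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">19.50</span></div>
</body></html>
""".strip()


def run_query(markup, query):
    """Generic interpreter: walks the query and builds one dict per matched item."""
    root = ET.fromstring(markup)
    records = []
    for item in root.findall(query["@items"]):
        record = {}
        for field, path in query["@fields"].items():
            node = item.find(path)
            # Missing elements become None instead of raising.
            has_text = node is not None and node.text
            record[field] = node.text.strip() if has_text else None
        records.append(record)
    return records


results = run_query(PAGE, QUERY)
print(results)
```

In this sketch, a markup change on the site would mean editing a selector string in `QUERY`, while `run_query` stays untouched.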
It started as academic research, but I rebuilt it for real-world use. I published the methodology on arXiv and am now actively developing the open-source version.
Looking for feedback on the query language design and real-world use cases!