frontpage.

Hey HN,

I built QGen, a tool that extracts structured Q&A datasets from documents using RAG (retrieval-augmented generation). I’d love feedback from anyone working with ML, document processing, or AI pipelines.

Problem

Turning PDFs, Word docs, and other unstructured files into high-quality Q&A pairs for model training is slow and error-prone. QGen automates this process, making it fast and scalable.

How It Works

- Document ingestion – PDF, Word, Excel, PPT, OCR
- Embedding & retrieval – semantic search over chunks
- Q&A generation – LLM generates and filters candidate pairs
- Quality scoring – four-dimensional metrics for relevance, coverage, consistency
- Export / API – JSON, CSV, SQL, XML; on-prem or cloud deployment
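To make the flow above concrete, here is a toy sketch of the chunk, retrieve, score, and export steps. It is not QGen's actual code: every function name is hypothetical, a bag-of-words cosine similarity stands in for real embeddings, and the LLM generation step is replaced by a hand-written question.

```python
import csv
import io
import json
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks ranked by similarity to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


def make_pair(question, chunks):
    """Attach the best-matching chunk and a relevance score to a question.

    Assumes chunks is non-empty. A real pipeline would have an LLM write
    both the question and the answer from the retrieved context.
    """
    context = retrieve(question, chunks, top_k=1)[0]
    score = cosine(bag_of_words(question), bag_of_words(context))
    return {"question": question, "context": context, "relevance": round(score, 3)}


def export_json(pairs):
    """Serialize the pairs as a JSON array."""
    return json.dumps(pairs, indent=2)


def export_csv(pairs):
    """Serialize the pairs as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["question", "context", "relevance"])
    writer.writeheader()
    writer.writerows(pairs)
    return buf.getvalue()


doc = ("Invoices are due within 30 days of receipt. "
       "Late payments incur a two percent monthly fee. "
       "Refunds are processed within five business days.")
chunks = chunk_text(doc, max_words=8)
pairs = [make_pair("When are invoices due?", chunks)]
print(export_json(pairs))
```

Swapping `bag_of_words` for a real embedding model and `make_pair` for an LLM call would keep the same overall shape; the single relevance score here only illustrates one of the quality dimensions listed above.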

Who It’s For

- Startups prototyping AI
- Data scientists training domain-specific models
- Enterprises processing large document sets

Early Feedback & Limitations

- Sometimes questions are too shallow
- Domain adaptation (legal, medical, research) needs tuning
- Runtime can be high for large batches

I’m especially curious about what features you’d want, what trade-offs matter most, and how you’d integrate this into your workflow.

Try It / Feedback

Comment below, or email contact@qelab.org to try QGen or share your thoughts.

Disclaimer: Parts of this post were drafted and formatted with AI assistance.

Show HN: QGen – turn documents into AI-ready Q&A datasets (SaaS and on-prem)