
Show HN: ClearDoc – Extract fields from any document using OCR and LLM

http://cleardoc.v5ent.com/
1•Mignet•6mo ago
Hi HN!

I recently launched a prototype of *ClearDoc*, an AI-powered tool to extract structured data from unstructured documents like invoices, bills of lading, certificates, etc.

It uses *OCR (PaddleOCR)* and *LLMs* to detect and align key fields — even for complex documents with tables, nested fields, or in different languages.
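To make "detect and align" concrete, here is a minimal sketch of one way an extracted field value could be matched back to the OCR line it came from. This is not ClearDoc's actual implementation: `OcrLine`, `normalize`, `align_field`, and the `min_score` threshold are all hypothetical names chosen for illustration, and the OCR lines are assumed to already be available as text plus a pixel bounding box.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class OcrLine:
    """One recognized text line (hypothetical shape, not PaddleOCR's output format)."""
    text: str
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def normalize(s: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch.lower() for ch in s if ch.isalnum())


def align_field(value: str, lines: list[OcrLine], min_score: float = 0.6):
    """Return (best_line, score); best_line is None if nothing scores >= min_score."""
    target = normalize(value)
    best, best_score = None, 0.0
    for line in lines:
        candidate = normalize(line.text)
        if not target or not candidate:
            continue
        if target in candidate:
            # The value appears verbatim inside this line after normalization.
            score = 1.0
        else:
            # Otherwise fall back to a fuzzy similarity ratio in [0, 1].
            score = SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:
            best, best_score = line, score
    if best_score < min_score:
        return None, best_score
    return best, best_score
```

The returned `box` is what a visual overlay would draw. Very short values (a single digit, say) will match many lines under the containment rule, so a real system would need extra context such as nearby labels.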

It doesn't require templates and can be *self-hosted* (demo runs on my own GPU).

Live demo (no sign-up): http://cleardoc.v5ent.com/
Demo video: https://www.youtube.com/watch?v=u83T6iewfNs

Right now:
- Fields are auto-aligned visually on the document
- Works with PDFs, images, scans
- No custom field design/editing in the demo yet

Would love feedback on:
- Which use cases matter most to you?
- What would make this valuable enough to adopt?

Thanks!

Comments

Mignet•6mo ago
Please feel free to report any issues.
Mignet•6mo ago
*Building an AI-Powered Document Understanding Tool – Feedback Welcome*

Hi HN!

I'm working on a tool called *ClearDoc*, which uses AI to extract structured data from unstructured documents like invoices, bills of lading, and certificates. The biggest challenge we've faced so far is accurately extracting data from complex documents, especially those with tables and nested fields.

### What I’m Looking to Discuss:

- How do you approach extracting data from complex documents like invoices or contracts?
- If you’ve worked with OCR or document processing tools, what have been your biggest challenges?

We’ve built a demo that uses PaddleOCR and LLMs to extract and align data. I’d love to get your thoughts on how we could improve the accuracy of data extraction, or whether you think a no-template approach is valuable.

If you’re interested, feel free to try out the demo (no sign-up required) and let me know your thoughts!

[ClearDoc Demo](https://cleardoc.v5ent.com/)

Looking forward to your feedback!

#AI #OCR #MachineLearning #DocumentProcessing

Mignet•6mo ago
Hi HN! I just pushed a new update to *ClearDoc*, my AI tool for extracting *structured data from unstructured documents* (invoices, logistics forms, certificates, etc.).

---

### What’s New:

*HTTPS Enabled:* The live demo is now secure at [https://cleardoc.v5ent.com](https://cleardoc.v5ent.com), so no more browser warnings.

*Improved Homepage Messaging:* Based on user feedback, the homepage now has a much clearer value proposition and simplified CTA. For example, “Reasoning Output” is now simply “View Extracted Data.”

*Performance Tweaks:* Faster processing, better alignment, and cleaner output.

*Coming Soon, Confidence Scores + Feedback Loop:* Users will see which extracted fields the AI is most confident about, and will be able to correct any errors to improve future results.

---

### What is ClearDoc?

ClearDoc helps you *turn messy PDFs/images into clean JSON* — without templates, without fine-tuning, and fully self-hostable.

It combines:
- OCR (PaddleOCR)
- LLM (OpenAI-compatible)
- Field alignment + visual overlays
- JSON Schema output (customizable)
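As an illustration of the "JSON Schema output" piece, here is a rough sketch of checking an LLM's raw JSON reply against a small schema before accepting it. It covers only a tiny subset of JSON Schema (`required` and top-level `properties` types) and is an assumption about how such a check could look, not a description of ClearDoc's code. `check_against_schema` and `TYPE_MAP` are hypothetical names; a full validator library would be the better choice in practice.

```python
import json

# Maps a few JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_against_schema(raw: str, schema: dict) -> list:
    """Return a list of problems found in raw; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    errors = []
    # Every field listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in data:
            errors.append(f"missing required field: {name}")
    # Fields that are present must have the declared type.
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        type_name = spec.get("type")
        expected = TYPE_MAP.get(type_name)
        if expected is None:
            continue  # unknown or missing type: not checked in this sketch
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"field {name}: expected {type_name}")
    return errors
```

A failed check could be used to retry the LLM call or to flag the document for manual review.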

Demo: https://cleardoc.v5ent.com
Video: https://www.youtube.com/watch?v=u83T6iewfNs

---

### Who is this for?

- Developers building document-based tools
- Finance / accounting teams who copy-paste data
- Logistics / trade teams processing paperwork
- Anyone who hates manually parsing PDFs

---

### I'm looking for:

1. Early users with real docs they want to process
2. Edge cases you'd like to see it handle
3. Feedback on the extraction quality / experience

I’d love to hear what you think, and I’m happy to help if you’re facing similar problems.

Thanks — Charles