Show HN: ClearDoc – Extract fields from any document using OCR and LLM

http://cleardoc.v5ent.com/
1•Mignet•6mo ago
Hi HN!

I recently launched a prototype of *ClearDoc*, an AI-powered tool to extract structured data from unstructured documents like invoices, bills of lading, certificates, etc.

It uses *OCR (PaddleOCR)* and *LLMs* to detect and align key fields, even in complex documents that contain tables, nested fields, or text in other languages.
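
To make the "align" step concrete, here is a minimal sketch of one way extracted values could be mapped back to positions on the page. It is not ClearDoc's actual code: the OCR lines are hard-coded stand-ins for what an OCR engine might return, the field names are hypothetical, and fuzzy string matching with `difflib` is just one assumed strategy.

```python
from difflib import SequenceMatcher

# Hypothetical OCR output: one (text, (x0, y0, x1, y1)) pair per detected line.
OCR_LINES = [
    ("INVOICE No. INV-2024-0017", (40, 30, 420, 60)),
    ("Date: 2024-03-05", (40, 70, 260, 95)),
    ("Total due: 1,250.00 USD", (40, 400, 380, 430)),
]


def similarity(value, line):
    """Score how well `value` matches `line`; 1.0 when it appears verbatim."""
    value, line = value.lower(), line.lower()
    if value in line:
        return 1.0
    return SequenceMatcher(None, value, line).ratio()


def align_fields(fields, ocr_lines, threshold=0.6):
    """Attach the bounding box of the best-matching OCR line to each field.

    `fields` is a dict of field name -> extracted value, e.g. as an LLM might
    return it. Fields with no match above `threshold` get a box of None.
    """
    aligned = {}
    for name, value in fields.items():
        best_text, best_box = max(
            ocr_lines, key=lambda item: similarity(value, item[0])
        )
        score = similarity(value, best_text)
        aligned[name] = {
            "value": value,
            "box": best_box if score >= threshold else None,
            "score": round(score, 2),
        }
    return aligned


if __name__ == "__main__":
    extracted = {"invoice_number": "INV-2024-0017", "total": "1,250.00"}
    for name, info in align_fields(extracted, OCR_LINES).items():
        print(name, info)
```

A box of `None` is a useful signal in this sketch: it marks a value that the model produced but that cannot be found on the page, so it can be flagged for review instead of being drawn as an overlay.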

It doesn't require templates and can be *self-hosted* (demo runs on my own GPU).

Live demo (no sign-up): http://cleardoc.v5ent.com/
Demo video: https://www.youtube.com/watch?v=u83T6iewfNs

Right now:

- Fields are auto-aligned visually on the document
- Works with PDFs, images, scans
- No custom field design/editing in the demo yet

Would love feedback on:

- Which use cases matter most to you?
- What would make this valuable enough to adopt?

Thanks!

Comments

Mignet•6mo ago
Please feel free to report any issues.
Mignet•6mo ago
*Building an AI-Powered Document Understanding Tool – Feedback Welcome*

Hi HN!

I'm working on a tool called *ClearDoc*, which uses AI to extract structured data from unstructured documents like invoices, bills of lading, and certificates. The biggest challenge we've faced so far is accurately extracting data from complex documents, especially those with tables and nested fields.

### What I’m Looking to Discuss:

- How do you approach extracting data from complex documents like invoices or contracts?
- If you’ve worked with OCR or document processing tools, what have been your biggest challenges?

We’ve built a demo that uses PaddleOCR and LLMs to extract and align data. I’d love to get your thoughts on how we could improve the accuracy of data extraction, or whether you think a no-template approach is valuable.

If you’re interested, feel free to try out the demo (no sign-up required) and let me know your thoughts!

[ClearDoc Demo](https://cleardoc.v5ent.com/)

Looking forward to your feedback!

#AI #OCR #MachineLearning #DocumentProcessing

Mignet•6mo ago
Hi HN! I just pushed a new update to *ClearDoc*, my AI tool for extracting *structured data from unstructured documents* (invoices, logistics forms, certificates, etc.).

---

### What’s New:

*HTTPS Enabled:* The live demo is now secure at [https://cleardoc.v5ent.com](https://cleardoc.v5ent.com), so no more browser warnings.

*Improved Homepage Messaging:* Based on user feedback, the homepage now has a much clearer value proposition and simplified CTA. For example, “Reasoning Output” is now simply “View Extracted Data.”

*Performance Tweaks:* Faster processing, better alignment, and cleaner output.

*Coming soon: Confidence Scores + Feedback Loop:* Users will see which extracted fields the AI is most confident about and will be able to correct errors to improve future results.

---

### What is ClearDoc?

ClearDoc helps you *turn messy PDFs/images into clean JSON* — without templates, without fine-tuning, and fully self-hostable.

It combines:

- OCR (PaddleOCR)
- LLM (OpenAI-compatible)
- Field alignment + visual overlays
- JSON Schema output (customizable)
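
As an illustration of the schema-constrained output idea, the sketch below parses a model response and checks it against a small schema before accepting it. Everything here is an assumption for illustration: the schema, the field names, and the checker itself, which handles only `required` and basic `type` rules and is not a full JSON Schema validator or ClearDoc's own implementation.

```python
import json

# Hypothetical schema for an invoice; the field names are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
}

# Mapping from the schema type names used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate(data, schema):
    """Return a list of problems; an empty list means the data passed."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in data
    ]
    for key, rule in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        ok = isinstance(value, _PY_TYPES[rule["type"]])
        # bool is a subclass of int in Python, so reject it for numbers.
        if rule["type"] == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            errors.append(f"{key}: expected {rule['type']}")
    return errors


def parse_model_output(raw, schema):
    """Parse raw model text as JSON and validate it.

    Returns (data, errors); data is None when parsing or validation fails.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    errors = validate(data, schema)
    return (data if not errors else None), errors


if __name__ == "__main__":
    print(parse_model_output('{"invoice_number": "INV-1", "total": 12.5}', INVOICE_SCHEMA))
    print(parse_model_output('{"total": "12.5"}', INVOICE_SCHEMA))
```

Returning the error list rather than raising makes it straightforward to feed the problems back to the model in a retry prompt, which is one common way to handle malformed output.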

Demo: https://cleardoc.v5ent.com
Video: https://www.youtube.com/watch?v=u83T6iewfNs

---

### Who is this for?

- Developers building document-based tools
- Finance / accounting teams who copy-paste data
- Logistics / trade teams processing paperwork
- Anyone who hates manually parsing PDFs

---

### I'm looking for:

1. Early users with real docs they want to process
2. Edge cases you'd like to see it handle
3. Feedback on the extraction quality / experience

I’d love to hear what you think, and I'm happy to help if you're facing similar problems.

Thanks — Charles