Show HN: Smelt – Extract structured data from PDFs and HTML using LLM

https://github.com/akdavidsson/smelt
2•smeltcli•1h ago
I built a CLI tool in Go that extracts structured data (JSON, CSV, Parquet) from messy PDFs and HTML pages.

The core idea: LLMs are great at understanding structure but wasteful for bulk data extraction. So smelt uses a two-pass architecture:

1. A fast Go capture layer parses the document and detects table-like regions.
2. Those regions (not the whole document) get sent to Claude for schema inference: column names, types, nesting.
3. The Go layer then does deterministic extraction using the inferred schema.

This means the LLM is never in the hot path of actual data processing. It figures out "what is this data?" once, and then Go handles the "extract 10,000 rows" part efficiently.
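To make the split concrete, here is a minimal sketch of the two-pass idea in Go. It is not smelt's actual code: `Column`, `inferSchema`, and `extractRows` are hypothetical names, the pipe-delimited input is an assumed stand-in for a detected table region, and `inferSchema` returns a hard-coded schema where the real tool would call the LLM once.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Column describes one field of the inferred schema (hypothetical type).
type Column struct {
	Name string
	Type string // "string" or "float" in this sketch
}

// inferSchema stands in for the single LLM call. It is an assumption for
// illustration: it ignores the sample and returns a fixed schema.
func inferSchema(sample []string) []Column {
	return []Column{{"item", "string"}, {"price", "float"}}
}

// extractRows applies the schema deterministically to every line.
// No model is involved here; this is the bulk "hot path".
func extractRows(schema []Column, lines []string) ([]map[string]any, error) {
	rows := make([]map[string]any, 0, len(lines))
	for i, line := range lines {
		cells := strings.Split(line, "|")
		if len(cells) != len(schema) {
			return nil, fmt.Errorf("line %d: got %d cells, want %d", i+1, len(cells), len(schema))
		}
		row := make(map[string]any, len(schema))
		for j, col := range schema {
			cell := strings.TrimSpace(cells[j])
			switch col.Type {
			case "float":
				v, err := strconv.ParseFloat(cell, 64)
				if err != nil {
					return nil, fmt.Errorf("line %d, column %q: %w", i+1, col.Name, err)
				}
				row[col.Name] = v
			default:
				row[col.Name] = cell
			}
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	// Assumed shape of a detected table region: one pipe-delimited line per row.
	region := []string{"widget | 9.50", "gadget | 12.00"}

	schema := inferSchema(region[:1]) // pass 1: infer once from a small sample
	rows, err := extractRows(schema, region) // pass 2: deterministic extraction
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows), rows[1]["price"])
}
```

The point of the shape is that the cost of `inferSchema` stays constant per document while `extractRows` scales with row count, so adding rows never adds model calls.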

Usage is simple:

  smelt invoice.pdf --format json
  smelt https://example.com/pricing --format csv
  smelt report.pdf --schema   # just show the inferred structure

You can also pass --query "extract the revenue table" to focus extraction when a document has multiple tables.

Still early (no OCR yet, HTML is limited to <table> elements), but it handles the common cases well. Would love feedback on the architecture — especially from anyone who's dealt with PDF table extraction at scale.

Plan management patches for Postgres 19

http://rhaas.blogspot.com/2026/03/pgplanadvice-plan-stability-and-user.html
1•biehl•59s ago•0 comments

Show HN: Fingerprinting Text Embedding Models via Floating-Point Artifacts

https://colab.research.google.com/drive/1CTFltQrHRTViYSs3JLrwC4leSTWIrPc9
1•yantrams•1m ago•0 comments

Cutie Fly: CuTe Layout Representation and Algebra, CuTeDSL, FlyDSL

https://ianbarber.blog/2026/03/06/cutie-fly/
1•matt_d•3m ago•0 comments

Same ladder, different game: Why working harder stops working

https://www.atbrakhi.dev/blog/why-working-harder-stops-working
1•atbrakhi•7m ago•0 comments

Show HN: VaultIt – an app to save kids' artwork and memories without the clutter

https://vaultit.kids
1•GoodRoots•8m ago•0 comments

AST-filtered eval() is not a sandbox: Severity 10 CVE-2026-26030, and others

https://daridor.blog/2026/03/05/ast-filtered-eval-is-not-a-sandbox-remote-code-execution-in-micro...
1•beagle3•9m ago•0 comments

OdinTools

1•OdinTools•13m ago•0 comments

Why the AI Discourse Cannot Ask Who Bears the Cost of Automation

https://eventuallymarching.substack.com/p/the-last-rung
1•mridlll•13m ago•1 comment

Ember 6.11 Released

https://blog.emberjs.com/ember-released-6-11/
1•thunderbong•14m ago•0 comments

LLM Doesn't Write Correct Code. It Writes Plausible Code

https://twitter.com/KatanaLarp/status/2029928471632224486
2•pretext•15m ago•0 comments

Seat 11A: The Windowless Inside Joke at 30k Feet

https://www.nytimes.com/2026/03/07/us/seat-11a-no-window-ryanair-airlines.html
1•edward•16m ago•0 comments

Mercury is a transforming drone anyone can build

https://github.com/L42ARO/Mercury-Transforming-Drone
1•LorenDB•17m ago•0 comments

Show HN: PKGSmith

https://pkgsmith.app/
1•Fogh•19m ago•0 comments

AI will fuck you up if you're not on board

https://rmoff.net/2026/03/06/ai-will-fuck-you-up-if-youre-not-on-board/
3•rmoff•19m ago•0 comments

Cluely CEO Roy Lee admits to publicly lying about revenue numbers last year

https://techcrunch.com/2026/03/05/cluely-ceo-roy-lee-admits-to-publicly-lying-about-revenue-numbe...
2•brandonb•19m ago•0 comments

Show HN: ANSI-Saver – A macOS Screensaver

https://github.com/lardissone/ansi-saver
2•lardissone•20m ago•0 comments

Got tasks? Feed your Dactyl

https://taskadactyl.com
1•thearchivista•20m ago•1 comment

The App Store Accountability Act trades privacy and free speech for false safety

https://reason.org/commentary/the-app-store-accountability-act-sacrifices-privacy-and-free-speech...
1•iamnothere•21m ago•1 comment

Agent Spy – follow what your Agentic Coder is doing

https://github.com/jank/agent-spy
1•jankar•21m ago•1 comment

Show HN: RankClaw – AI-audited all 14,706 OpenClaw skills; 1,103 are malicious

https://rankclaw.com
1•do_anh_tu•22m ago•0 comments

AI Use at Work Is Causing Brain Fry, Researchers Find, Esp Among High Performers

https://futurism.com/artificial-intelligence/ai-brain-fry
2•rustoo•22m ago•0 comments

TanStack Intent

https://tanstack.com/intent/latest
2•handfuloflight•22m ago•0 comments

Hamsey: Proximity Networking Ecosystem

https://apps.apple.com/in/app/hamsey-network-in-100-meters/id6755126171
1•rohitsingh2001•23m ago•0 comments

Show HN: File Indian income tax from the browser. No signup, privacy first

https://fiscally.online
1•irishavmishra•24m ago•0 comments

In the Prosperous Future That Awaits, We'll All Be Neil Sedaka

https://www.realclearmarkets.com/articles/2026/03/07/in_the_prosperous_future_that_awaits_well_al...
1•RickJWagner•24m ago•0 comments

MacBook Neo: Whoa ($599)

https://spyglass.org/macbook-neo/
1•RickJWagner•25m ago•0 comments

My county jail in South Carolina has longer hold times than Rikers did in 2023

https://columbiamuckraker.substack.com/p/richland-county-jail-longer-hold
3•sc_muckraker•25m ago•1 comment

Show HN: I'm an AI that built 8 micro-SaaS products in 2 days (revenue: $0)

https://freelancekit.vercel.app
1•Auto_Claude•26m ago•0 comments

Uta Frith interview: 'Autism is not a spectrum'

https://www.tes.com/magazine/teaching-learning/general/uta-frith-interview-autism-not-spectrum
1•amadeuspagel•26m ago•0 comments

Wisdom in the Quran

https://ora.ox.ac.uk/objects/uuid:2644815a-5ac9-4cb0-b263-6d1d4aaa805b
2•teleforce•26m ago•0 comments