Show HN: Smelt – Extract structured data from PDFs and HTML using LLM

https://github.com/akdavidsson/smelt
2•smeltcli•2h ago
I built a CLI tool in Go that extracts structured data (JSON, CSV, Parquet) from messy PDFs and HTML pages.

The core idea: LLMs are great at understanding structure but wasteful for bulk data extraction. So smelt uses a two-pass architecture:

1. A fast Go capture layer parses the document and detects table-like regions
2. Those regions (not the whole document) get sent to Claude for schema inference: column names, types, nesting
3. The Go layer then does deterministic extraction using the inferred schema

This means the LLM is never in the hot path of actual data processing. It figures out "what is this data?" once, and then Go handles the "extract 10,000 rows" part efficiently.
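
A rough sketch of that split, not smelt's actual code: every name here (`Column`, `inferSchema`, `extractRows`) is hypothetical, the pipe-delimited input is an assumed stand-in for a detected table region, and `inferSchema` is a local stub where the single LLM call would go. The point is only that the schema is obtained once and the per-row loop is plain Go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Column is a hypothetical schema entry: the kind of thing the
// one-time inference pass would produce (name plus a simple type).
type Column struct {
	Name string
	Type string // "string" or "number" in this sketch
}

// inferSchema stands in for the single LLM call. It is a hard-coded
// stub here; a real implementation would send the sample region to a
// model and parse its reply.
func inferSchema(sample []string) []Column {
	return []Column{
		{Name: "item", Type: "string"},
		{Name: "qty", Type: "number"},
		{Name: "price", Type: "number"},
	}
}

// extractRows applies the schema to every line deterministically.
// No model is involved; each row costs a split and a few parses.
func extractRows(lines []string, schema []Column) ([]map[string]any, error) {
	rows := make([]map[string]any, 0, len(lines))
	for i, line := range lines {
		fields := strings.Split(line, "|")
		if len(fields) != len(schema) {
			return nil, fmt.Errorf("line %d: got %d fields, want %d", i+1, len(fields), len(schema))
		}
		row := make(map[string]any, len(schema))
		for j, col := range schema {
			val := strings.TrimSpace(fields[j])
			if col.Type == "number" {
				n, err := strconv.ParseFloat(val, 64)
				if err != nil {
					return nil, fmt.Errorf("line %d, column %q: %w", i+1, col.Name, err)
				}
				row[col.Name] = n
			} else {
				row[col.Name] = val
			}
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	// Assumed shape of a detected table region, one row per line.
	region := []string{"widget | 2 | 9.50", "gadget | 10 | 1.25"}

	schema := inferSchema(region[:1])        // pass 1: once per table
	rows, err := extractRows(region, schema) // pass 2: once per row
	if err != nil {
		fmt.Println("extract failed:", err)
		return
	}
	fmt.Println(len(rows), rows[1]["item"], rows[1]["qty"])
}
```

With this shape, the cost of the inference step stays constant per table while the extraction loop scales with row count.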

Usage is simple:

  smelt invoice.pdf --format json
  smelt https://example.com/pricing --format csv
  smelt report.pdf --schema   # just show the inferred structure

You can also pass --query "extract the revenue table" to focus extraction when a document has multiple tables.

Still early (no OCR yet, HTML is limited to <table> elements), but it handles the common cases well. Would love feedback on the architecture — especially from anyone who's dealt with PDF table extraction at scale.

Show HN: ANSI-Saver – A macOS Screensaver

https://github.com/lardissone/ansi-saver
9•lardissone•1h ago•1 comments

Show HN: PKGSmith

https://pkgsmith.app/
2•Fogh•1h ago•0 comments

Show HN: JotSpot – a super fast Markdown note tool with instant shareable pages

https://jotspot.io/
2•Rageypeep•1h ago•0 comments

Show HN: Moongate – Ultima Online server emulator in .NET 10 with Lua scripting

https://github.com/moongate-community/moongatev2
265•squidleon•1d ago•154 comments

Show HN: Somnia – a dream journal that locks 2 minutes after your alarm fires

https://www.somniavault.me/
2•SushanKKsdfsdf•1h ago•0 comments

Show HN: Bulk Image Generator – Create AI variations and remove bg in batch

https://bulkimagegenerator.app/
3•fairyFayra•1h ago•0 comments

Show HN: OSle – A 510 bytes OS in x86 assembly, now with a C API

https://github.com/shikaan/osle/releases/tag/16800a5
2•shikaan•2h ago•0 comments

Show HN: µJS, a 5KB alternative to Htmx and Turbo with zero dependencies

https://mujs.org
11•amaury_bouchard•6h ago•1 comments

Show HN: Smelt – Extract structured data from PDFs and HTML using LLM

https://github.com/akdavidsson/smelt
2•smeltcli•2h ago•0 comments

Show HN: Recruiter Analytics for Developer Portfolios

https://portlumeai.com/blog/recruiter-analytics-developer-portfolio-tracking
4•portlumeai•2h ago•0 comments

Show HN: Diamond – an interactive CLI for editing trees

https://github.com/justindmassey/diamond
2•justindmassey•2h ago•0 comments

Show HN: Kula – Lightweight, self-contained Linux server monitoring tool

https://github.com/c0m4r/kula
68•c0m4r•15h ago•49 comments

Show HN: Claude-replay – A video-like player for Claude Code sessions

https://github.com/es617/claude-replay
90•es617•23h ago•30 comments

Show HN: OculOS – Any desktop app as a JSON API via OS accessibility tree

https://github.com/huseyinstif/oculos
5•stif1337•7h ago•1 comments

Show HN: I open-sourced my Steam game, 100% written in Lua, engine is also open

https://github.com/willtobyte/reprobate
29•delduca•16h ago•13 comments

Show HN: 1v1 coding game that LLMs struggle with

https://yare.io
24•levmiseri•1d ago•7 comments

Show HN: Nirvana – A TUI YouTube Music Player with a Physics-Based Visualizer

https://github.com/iamekabir-web/Nirvana
4•ekabir•4h ago•0 comments

Show HN: Reconstruct any image using primitive shapes, runs in-browser via WASM

https://github.com/taiseiue/primitive-playground
39•taiseiue•4d ago•8 comments

Show HN: A trainable, modular electronic nose for industrial use

https://sniphi.com/
33•kwitczak•3d ago•24 comments

Show HN: Making Braindance from Cyberpunk 2077 a reality

https://www.braindance.dance/
4•shibo•7h ago•0 comments

Show HN: Git-lanes – Parallel isolation for AI coding agents using Git worktrees

https://github.com/bugrax/git-lanes
5•bugrax•7h ago•3 comments

Show HN: Swarm – Program a colony of 200 ants using a custom assembly language

https://dev.moment.com/
186•armandhammer10•1d ago•61 comments

Show HN: Mb-CLI – CLI for Metabase. Designed for humans and AI coding agents

https://github.com/andreagrandi/mb-cli
3•andreagrandi•8h ago•0 comments

Show HN: Graph-Oriented Generation – Beating RAG for Codebases by 89%

https://github.com/dchisholm125/graph-oriented-generation
10•dchisholm125•18h ago•2 comments

Show HN: NeoNetrek – modernizing the internet's first team game (1988)

https://neonetrek.com
5•yuriksan•16h ago•0 comments

Show HN: Interactive 3D globe of EU shipping emissions

https://seafloor.pages.dev
19•marcohaber•1d ago•7 comments

Show HN: Jido 2.0, Elixir Agent Framework

https://jido.run/blog/jido-2-0-is-here
319•mikehostetler•1d ago•65 comments

Show HN: PageAgent, A GUI agent that lives inside your web app

https://alibaba.github.io/page-agent/
144•simon_luv_pho•1d ago•73 comments

Show HN: Modembin – A pastebin that encodes your text into real FSK modem audio

https://www.modembin.com
25•a13x57•1d ago•3 comments

Show HN: Open source drone that can hold cargo

https://github.com/L42ARO/Mercury-Transforming-Drone
3•devmandan•11h ago•3 comments