I built a tool to convert files into structured data for LLMs and automation

https://filedecomposer.com/
1•pedrovinis•4h ago

Comments

pedrovinis•4h ago
Hey HN,

I recently built File Decomposer, a tool that takes files such as PDF, DOCX, EPUB, and HTML and converts them into structured Markdown or JSON.

The main goal is to help devs, AI engineers, and indie hackers who deal with unstructured documents and want clean, usable data to:

- feed into LLMs (for RAG/chatbots)
- create searchable knowledge bases
- automate workflows
- or just stop wasting time copying/pasting from PDFs

It handles:

- Large files (multi-hundred-page PDFs, technical docs, books)
- Structure preservation (headings, lists, sections)
- JSON formatting that's easy to parse or feed into your pipelines
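To make "structure preservation" and "easy to parse" concrete, here is a minimal sketch of how heading-aware Markdown could be turned into JSON-ready sections before chunking for RAG. It is only an illustration: the function name `markdown_to_sections` and the `level`/`title`/`text` fields are assumptions made for this example, not File Decomposer's actual output schema or API.

```python
import json
import re

# ATX-style Markdown heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_to_sections(markdown: str) -> list[dict]:
    """Split Markdown into a flat list of sections, one per heading.

    Hypothetical helper for illustration only; the field names are assumed.
    Text that appears before the first heading becomes a section with
    level 0 and an empty title.
    """
    sections = []
    current = {"level": 0, "title": "", "lines": []}

    def flush(section: dict) -> None:
        # Keep a section only if it has a title or some non-blank text.
        if section["title"] or any(line.strip() for line in section["lines"]):
            sections.append(
                {
                    "level": section["level"],
                    "title": section["title"],
                    "text": "\n".join(section["lines"]).strip(),
                }
            )

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the section collected so far.
            flush(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2).strip(),
                "lines": [],
            }
        else:
            current["lines"].append(line)

    flush(current)
    return sections


if __name__ == "__main__":
    sample = "# Guide\nIntro text.\n\n## Setup\n- step one\n- step two\n"
    print(json.dumps(markdown_to_sections(sample), indent=2))
```

Each section keeps its heading level, so a downstream pipeline can chunk or index by section instead of by arbitrary character counts. One known limitation of this sketch: it does not skip fenced code blocks, so a `#` comment at the start of a line inside a fence would be mistaken for a heading.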

Built this after wasting too much time cleaning up documents to get usable data into my AI projects. Happy to answer questions, and very open to feedback or edge cases you'd like supported.

You can try it here: https://filedecomposer.com/

Would love to hear how others are tackling this problem, or if there are ways I can make this tool more useful for your workflows.

Google accidentally leaks Material 3 Expressive UI ahead of Android 16

https://timesofindia.indiatimes.com/technology/tech-news/google-accidentally-reveals-new-android-design-language-material-3-expressive-heres-what-changes-it-may-bring-to-android-16/articleshow/120919076.cms
1•byte-bolter•1m ago•0 comments

AWS to invest $4B in cloud infrastructure in Chile, its 3rd Latam region

https://www.reuters.com/business/energy/amazon-spend-4-billion-cloud-infrastructure-chile-2025-05-07/
1•gray_amps•4m ago•0 comments

System converts fabric images into machine-readable knitting instructions

https://techxplore.com/news/2025-05-fabric-images-machine-readable.html
1•geox•5m ago•0 comments

Google showcases AI coding agent at I/O, plans Gemini chat on XR headsets

https://www.reuters.com/business/google-is-developing-software-ai-agent-ahead-annual-conference-information-2025-05-12/
1•bit_qntum•5m ago•0 comments

Improvements in reasoning AI models may slow down soon, analysis finds

https://techcrunch.com/2025/05/12/improvements-in-reasoning-ai-models-may-slow-down-soon-analysis-finds/
1•GreenGames•6m ago•0 comments

Being Legit: On Impostor Syndrome, Impossible Tech, and the Myth of the Obvious

https://www.tedtanner.org/being-legit-on-impostor-syndrome-impossible-tech-and-the-myth-of-the-obvious/
1•tctjr•7m ago•1 comments

Choice at Different Abstraction Levels

https://www.overcomingbias.com/p/choice-at-different-abstraction-levels
1•jger15•10m ago•0 comments

Show HN: AGI Hits a Structural Wall – A Billion-Dollar Problem

2•mmschlereth•10m ago•1 comments

Last Contact (2007)

https://web.archive.org/web/20080725045740/http://solarisbooks.com/books/newbookscifi/last-contact.asp
1•vermilingua•11m ago•0 comments

How to Reduce AI Coding Errors with a Task Manager

https://shipixen.com/tutorials/reduce-ai-coding-errors-with-taskmaster-ai
1•tortilla•11m ago•0 comments

Confidently Wrong

https://aabiji.github.io/html/wrong.html
1•aabiji•11m ago•0 comments

Show HN: I built an all-in-one feedback system to ship the right features faster

https://upvoicy.com/
1•optinghost•12m ago•0 comments

The Linux Scheduler: A Decade of Wasted Cores [pdf]

https://people.ece.ubc.ca/sasha/papers/eurosys16-final29.pdf
2•aabiji•13m ago•1 comments

Rescinding the Amended Water Use Standards for Residential Dishwashers [pdf]

https://public-inspection.federalregister.gov/2025-08591.pdf
1•impish9208•15m ago•0 comments

Stacked Pull Requests on GitHub

https://github.com/ejoffe/spr
1•pabs3•18m ago•0 comments

Leftwing pundit Hasan Piker: US border agents questioned him on Trump and Gaza

https://www.theguardian.com/us-news/2025/may/12/hasan-piker-border-trump-gaza
2•mitchbob•21m ago•0 comments

OpenAI's Sam Altman on Building the 'Core AI Subscription' for Your Life

https://www.youtube.com/watch?v=ctcMA6chfDY
1•Brysonbw•23m ago•0 comments

Elon Musk's Boring Company Is in Talks with Government over Amtrak Project

https://www.nytimes.com/2025/05/12/technology/elon-musk-boring-company-amtrak.html
2•nxobject•24m ago•0 comments

MeshWalkie Combines ESP32, GNSS and LoRa in UV-K6 Radio Enclosure

https://linuxgizmos.com/meshwalkie-combines-esp32-gnss-and-lora-in-uv-k6-radio-enclosure/
3•teleforce•24m ago•0 comments

A Cache-Accelerated Framework for Interactive Visualization of Tera-Scale Data

https://arxiv.org/abs/2504.18001
1•PaulHoule•25m ago•0 comments

Show HN: Video Summarization Using Local Gemma3

https://github.com/vast-data/mattsvlm
2•RamboRogers•26m ago•1 comments

Modular verification of MongoDB Transactions using TLA+

http://muratbuffalo.blogspot.com/2025/05/modular-verification-of-mongodb.html
2•todsacerdoti•26m ago•0 comments

House of Lords pushes back against government's AI plans

https://www.theguardian.com/technology/2025/may/12/house-of-lords-pushes-back-ai-plans-data-bill
1•chrisjj•27m ago•0 comments

AI in Baseball Training

https://techxplore.com/news/2025-05-revolutionizing-baseball-ai-simulated-pitchers.html
1•MarcoDewey•36m ago•0 comments

A programming language made for me

https://zylinski.se/posts/a-programming-language-for-me/
1•MaximilianEmel•38m ago•0 comments

The Leaderboard Illusion

https://cohere.com/research/lmarena
2•yenniejun111•45m ago•0 comments

Universe Set to Decay 10^22 Times Sooner Than Previously Estimated

https://scienceblog.com/universe-set-to-decay-1022-times-sooner-than-previously-estimated/
3•vo2maxer•46m ago•1 comments

Google just changed its 'G' logo

https://www.theverge.com/news/664958/google-g-logo-gradient-design-change
2•wmstack•46m ago•0 comments

Unending ransomware attacks are a symptom, not the sickness

https://www.theregister.com/2025/05/12/opinion_column_ransomware/
3•chrisjj•47m ago•0 comments

The Psychology of Everyday Things (1987) [video]

https://archive.org/details/The_Psychology_of_Everyday_Things_Donald_A._Normal_Institute_for_Cognitive_Scien
2•alasr•49m ago•0 comments