LLM PDF OCR Markdown Book – Turn Scanned PDFs into ePub/Kindle with LLM

https://github.com/jollychang/LLM-PDF-OCR-markdown-book

2•jollychang•1h ago

Comments

jollychang•1h ago

Hi HN! I built a Python CLI that batch-converts PDF page images into clean Markdown and then into EPUB/AZW3/MOBI. It leans on Alibaba DashScope’s multimodal models for OCR, auto-rotates/downsizes images with Pillow, retries requests with backoff, and resumes where it left off. The tool also merges pages into a single book.md, strips headers/footers, and calls pandoc (plus Calibre if present) for final ebooks.
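For readers curious how the retry-with-backoff and resume behaviour described above might fit together, here is a minimal sketch. It is not the project's actual code: `with_backoff`, `convert_pages`, the `ocr` callable, and the one-`.md`-file-per-page layout are all assumptions made for illustration, and the real DashScope call is deliberately left out.

```python
import time
from pathlib import Path
from typing import Callable


def with_backoff(fn: Callable[[], str], retries: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fn, retrying on any exception with exponentially growing delays."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("retries must be at least 1")


def convert_pages(image_dir: Path, out_dir: Path,
                  ocr: Callable[[Path], str]) -> list[Path]:
    """OCR every PNG page, skipping pages whose Markdown already exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.glob("*.png")):
        target = out_dir / (image.stem + ".md")
        if target.exists():
            continue  # resume: this page was finished in an earlier run
        text = with_backoff(lambda: ocr(image))  # ocr is a hypothetical model call
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Using the presence of the per-page output file as the resume marker keeps the state on disk, so an interrupted run can simply be started again with the same arguments.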

You can feed it PNG/JPG pages directly or run pdftoppm -png -r 300 input.pdf output-prefix first. Usage, parameters, and setup (Python deps, pandoc, Calibre) are documented in the README. Source: https://github.com/jollychang/LLM-PDF-OCR-markdown-book. Feedback on robustness, model compatibility, and additional cleanup heuristics would be awesome!
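The external tools mentioned above could be driven from Python roughly as follows. This is a sketch under assumptions, not the tool's own wrapper: the helper names are hypothetical, the pdftoppm arguments are the ones quoted in the comment, and the pandoc and Calibre (`ebook-convert`) calls use only their basic input/output forms. The OCR step that produces book.md sits between the two stages and is not shown.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_command(pdf: Path, prefix: str) -> list[str]:
    """pdftoppm invocation from the post: 300 dpi PNG pages."""
    return ["pdftoppm", "-png", "-r", "300", str(pdf), prefix]


def ebook_commands(book_md: Path, with_calibre: bool) -> list[list[str]]:
    """pandoc builds the EPUB; Calibre's ebook-convert derives AZW3 from it."""
    epub = book_md.with_suffix(".epub")
    commands = [["pandoc", str(book_md), "-o", str(epub)]]
    if with_calibre:
        commands.append(["ebook-convert", str(epub), str(epub.with_suffix(".azw3"))])
    return commands


def build_ebooks(book_md: Path) -> None:
    """Run pandoc, then Calibre only if ebook-convert is on PATH."""
    have_calibre = shutil.which("ebook-convert") is not None
    for cmd in ebook_commands(book_md, have_calibre):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Building the argument lists separately from running them makes the optional Calibre step easy to check without having either tool installed.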

What Researchers Suspect May Be Fueling Cancer Among Millennials

https://www.msn.com/en-us/health/medical/what-researchers-suspect-may-be-fueling-cancer-among-mil...

1•m463•2m ago•0 comments

Nestri – Open-source cloud gaming platform

https://github.com/nestrilabs/nestri

1•manlymuppet•2m ago•0 comments

US Government shutdown begins as partisan division rules Washington

https://www.reuters.com/world/us/us-government-begins-shut-down-most-operations-after-congress-fa...

1•TowerTall•2m ago•0 comments

Intelligent Kubernetes Load Balancing at Databricks

https://www.databricks.com/blog/intelligent-kubernetes-load-balancing-databricks

4•ayf•6m ago•0 comments

Built-In Mapping for More Powerful, Intuitive Code [pdf]

https://github.com/mlochbaum/ILanguage/blob/master/doc/BuiltInMapping/BuiltInMapping.pdf

1•Bogdanp•10m ago•0 comments

Hacktoberfest 2025

https://hacktoberfest.com

1•gnu_amir•12m ago•0 comments

Payload on Workers: a full-fledged CMS, running on Cloudflare's stack

https://blog.cloudflare.com/payload-cms-workers/

1•chmaynard•14m ago•0 comments

Ask HN: How can Netflix have a bug like this?

1•aaronlifshin•19m ago•0 comments

Thank You for Being Annoying

https://www.experimental-history.com/p/thank-you-for-being-annoying

1•calvinfo•30m ago•0 comments

Apple Watch's High Blood Pressure Notifications Approved in Canada

https://www.macrumors.com/2025/09/30/apple-watch-hypertension-health-canada/

2•tosh•32m ago•0 comments

Fake microscopy images generated by AI are indistinguishable from the real thing

https://www.chemistryworld.com/news/fake-microscopy-images-generated-by-ai-are-indistinguishable-...

1•wahvinci•40m ago•0 comments

Is European AI a Lost Cause? Not Necessarily

https://www.noemamag.com/is-european-ai-a-lost-cause-not-necessarily/

1•Brajeshwar•43m ago•0 comments

Ransomware Detection in Google Drive

https://workspace.google.com/blog/product-announcements/ai-ransomware-detection-in-google-drive

1•PessimalDecimal•43m ago•0 comments

Show HN: AiWanAnimate – A new AI tool to create animated videos

https://aiwananimate.me

1•Evanmo666•44m ago•0 comments

Former OpenAI and DeepMind researchers raise whopping $300M

https://techcrunch.com/2025/09/30/former-openai-and-deepmind-researchers-raise-whopping-300m-seed...

2•sarathcp•46m ago•0 comments

Generalised solutions and law of conservation of difficulty (2008)

https://terrytao.wordpress.com/2008/01/04/pcm-article-generalised-solutions/

1•measurablefunc•47m ago•0 comments

Fusion: An Analytics Object Store Optimized for Query Pushdown

https://doi.org/10.1145/3669940.3707234

2•matt_d•51m ago•0 comments

Hacker News Guidelines

3•solsane•56m ago•0 comments

Type Theory Forall – Philip Wadler – Type Classes, Monads, Logic, Future of PL [video]

https://www.youtube.com/watch?v=Q6A848_3TwA

2•matt_d•1h ago•0 comments

Beyond All Reason

https://www.beyondallreason.info

1•Jimmc414•1h ago•0 comments

China Appears to Be Shutting Down Purchases of U.S. Soybeans

https://www.forbes.com/sites/kenroberts/2025/09/30/china-appears-to-be-shutting-down-purchases-of...

5•kaycebasques•1h ago•2 comments

Sperm Racing

https://www.theguardian.com/commentisfree/2025/sep/30/sperm-racing-is-all-the-rage-among-the-tech...

1•andsoitis•1h ago•0 comments

US federal government shuts down

https://www.bbc.com/news/live/clylje0rmp2t

25•david927•1h ago•21 comments

F-Droid says Google's new sideloading restrictions will kill the project

https://arstechnica.com/gadgets/2025/09/f-droid-calls-for-regulators-to-stop-googles-crackdown-on...

4•TheCleric•1h ago•0 comments

Package Maintainers Call for Improvements to GitHub's New NPM Security Plan

https://socket.dev/blog/package-maintainers-call-for-improvements-to-npm-security-plan

1•feross•1h ago•1 comment

The Broadway Musical Is in Trouble

https://www.nytimes.com/2025/09/22/theater/broadway-musicals-finances.html

2•alex_hirner•1h ago•1 comment

South Korean government services offline following data center fire

https://www.techradar.com/pro/hundreds-of-south-korean-government-services-go-offline-following-d...

2•anigbrowl•1h ago•0 comments

China Goes on Offense: Beijing's Plans to Exploit American Retreat

https://www.foreignaffairs.com/united-states/china-goes-offense

5•kaycebasques•1h ago•0 comments

Show HN: Simple Meditation Timer

https://www.bodhigpt.com/tools/meditation-timer

1•whatcha•1h ago•0 comments

RubyGems Threatens to Split

https://www.heise.de/en/news/Who-owns-an-open-source-project-RubyGems-threatens-to-split-10685184...

5•KingOfCoders•1h ago•0 comments