frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Statin drugs safer than previously thought

https://www.semafor.com/article/02/06/2026/statin-drugs-safer-than-previously-thought
1•stareatgoats•1m ago•0 comments

Handy when you just want to distract yourself for a moment

https://d6.h5go.life/
1•TrendSpotterPro•2m ago•0 comments

More States Are Taking Aim at a Controversial Early Reading Method

https://www.edweek.org/teaching-learning/more-states-are-taking-aim-at-a-controversial-early-read...
1•lelanthran•4m ago•0 comments

AI will not save developer productivity

https://www.infoworld.com/article/4125409/ai-will-not-save-developer-productivity.html
1•indentit•9m ago•0 comments

How I do and don't use agents

https://twitter.com/jessfraz/status/2019975917863661760
1•tosh•15m ago•0 comments

BTDUex Safe? The Back End Withdrawal Anomalies

1•aoijfoqfw•18m ago•0 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
3•michaelchicory•20m ago•1 comments

Show HN: Ensemble – macOS App to Manage Claude Code Skills, MCPs, and Claude.md

https://github.com/O0000-code/Ensemble
1•IO0oI•23m ago•1 comments

PR to support XMPP channels in OpenClaw

https://github.com/openclaw/openclaw/pull/9741
1•mickael•24m ago•0 comments

Twenty: A Modern Alternative to Salesforce

https://github.com/twentyhq/twenty
1•tosh•26m ago•0 comments

Raspberry Pi: More memory-driven price rises

https://www.raspberrypi.com/news/more-memory-driven-price-rises/
1•calcifer•31m ago•0 comments

Level Up Your Gaming

https://d4.h5go.life/
1•LinkLens•35m ago•1 comments

Di.day is a movement to encourage people to ditch Big Tech

https://itsfoss.com/news/di-day-celebration/
3•MilnerRoute•36m ago•0 comments

Show HN: AI generated personal affirmations playing when your phone is locked

https://MyAffirmations.Guru
4•alaserm•37m ago•3 comments

Show HN: GTM MCP Server- Let AI Manage Your Google Tag Manager Containers

https://github.com/paolobietolini/gtm-mcp-server
1•paolobietolini•38m ago•0 comments

Launch of X (Twitter) API Pay-per-Use Pricing

https://devcommunity.x.com/t/announcing-the-launch-of-x-api-pay-per-use-pricing/256476
1•thinkingemote•38m ago•0 comments

Facebook seemingly randomly bans tons of users

https://old.reddit.com/r/facebookdisabledme/
1•dirteater_•40m ago•1 comments

Global Bird Count Event

https://www.birdcount.org/
1•downboots•40m ago•0 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
2•soheilpro•42m ago•0 comments

Jon Stewart – One of My Favorite People – What Now? with Trevor Noah Podcast [video]

https://www.youtube.com/watch?v=44uC12g9ZVk
2•consumer451•45m ago•0 comments

P2P crypto exchange development company

1•sonniya•58m ago•0 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
2•jesperordrup•1h ago•0 comments

Write for Your Readers Even If They Are Agents

https://commonsware.com/blog/2026/02/06/write-for-your-readers-even-if-they-are-agents.html
1•ingve•1h ago•0 comments

Knowledge-Creating LLMs

https://tecunningham.github.io/posts/2026-01-29-knowledge-creating-llms.html
1•salkahfi•1h ago•0 comments

Maple Mono: Smooth your coding flow

https://font.subf.dev/en/
1•signa11•1h ago•0 comments

Sid Meier's System for Real-Time Music Composition and Synthesis

https://patents.google.com/patent/US5496962A/en
1•GaryBluto•1h ago•1 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
7•keepamovin•1h ago•1 comments

Show HN: Empusa – Visual debugger to catch and resume AI agent retry loops

https://github.com/justin55afdfdsf5ds45f4ds5f45ds4/EmpusaAI
1•justinlord•1h ago•0 comments

Show HN: Bitcoin wallet on NXP SE050 secure element, Tor-only open source

https://github.com/0xdeadbeefnetwork/sigil-web
2•sickthecat•1h ago•1 comments

White House Explores Opening Antitrust Probe on Homebuilders

https://www.bloomberg.com/news/articles/2026-02-06/white-house-explores-opening-antitrust-probe-i...
1•petethomas•1h ago•0 comments
Open in hackernews

Why extracting data from PDFs is still a nightmare

https://arstechnica.com/ai/2025/03/why-extracting-data-from-pdfs-is-still-a-nightmare-for-data-experts/
8•lxm•9mo ago

Comments

bediger4000•9mo ago
My credit union will only give out monthly statements as PDFs. For some reason, the pdf-to-text converters don't strip out rows or lines of text, but rather columns, so a semi-automated solution is out-of-the-question. I've resorted to mouse-copying entries in Firefox, then pasting the text into a word processor to get rows of data I can work with.

The tellers are magnificently ignorant about this, as is the telephone helpline. To them, the PDF actually is the data the credit union uses. No other form of data exists, except possibly in an Excel spreadsheet, and they can't give data in that format. I blame the prevalence of Windows for this. Between the use of file name "extension" to indicate format of the file, hiding the "extension" in file browsers, the single document at a time orientation, and almost exclusive use of WYSIWYG systems like Word and Excel, it's pretty hard to understand that a difference between "the data" and "the formatting" exists.

cratermoon•9mo ago
"Unfortunately Mistral-OCR has still the usual VLM curse: with challenging manuscripts, it hallucinates completely."

My personal experience relates to the statement from the article, "According to several studies, approximately 80–90 percent of the world's organizational data is stored as unstructured data in documents, much of it locked away in formats that resist easy extraction."

In my experience, this means Microsoft Word and PowerPoint documents authored by people who put more focus on the appearance than the structure of the content. Take one of these documents and generate PDF from it and any hint of structure that existed is gone.

There was an article on HN not too long ago discussing this history of Word documents and the lack of structure, but I can't find the link. ETA: https://ia.net/topics/markdown-and-the-slow-fade-of-the-form...