Microsoft's first in-house voice model, MAI-Voice-1

https://copilot.microsoft.com/labs/audio-expression
1•kitcar•1m ago•0 comments

Non-newsletter #1: This One's for the Survivors

https://mailchi.mp/gizra/this-ones-for-the-survivors
1•amitaibu•2m ago•0 comments

Debian 13: My list of new features

https://samueloph.dev/blog/debian-13-my-list-of-exciting-new-things/
1•jandeboevrie•4m ago•0 comments

Acne vaccines could offer robust defence

https://www.nature.com/articles/d41586-025-02652-1
1•bookofjoe•7m ago•0 comments

Large language models can reconstruct forbidden knowledge

https://www.fastcompany.com/91391442/how-large-language-models-can-reconstruct-forbidden-knowledge
1•toss1•8m ago•0 comments

China vs. the West: Unity vs. Freedom

https://www.boris.fyi/unity-vs-freedom
1•sirobg•9m ago•0 comments

Citrix forgot to tell you CVE-2025–6543 has been used as a zero day since May

https://doublepulsar.com/citrix-forgot-to-tell-you-cve-2025-6543-has-been-used-as-a-zero-day-sinc...
2•speckx•10m ago•0 comments

My startup banking story (2023)

https://mitchellh.com/writing/my-startup-banking-story
1•dvrp•10m ago•0 comments

Start and track Copilot coding agent tasks from Raycast

https://github.blog/changelog/2025-08-28-start-and-track-copilot-coding-agent-tasks-from-raycast/
1•timrogers•11m ago•0 comments

Donald Trump's Big Gay Government

https://www.nytimes.com/2025/08/26/style/gay-men-trump-administration-republicans.html
1•whack•13m ago•2 comments

RFC 8594: The Sunset HTTP Header Field

https://datatracker.ietf.org/doc/html/rfc8594
1•aiven•13m ago•1 comments

Vivaldi slams Google, Microsoft for shoving AI into browsers, vows to stay clear

https://www.neowin.net/news/vivaldi-slams-google-and-microsoft-for-cramming-ai-into-browsers-says...
3•bundie•16m ago•2 comments

Show HN: Put text in between images (Nano Banana)

https://www.textbetween.com/
1•westche2222•17m ago•1 comments

Engineers send quantum signals with standard Internet Protocol

https://phys.org/news/2025-08-quantum-standard-internet-protocol.html
5•layer8•19m ago•0 comments

New evidence strongly suggest AI is killing jobs for young programmers

https://www.understandingai.org/p/new-evidence-strongly-suggest-ai
5•CharlesW•20m ago•0 comments

FBI, Dutch cops seize fake ID marketplace that sold identity docs for $9

https://www.theregister.com/2025/08/28/fbi_dutch_cops_seize_veriftools/
3•rntn•22m ago•0 comments

Show HN: Universal Chat UI for AI Agents

https://www.craffted.dev/
2•ddaras•26m ago•2 comments

Adding limestone to farmland boosts carbon capture and crop yields, study finds

https://phys.org/news/2025-08-adding-limestone-farmland-boosts-carbon.html
1•PaulHoule•26m ago•0 comments

Marshmellow Laser Feast: experimental art collective

1•cl3misch•27m ago•0 comments

AMA with Z.ai, the Lab Behind GLM Models

https://old.reddit.com/r/LocalLLaMA/comments/1n2ghx4/ama_with_zai_the_lab_behind_glm_models/
2•bratao•29m ago•1 comments

What's your database infra in 2025? (2 min survey)

https://survey.springtail.io/database-survey-2025
1•gszundi•30m ago•1 comments

New Xcode beta adds GPT-5, Claude account support

https://sixcolors.com/link/2025/08/apples-new-xcode-beta-adds-gpt-5-claude-account-support/
2•CharlesW•31m ago•0 comments

Cakedesk Invoicing App

https://cakedesk.app
1•carlosjobim•33m ago•0 comments

Ask HN: Where can I see a live octopus in Maine?

3•Octopus88•33m ago•1 comments

Ask HN: What is the future of software salaries in the age of AI coding agents?

1•jplusequalt•39m ago•1 comments

Simulating wealth distribution in an agent-based system

https://notebooks.manganiello.tech/fabio/wealth-inequality.ipynb
3•blacklight•40m ago•0 comments

What I learned vibe coding a WASM CSV Parser

https://www.importcsv.com/blog/wasm-csv-parser-complete-story
3•aray07•40m ago•0 comments

Show HN: A flat monthly subscription for open-source LLMs

https://synthetic.new/newsletter/entries/subscriptions
4•reissbaker•45m ago•0 comments

Eliza Labs Sues X For Anti-trust

https://www.reuters.com/legal/litigation/musks-x-hit-with-antitrust-lawsuit-by-software-startup-e...
3•moonmagick•46m ago•0 comments

Amazon Facing Lawsuit over Prime Video Movie Purchases

https://www.newsweek.com/amazon-facing-lawsuit-over-prime-video-movie-purchases-2120882
2•c420•46m ago•0 comments

LLMs solving problems OCR+NLP couldn't

https://cloudsquid.substack.com/p/ocr-is-legacy-tech
18•universesquid•6h ago

Comments

behnamoh•5h ago
This is a nothingburger blog post that likely made it to the front page because it mentions "LLM" in the title. Worse yet, it's actually an ad.
WesleyLivesay•5h ago
You beat me to this comment, but you are absolutely correct.
OtherShrezzing•5h ago
The first thing I do on HN posts with lots of upvotes and few comments is scroll to the bottom and check if the closing paragraph has a link to some saas product. If it does, I close the tab.
thaeli•5h ago
Ironically, this check would be a pretty good use for an LLM.
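
For what it's worth, the "does the closing paragraph link somewhere" check described above can be roughed out without an LLM. This is only a minimal sketch: the function name `closing_paragraph_has_link` and the URL regex are assumptions made for illustration, and it flags any URL in the last paragraph, not specifically a SaaS product link (judging that is the part an LLM might help with).

```python
import re

# Hypothetical, deliberately simple URL pattern; real posts may need more care.
URL_RE = re.compile(r"https?://\S+")


def closing_paragraph_has_link(text: str) -> bool:
    """Return True if the last non-empty paragraph of `text` contains a URL."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return False
    return bool(URL_RE.search(paragraphs[-1]))


if __name__ == "__main__":
    post = "Some thoughts on OCR.\n\nTry our product at https://example.com today."
    print(closing_paragraph_has_link(post))
```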
tiahura•5h ago
"I still believe that processing documents will be a solved problem in a couple years time."

Current 80/20-rule-ignoring AI dogma in a nutshell.

tovej•5h ago
Are LLMs not NLP? They process natural language, no?

And I assume the multimodal tools still use OCR for text extraction, or am I missing something?

My understanding is that they're still doing OCR+NLP, just differently than traditional approaches.

universesquid•4h ago
1) Technically yes, most models used for that task are NLP, though not LLMs in the modern sense.
2) Actually, they don't. Multimodal LLMs parse PDFs by taking multiple screenshots of each page.
Tractor8626•5h ago
OCR doesn't have a prompt injection problem.
mattigames•5h ago
It's only prompt injection if it comes from state-sponsored hackers; otherwise it's just surprise prompt augmentation.
endymion-light•5h ago
I don't mind people writing blog posts that advertise their own companies, but I'd like a little more substance on this topic. It is interesting in a way: I find I now turn to things like Gemini 2.5 for simple OCR/NLP, and more recently for more substantial image editing, rather than to specialized models.

I think that's more because of the current state of the industry: a lot of those models are either internal, paywalled, or annoying to use. I don't want to waste effort signing up for a 4-week trial of X service to perform a one-off task.

Unfortunately, this post didn't really elucidate or go into an interesting topic within this space.

I'm not expecting a research paper, but it would be great to get some stats, graphs, examples and meat on the bones. I opened this up expecting some actual examples of problems within OCR & NLP and showing how X multi-modal model solves them.

universesquid•3h ago
Cool, thanks for that comment. I might update this in a couple of weeks, since the topic seems to interest people but the general feedback is that the post is too shallow. I wanted to give some high-level intuition I've gained after working on document processing for a while now, as many people are still surprised that, e.g., layouts aren't a real problem anymore. But I'll take the hint that HN is a crowd that wants more depth! :)
daft_pink•5h ago
I'm really looking for an OCR LLM we can run locally. I think a lot of people doing a lot of OCR and document extraction aren't looking to upload every file to the cloud, and the use case is narrower than typing into a chatbot.

While Gemini is nice, it would be nice to have a pipeline that works locally on a reasonably RAM’d unified memory Mac or Framework AMD board.

eithed•5h ago
OCR doesn't hallucinate outputs: if it says "212.99mm" on an architecture diagram, it doesn't suddenly turn into "2413m" on the other end because an LLM thought that feels better. I remember reading on HN about exactly that happening in such a case (but sadly my Google-fu fails me and I can't find a link).
strangecasts•4h ago
The case you might be thinking of is the JBIG2 implementation bug [1, 2] in Xerox photocopiers where the pattern-matching would incorrectly treat certain characters as interchangeable, leading to numbers getting rewritten in spreadsheets.

[1] https://www.bbc.com/news/technology-23588202

[2] https://www.dkriesel.com/en/blog/2013/0810_xerox_investigati...

eithed•4h ago
That's exactly it! Thank you!
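
To make the failure mode in that JBIG2 case concrete, here is a toy sketch of lossy symbol matching: each glyph is replaced by the first already-stored glyph that is "close enough", so a lenient threshold silently turns one digit into another. Everything here is hypothetical (the 3x5 bitmaps, the `max_diff` threshold, the function names), and it is not the JBIG2 codec or Xerox's implementation, only an illustration of the idea.

```python
# Toy 3x5 bitmaps; '#' is ink, '.' is background. Purely illustrative glyphs.
ONE = (".#.", "##.", ".#.", ".#.", "###")
SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")

GLYPHS = {"1": ONE, "6": SIX, "8": EIGHT}
LABELS = {glyph: label for label, glyph in GLYPHS.items()}


def pixel_diff(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs, max_diff):
    """Build a symbol dictionary, reusing any stored symbol within max_diff pixels."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, symbol in enumerate(dictionary):
            if pixel_diff(glyph, symbol) <= max_diff:
                indices.append(i)  # treat the glyph as "the same" symbol
                break
        else:
            dictionary.append(glyph)  # no close match: store a new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def roundtrip(text, max_diff):
    """Encode the digits in `text` and decode them back to text."""
    dictionary, indices = encode([GLYPHS[ch] for ch in text], max_diff)
    return "".join(LABELS[dictionary[i]] for i in indices)


if __name__ == "__main__":
    print(roundtrip("168", max_diff=0))  # exact matching only
    print(roundtrip("168", max_diff=1))  # lenient matching
```

In this toy, the "6" and "8" bitmaps differ by a single pixel, so with `max_diff=1` the "8" is stored as a reference to the earlier "6" and the decoded text reads "166" instead of "168". The output still looks like a clean digit, so nothing flags the substitution.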