frontpage.

Two Billion Email Addresses Were Exposed

https://www.troyhunt.com/2-billion-email-addresses-were-exposed-and-we-indexed-them-all-in-have-i...
19•esnard•13m ago•2 comments

Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model

https://moonshotai.github.io/Kimi-K2/thinking.html
393•nekofneko•5h ago•149 comments

Swift on FreeBSD Preview

https://forums.swift.org/t/swift-on-freebsd-preview/83064
116•glhaynes•2h ago•58 comments

ICC ditches Microsoft 365 for openDesk

https://www.binnenlandsbestuur.nl/digitaal/internationaal-strafhof-neemt-afscheid-van-microsoft-365
377•vincvinc•3h ago•113 comments

LLMs Encode How Difficult Problems Are

https://arxiv.org/abs/2510.18147
40•stansApprentice•2h ago•2 comments

Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

https://book.sv
17•costco•1d ago•1 comment

Open Source Implementation of Apple's Private Compute Cloud

https://github.com/openpcc/openpcc
304•adam_gyroscope•1d ago•57 comments

The Parallel Search API

https://parallel.ai/blog/introducing-parallel-search
44•lukaslevert•3h ago•19 comments

What if hard work felt easier?

https://jeanhsu.substack.com/p/what-if-hard-work-felt-easier
36•kiyanwang•1w ago•20 comments

Show HN: TabPFN-2.5 – SOTA foundation model for tabular data

https://priorlabs.ai/technical-reports/tabpfn-2-5-model-report
41•onasta•2h ago•8 comments

I analyzed the lineups at the most popular nightclubs

https://dev.karltryggvason.com/how-i-analyzed-the-lineups-at-the-worlds-most-popular-nightclubs/
120•kalli•6h ago•61 comments

Auraphone: A simple app to collect people's info at events

https://andrewarrow.dev/2025/11/simple-app-collect-peoples-info-at-events/
6•fcpguru•5h ago•1 comment

Senior BizOps at Artie (San Francisco)

https://www.ycombinator.com/companies/artie/jobs/gqANVBc-senior-business-operations
1•tang8330•3h ago

Show HN: Dynamic code and feedback walkthroughs with your coding Agent in VSCode

https://www.intraview.ai/hn-demo
6•cyrusradfar•3h ago•0 comments

FBI tries to unmask owner of archive.is

https://www.heise.de/en/news/Archive-today-FBI-Demands-Data-from-Provider-Tucows-11066346.html
470•Projectiboga•4h ago•267 comments

Eating stinging nettles

https://rachel.blog/2018/04/29/eating-stinging-nettles/
135•rzk•8h ago•137 comments

Benchmarking the Most Reliable Document Parsing API

https://www.tensorlake.ai/blog/benchmarks
18•calavera•2h ago•13 comments

Springs and Bounces in Native CSS

https://www.joshwcomeau.com/animation/linear-timing-function/
43•Bogdanp•1w ago•4 comments

Mathematical exploration and discovery at scale

https://terrytao.wordpress.com/2025/11/05/mathematical-exploration-and-discovery-at-scale/
202•nabla9•11h ago•91 comments

UK outperforms US in creating unicorns from early stage VC investment

https://www.cityam.com/uk-outperforms-us-in-creating-unicorns-from-early-stage-vc-investment/
13•mmarian•33m ago•7 comments

Show HN: See chords as flags – Visual harmony of top composers on musescore

https://rawl.rocks/
92•vitaly-pavlenko•1d ago•24 comments

Supply chain attacks are exploiting our assumptions

https://blog.trailofbits.com/2025/09/24/supply-chain-attacks-are-exploiting-our-assumptions/
26•crescit_eundo•4h ago•15 comments

Cloudflare Tells U.S. Govt That Foreign Site Blocking Efforts Are Trade Barriers

https://torrentfreak.com/cloudflare-tells-u-s-govt-that-foreign-site-blocking-efforts-are-digital...
260•iamnothere•6h ago•158 comments

Show HN: qqqa – A fast, stateless LLM-powered assistant for your shell

https://github.com/matisojka/qqqa
95•iagooar•9h ago•72 comments

IKEA launches new smart home range with 21 Matter-compatible products

https://www.ikea.com/global/en/newsroom/retail/the-new-smart-home-from-ikea-matter-compatible-251...
228•lemoine0461•7h ago•170 comments

I may have found a way to spot U.S. at-sea strikes before they're announced

https://old.reddit.com/r/OSINT/comments/1opjjyv/i_may_have_found_a_way_to_spot_us_atsea_strikes/
225•hentrep•15h ago•314 comments

Black Hole Flare Is Biggest and Most Distant Seen

https://www.caltech.edu/about/news/black-hole-flare-is-biggest-and-most-distant-seen
4•gmays•1h ago•0 comments

How I am deeply integrating Emacs

https://joshblais.com/blog/how-i-am-deeply-integrating-emacs/
189•signa11•13h ago•126 comments

Pico-100BASE-TX: Bit-Banged 100 MBit/s Ethernet and UDP Framer for RP2040/RP2350

https://github.com/steve-m/Pico-100BASE-TX
69•_Microft•6d ago•12 comments

Phantom in the Light: The story of early spectroscopy

https://chrisdempewolf.com/posts/phantom-in-the-light/
8•dempedempe•1w ago•0 comments

Benchmarking the Most Reliable Document Parsing API

https://www.tensorlake.ai/blog/benchmarks
18•calavera•2h ago

Comments

serjester•1h ago
This is just a company advertisement, and not even a well-done one. They didn't benchmark any of the real leaders in the space (Reducto, Extend, etc.) and left Gemini out of the first two tests, presumably because it was the best performer (while also being multiple orders of magnitude cheaper).
JLO64•1h ago
Personally I use OpenAI models via the API for transcription of PDF files. Is there a big difference between them and Gemini models?
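
A rough sketch of that kind of per-page transcription with a vision model (the model name, prompt, and pdf2image rendering are assumptions, not the commenter's actual setup):

    import base64
    import io

    from openai import OpenAI
    from pdf2image import convert_from_path  # needs poppler installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def transcribe_pdf(path: str) -> str:
        pages = convert_from_path(path, dpi=200)  # render each page to an image
        chunks = []
        for image in pages:
            buf = io.BytesIO()
            image.save(buf, format="PNG")
            data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder; any vision-capable model
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Transcribe this page to Markdown."},
                        {"type": "image_url", "image_url": {"url": data_url}},
                    ],
                }],
            )
            chunks.append(resp.choices[0].message.content)
        return "\n\n".join(chunks)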
diptanu•1h ago
Hey! I am the founder of Tensorlake. We benchmarked the models that our customers consider using in enterprises or regulated industries, where there is a big need for processing documents for various automation tasks. Benchmarking takes a lot of time, so we focused on the ones we get asked about.

On Gemini and other VLMs - we excluded these models because they don't do visual grounding, i.e. they don't provide page layouts or bounding boxes for the elements on each page. That is a table-stakes feature for the use cases customers are building with Tensorlake: it wouldn't be possible to build citations without bounding boxes.
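
To make the citation point concrete, here is an illustrative, made-up element schema showing how a bounding box lets you point an answer back at a page region; it is not Tensorlake's actual output format:

    from dataclasses import dataclass

    @dataclass
    class Element:
        page: int
        bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
        text: str

    def cite(answer_span: str, elements: list[Element]) -> dict:
        """Map an extracted answer span back to the page region it came from."""
        for el in elements:
            if answer_span in el.text:
                return {"quote": answer_span, "page": el.page, "bbox": el.bbox}
        return {"quote": answer_span, "page": None, "bbox": None}

    elements = [Element(page=3, bbox=(72.0, 410.5, 540.0, 432.0),
                        text="Net revenue was $4.2M in FY2024.")]
    print(cite("$4.2M", elements))  # citation points at page 3 and that box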

On pricing - we are probably the only company offering pure on-demand pricing without any tiers. With Tensorlake, you can get back markdown from every page, summaries of figures, tables and charts, structured data, page classification, etc. - in one API call. This means we are running a bunch of different models under the hood. If you add up the token costs and the complexity of building an equivalent pipeline around Gemini plus other OCR/layout detection models, I bet the price you end up with won't be any cheaper than what we provide :) Plus doing this at scale is very complex - it requires a lot of sophisticated infrastructure, which is another source of cost behind modern document ingestion services.
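
Purely as an illustration of the "one call, many outputs" shape described above (an invented signature, not the real Tensorlake SDK):

    from typing import TypedDict

    class ParseResult(TypedDict):
        markdown: list[str]          # Markdown per page
        figure_summaries: list[str]  # natural-language summaries of figures/charts
        tables: list[dict]           # table structure plus cell text
        structured_data: dict        # schema-driven field extraction
        page_classes: list[str]      # e.g. ["cover", "terms", "signature"]

    def parse_document(path: str) -> ParseResult:
        """Stand-in for a single parse call that returns every output at once."""
        raise NotImplementedError("illustrative signature only")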

ianhawes•1h ago
I just tested a non-English document and it rendered English text. Does your model not support anything other than English?
diptanu•44m ago
It does; we have users in Europe and Asia using it with non-English languages. Can you please send me a message at diptanu at tensorlake dot ai? I'd love to see why it didn't work.
coderintherye•57m ago
Google's Vertex API for document processing absolutely does bounding boxes. In fact, some of the document processors out there are just wrappers around Google's product.
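
A rough sketch of pulling block-level bounding polygons out of Google Cloud Document AI (the project, location, and processor IDs are placeholders; check the current client docs for exact field names):

    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path("my-project", "us", "my-processor-id")

    with open("sample.pdf", "rb") as f:
        raw = documentai.RawDocument(content=f.read(), mime_type="application/pdf")

    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw)
    )

    for page in result.document.pages:
        for block in page.blocks:
            # Vertices are normalized to [0, 1] relative to the page dimensions.
            vertices = block.layout.bounding_poly.normalized_vertices
            print([(v.x, v.y) for v in vertices])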
diptanu•45m ago
OP mentioned Gemini, not Google's Vertex OCR API, which has very different performance and accuracy characteristics from Gemini.
hotpaper75•1h ago
Thanks for mentioning them; indeed, their post seems to surface only a couple of names in the field, and maybe not the most relevant ones.
karakanb•1h ago
I have been recently looking into extracting a bunch of details from a set of legacy invoice PDFs and had a subpar experience. Gemini was the best among the ones that I tried, but even that missed quite a bit. I'll definitely give this a look.

It seems like such a crowded space, with many tools doing document extraction; I wonder if there's anything in particular pulling more attention into it?

recursive4•53m ago
Curious how it compares to https://github.com/datalab-to/chandra
diptanu•41m ago
We haven't tested Chandra yet, because it's very new. Under the hood Tensorlake is very similar to Marker - it's a pipeline-based OCR API: we do layout detection, text detection and recognition, table structure understanding, etc. We then use VLMs to enrich the results. Our models are much bigger than Marker's and thus take a little longer to parse documents; we optimized for accuracy. We will have a faster API soon.
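
As a generic outline (not Tensorlake's actual code), a pipeline-based parser of the kind described above is structured roughly like this, with each stage left as a placeholder:

    def detect_layout(page_image):
        """Return regions (text block, table, figure, ...) with bounding boxes."""
        ...

    def recognize_text(page_image, region):
        """Run text detection + recognition (OCR) within one layout region."""
        ...

    def parse_table_structure(page_image, region):
        """Recover rows, columns, and cell spans for a table region."""
        ...

    def enrich_with_vlm(region, content):
        """Ask a VLM for extras such as a figure or chart summary."""
        ...

    def parse_page(page_image):
        results = []
        for region in detect_layout(page_image):
            if region.kind == "table":
                content = parse_table_structure(page_image, region)
            else:
                content = recognize_text(page_image, region)
            results.append(enrich_with_vlm(region, content))
        return results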
kissgyorgy•28m ago
I just tried it out: docling finished the same document in 20s (with pretty good results), while in Tensorlake it has been pending for 10 minutes. I won't even wait for the results.
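
For reference, the docling side of that comparison is roughly the documented quickstart (the file name and timing wrapper here are illustrative):

    import time

    from docling.document_converter import DocumentConverter

    start = time.perf_counter()
    converter = DocumentConverter()
    result = converter.convert("same-document.pdf")  # local path or URL
    markdown = result.document.export_to_markdown()
    print(f"finished in {time.perf_counter() - start:.1f}s, {len(markdown)} chars of Markdown")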
goldenjm•14m ago
This would be more helpful if it included DeepSeek-OCR, PaddleOCR-VL and MinerU 2.5. In general, I've found that OmniDocBench is a reliable benchmark, perhaps surprisingly because it is made by the same team as MinerU. They updated their benchmark table recently: https://github.com/opendatalab/OmniDocBench#end-to-end-evalu.... There are some other models that score above DeepSeek-OCR as well that I'm not as familiar with.