Show HN: Getting full-text scientific content into LLMs+Agents is stupidly hard

https://www.valyu.network/blogs/deepsearch-v2-updates
3•zk108•1d ago
Most APIs don’t return actual content. You get metadata, maybe an abstract, maybe a snippet, but never the thing itself. And if you want proper sources like arXiv, PubMed, or major publishers? Good luck. You’re stuck scraping tens of millions of PDFs or Semantic Scholar and building your own ingestion pipeline.

We hit this building agentic workflows and RAG backends. What we needed wasn’t “search”; it was a way to retrieve real, structured full text with enough metadata to plug straight into a reasoning system. So we built a system that could do that: multimodal inputs (text, math, figures), clean citations, reference chaining, and filters that work (by date, by source, etc.).
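
To make "filters that work" concrete, here is a minimal sketch of filtering full-text records by source and publication date. It is not the system described in the post: the `Document` record, its field names, and `filter_documents` are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Document:
    # Hypothetical record shape; these field names are illustrative only.
    doc_id: str
    source: str        # e.g. "arxiv" or "pubmed"
    published: date
    full_text: str


def filter_documents(
    docs: Iterable[Document],
    sources: Optional[Set[str]] = None,
    published_after: Optional[date] = None,
    published_before: Optional[date] = None,
) -> List[Document]:
    """Keep documents matching every filter that was supplied."""
    selected = []
    for doc in docs:
        if sources is not None and doc.source not in sources:
            continue
        if published_after is not None and doc.published < published_after:
            continue
        if published_before is not None and doc.published > published_before:
            continue
        selected.append(doc)
    return selected
```

Filters left as `None` are ignored, so the same function covers "by date", "by source", or both at once.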

The hard part wasn’t retrieval but preprocessing at scale: figuring out how to analyse, chunk, and structure tens of millions of docs without taking months or breaking the bank. Not to mention dealing with licensed content where formats vary wildly, and building retrieval systems at this scale.
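
For readers unfamiliar with the chunking step, a minimal sketch of one common approach is fixed-size word windows with overlap, where each chunk keeps the ID of its source document so it can be cited later. This is an assumption for illustration, not the method used by the system in the post; `Chunk`, `chunk_words`, and the default sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    # Hypothetical chunk record: enough metadata to trace text back to its document.
    doc_id: str
    index: int
    text: str


def chunk_words(doc_id: str, text: str, max_words: int = 200, overlap: int = 20) -> List[Chunk]:
    """Split text into windows of up to max_words words, sharing `overlap` words between neighbours."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(doc_id, index, " ".join(window)))
        # Stop once this window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks
```

Real scientific documents usually need structure-aware splitting (sections, equations, figure captions) rather than plain word windows; the sketch only shows the basic shape of the step.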

Still a work in progress with more updates on the way. But miles better than duct-taping together PDFs, AI search engines etc. and hoping to find the relevant context you need.

Comments

yorkeccak•1d ago
Aligns very well with what Anthropic researchers said on a recent podcast: even if AI progress stalls, current AI models are already capable of automating all white-collar jobs. The only missing components are better access to information and the infra/workflows around the models themselves.
yorkeccak•1d ago
https://x.com/evankirstel/status/1927184767218229309?s=46

Ask HN: Does it make sense to be in SF if you are bootstrapped?

1•kingkhalid•18s ago•0 comments

DeepSeek-R1-0528 is now live on Hyperbolic

https://app.hyperbolic.xyz/
1•dubrado•38s ago•1 comments

Beware of Fast-Math

https://simonbyrne.github.io/notes/fastmath/
1•thunderbong•1m ago•0 comments

"The Age of Work", part two: The country's youngest workforce

https://www.adpresearch.com/the-age-of-work-part-two-the-countrys-youngest-workforce/
1•toomuchtodo•2m ago•1 comments

'Are the Bricks Evil?' In a Village Built for Nazis, Darkness Lingers

https://www.nytimes.com/2025/05/27/realestate/berlin-holocaust-nazis-neighborhood.html
1•whack•4m ago•0 comments

How to Stop AI Cheating

https://www.honest-broker.com/p/5-ways-to-stop-ai-cheating
1•gHeadphone•7m ago•0 comments

3-Week MVP Sprint (Node/React/AWS) – 2 free builds for early partners

https://github.com/mackerricher/3week-mvp-sprint/blob/main/README.md
1•mackerricher•9m ago•0 comments

Nvidia beats earnings expectations but takes hit from chip export restrictions

https://www.axios.com/2025/05/28/nvidia-earnings-jensen-huang-ai
1•doener•10m ago•0 comments

The TACO trade is the new Trump trade

https://www.businessinsider.com/trump-trade-taco-tariffs-buy-the-dip-trade-war-sp500-2025-5
2•nabla9•11m ago•0 comments

Website for Bay Area-born Victoria's falls victim to hackers

https://www.sfgate.com/local/article/bay-area-born-victorias-secret-website-down-20349547.php
1•ckrailo•11m ago•1 comments

Unpredictability and Undecidability in Dynamical Systems [pdf]

https://gwern.net/doc/cs/computable/1990-moore.pdf
2•alexmolas•15m ago•0 comments

Tool-as-State: A New Pattern for Expanding LLM Capability

https://medium.com/@mnaei/tool-as-state-a-new-pattern-for-expanding-llm-capability-71a125f035de
1•mnaei•16m ago•0 comments

Show HN: I built an AI tool that generates click-worthy YouTube thumbnails

https://thumbnailx.com/
2•gits997•17m ago•0 comments

Opera Neon

https://press.opera.com/2025/05/28/opera-neon-the-first-ai-agentic-browser/
1•thm•20m ago•0 comments

Task-management system you can drop into Cursor, Lovable, Windsurf, Roo, etc.

https://github.com/eyaltoledano/claude-task-master
1•consumer451•20m ago•0 comments

Turn a Tesla into a mapping vehicle with Mapillary

https://blog.mapillary.com/update/2020/12/09/map-with-your-tesla.html
2•faebi•22m ago•0 comments

Parking_lot: Ffffffffffffffff

https://fly.io/blog/parking-lot-ffffffffffffffff/
3•shepmaster•24m ago•0 comments

Maybe You're Not Trying

https://usefulfictions.substack.com/p/maybe-youre-not-actually-trying
1•jger15•31m ago•0 comments

Capcom reveals its PC game sales overshadow console, and it's only growing

https://www.pcguide.com/news/60-of-capcoms-digital-sales-were-on-pc-overshadowing-console-sales/
6•dcu•33m ago•0 comments

Show HN: Adaptive Geocodes – Truncatable Geocodes

https://davidcwga.github.io/agc.html
2•LeoPanthera•35m ago•0 comments

Unsold Tesla Cybertrucks are piling up at Detroit parking lot

https://techcrunch.com/2025/05/28/dozens-of-unsold-tesla-cybertrucks-are-piling-up-at-detroit-parking-lot/
8•vinni2•35m ago•0 comments

Trump admin tells SCOTUS: ISPs shouldn't be forced to boot alleged pirates

https://arstechnica.com/tech-policy/2025/05/trump-admin-tells-scotus-isps-shouldnt-be-forced-to-boot-alleged-pirates/
5•pseudolus•35m ago•0 comments

Show HN: Financial Goals for SaaS Founders

https://earlytraction.substack.com/p/b2b-saas-marketing-mistake-4-not-setting-clear-goals
1•superamped•37m ago•0 comments

Nvidia's gaming business just had its best quarter ever

https://www.theverge.com/news/676062/nvidias-gaming-business-just-had-its-best-quarter-ever
1•thesuperbigfrog•37m ago•0 comments

Overlooked Films

https://scottsumner.substack.com/p/overlooked-films
1•jger15•37m ago•0 comments

Link Text Automation in Sphinx

https://technicalwriting.dev/links/automation.html
1•todsacerdoti•38m ago•0 comments

There's a Simple Pattern to Elon Musk's Broken Promises

https://www.wired.com/story/theres-a-very-simple-pattern-to-elon-musks-broken-promises/
2•doener•39m ago•3 comments

DepthAnything v2 Tutorial: How to Convert 2D Images to 3D Models with Python

https://medium.com/data-science-collective/depthanything-v2-tutorial-how-to-convert-2d-images-to-3d-models-with-python-2708d295b7e5
2•homarp•40m ago•0 comments

AGI Won't Be Engineered–It Will Emerge

https://botverse.github.io/2025/05/28/agi-won-t-be-engineered-it-will-emerge.html
1•botverse•42m ago•0 comments

Apple in China

https://thechipletter.substack.com/p/apple-in-china
8•rbanffy•43m ago•1 comments