AI for Scientific Search

https://arxiv.org/abs/2507.01903

125•omarsar•7mo ago

Comments

mixedmath•7mo ago

From the title, I had thought that this would be a new tool for searching science, such as searching the arxiv. But this is actually a survey.

I quote the conclusion of the survey:

---

In conclusion, rapid advancements in artificial intelligence, particularly large language models like OpenAI-o1 and DeepSeek-R1, have demonstrated substantial potential in areas such as logical reasoning and experimental coding. These developments have sparked increasing interest in applying AI to scientific research. However, despite the growing potential of AI in this domain, there is a lack of comprehensive surveys that consolidate current knowledge, hindering further progress. This paper addresses this gap by providing a detailed survey and unified framework for AI4Research. Our contributions include a systematic taxonomy for classifying AI4Research tasks, identification of key research gaps and future directions, and a compilation of open-source resources to support the community. We believe this work will enhance our understanding of AI’s role in research and serve as a catalyst for future advancements in the field.

---

I jumped at this because I'm a mathematician who has been complaining about the lack of effective mathematical search for several years.

Davidzheng•7mo ago

How do you view o3? I personally find it superior to Google search almost always. Do you find that it often misses key references? (I'm also a mathematician.)

mixedmath•7mo ago

Google is completely inadequate at mathematical search. But here is a concrete problem that no search seems to handle: given some complicated integral (say, some contour integral involving a K-Bessel function), find where it appears in the literature.

Most search engines will fail outright, because the query is made of math symbols. Embedding-based search will return various loosely related things involving, say, integrals and Bessel functions. But then I end up opening Gradshteyn and Ryzhik and trying to find where in this book the relevant terrible integrals appear.

This is a common experience for analytic number theorists. And it's a lousy experience.
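One way to picture why symbol-heavy queries are hard, and what a structure-aware index might do instead, is to normalize a formula before comparing it. The sketch below is only an illustration under loud assumptions: it treats lowercase Latin letters (except `d`, assumed to be the differential) and a small hand-picked set of Greek commands as renameable variables, and keeps everything else verbatim. The names `normalize`, `GREEK`, and `TOKEN` are hypothetical, and this is not how any tool mentioned in this thread works.

```python
import re

# Assumption: a tiny, incomplete set of Greek commands treated as variables.
GREEK = {"alpha", "beta", "gamma", "lambda", "mu", "nu", "sigma", "tau"}

# A LaTeX command, a single letter, a run of digits, or any other symbol.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S")


def normalize(latex: str) -> str:
    """Rename variables to v0, v1, ... in order of first appearance."""
    names = {}
    out = []
    for tok in TOKEN.findall(latex):
        is_latin_var = len(tok) == 1 and tok.islower() and tok != "d"
        is_greek_var = tok.startswith("\\") and tok[1:] in GREEK
        if is_latin_var or is_greek_var:
            # Same variable always maps to the same placeholder.
            tok = names.setdefault(tok, f"v{len(names)}")
        out.append(tok)
    return " ".join(out)


# Two spellings of the same integral, differing only in variable names.
a = normalize(r"\int_0^\infty K_\nu(x) x^{s-1} dx")
b = normalize(r"\int_0^\infty K_\mu(t) t^{w-1} dt")
print(a == b)
```

Under these assumptions the two spellings collapse to the same string, while swapping `K` for `I` does not. A real system would also need to handle reordered terms, equivalent notations, and changes of variable, which this sketch ignores.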

masterjack•7mo ago

Have you found https://sugaku.net/ useful? It’s focused on math research

BrtByte•7mo ago

This paper is more of a meta-level overview than a hands-on solution

gavinray•7mo ago

I was hoping for this to announce a tool for research.

Anyone know of the best way to do something like:

"Find most relevant papers related to topic XYZ, download them, extract metadata, generate big-picture summary and entity-relationship graph"?

Having a nice workflow for this would be the best thing since sliced bread for hobbyists interested in niche science topics.

Recently found https://minicule.com which is free and lets you search + import, but it focuses more on "concept-extraction" than LLM synthesis/summary.
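For the "find papers, extract metadata, build a graph" part of that workflow, here is a minimal sketch, assuming the public arXiv API, which returns results as an Atom feed. The helper names (`arxiv_query_url`, `parse_feed`, `coauthor_graph`) are hypothetical, the `SAMPLE` feed is made up for illustration, and a co-author graph is only the simplest stand-in for an entity-relationship graph. Downloading PDFs and generating summaries are left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def arxiv_query_url(topic, max_results=10):
    """Build a query URL for the arXiv API (nothing is fetched here)."""
    params = {"search_query": f"all:{topic}", "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract id, title, and author names from each Atom entry."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        papers.append({
            "id": entry.findtext(ATOM + "id", default="").strip(),
            "title": " ".join(title.split()),  # collapse wrapped whitespace
            "authors": [a.findtext(ATOM + "name", default="").strip()
                        for a in entry.findall(ATOM + "author")],
        })
    return papers


def coauthor_graph(papers):
    """Map each author to the set of people they share a paper with."""
    graph = defaultdict(set)
    for paper in papers:
        for author in paper["authors"]:
            graph[author].update(b for b in paper["authors"] if b != author)
    return graph


# Made-up feed in the Atom shape; not real arXiv data.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>http://example.org/abs/0001</id><title>First
    paper</title>
    <author><name>A. Author</name></author><author><name>B. Author</name></author>
  </entry>
  <entry><id>http://example.org/abs/0002</id><title>Second paper</title>
    <author><name>B. Author</name></author><author><name>C. Author</name></author>
  </entry>
</feed>"""

papers = parse_feed(SAMPLE)
graph = coauthor_graph(papers)
print(sorted(graph["B. Author"]))
```

To run it against live data, fetch `arxiv_query_url("topic XYZ")` with any HTTP client and pass the response body to `parse_feed`. Extracting concepts or entities beyond authors would need an extra step that this sketch does not attempt.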

AustinBGibbons•7mo ago

Check out https://elicit.com/

gavinray•7mo ago

Seems potentially useful, thanks! Only drawback I can see is the small number of papers provided by the free plan, but that's reasonable I suppose.

hugeBirb•7mo ago

I've been trying to tackle this exact problem. My current process is to use exa.ai to collect a wide breadth of research papers, do a summarization pass, and convert the results to markdown. I then search for more specific terms, give the relevant papers/context to Gemini 2.5 Pro, and ask it for a summary. I'm looking for very specific resources and, to be honest, it's been a terrible process :|

kianN•7mo ago

Linking to a nearby thread in case this is helpful: https://news.ycombinator.com/item?id=44457928

dmezzetti•7mo ago

PaperAI is also an option if you prefer open-source: https://github.com/neuml/paperai

Disclaimer: I'm the primary author of this project.

kianN•7mo ago

I built a public literature review search tool for some graduate student friends that became pretty popular in the Santa Barbara area. It actually does exactly what you are describing.

It’s not neural network based: it leverages hierarchical mixture models to give a statistical overview of the data. It lets you build these analysis graphs via search or citation networks.

Example: https://platform.sturdystatistics.com/deepdive?search_type=e...

gavinray•7mo ago

This is genuinely incredible, tried it using a recent-ish paper on the pharmacology and mechanisms of the Androgen Receptor and my mind is blown:

https://platform.sturdystatistics.com/deepdive?fast=1&q=http...

andjar•7mo ago

A while ago, I started working on two R packages for creating 'living reviews': metawoRld and DataFindR, see https://andjar.github.io/metawoRld/articles/conceptual_overv... . You do the broad literature search yourself, but the idea is to use LLMs to select relevant studies and perform data extraction in a structured, reproducible manner. The extracted data is stored in a git repository for collaboration and version tracking, with automated validation and website generation for presenting results.

TechDebtDevin•7mo ago

"Structured and Reproducible"

tkuipers•7mo ago

I’ve found a lot of success with https://www.undermind.ai/ though I’m not sure it has the graph you’re looking for

gavinray•7mo ago

This also looks excellent, thank you!

whattheheckheck•7mo ago

Connectedpapers.com

tough•7mo ago

emergentmind is pretty good

sergeim19•7mo ago

Hi, I'm the creator of https://tatevlab.com. It does something similar + aiming to be something like a "spotify" for research papers (currently working on a feature to allow creating and sharing personal collections). It summarizes papers based on practical potential and you can find papers based on similarity. Feedback is welcome.

Metacelsus•7mo ago

https://platform.futurehouse.org/

gavinray•7mo ago

Their Chemistry LLM that's an iteration of ChemCrow is really useful, thank you!

matt1•7mo ago

My site, https://www.emergentmind.com, is exactly for this. It surfaces trending AI/ML/CS papers, summarizes them, links to social commentary, lets you read and download papers, links to topics, and more. Would love any feedback you have!

fabmilo•7mo ago

I like Zotero. I started vibe coding some integrations for my workflow, but the project is a bit clunky to build and iterate on, especially with Gemini & Claude. Still, I think that is the direction to take instead of reinventing something from scratch.

BrtByte•7mo ago

I've been thinking about a plugin that auto-suggests related papers as I write

scientific_ass•7mo ago

Was expecting a product I can try out. But still, not disappointed.

bossyTeacher•7mo ago

AI for Scientific Search, yes. LLMs for Scientific Search, I am not sure. AI is not equivalent to LLMs, and I dislike it when people conflate the two.

AI will have a brand crisis once LLMs get abandoned and researchers need to explain to the public that the new AI (not LLM-based) is different from the old AI (LLM-based), which is different from the even older AI (GOFAI).

NitpickLawyer•7mo ago

> once LLMs get abandoned

See, you start making a good point in your rant, but then you go too far and stop making sense. LLMs are not going to be abandoned. They've "solved" intent from natural language. They're here to stay.

Of course "AI" will get new things. And architectures might improve. And new things will be discovered and added to the tool box. But having the ability to use natural language as input is so invaluable that there's no way we'll just abandon it...

bossyTeacher•7mo ago

We will abandon it when we find something better. That is the lifecycle of technology.

rob_c•7mo ago

It's always worth noting where the authors are affiliated. I don't remember ever hearing of ByteDance breaking new ground in chemical or materials research, so I'm sceptical about reading this...

Amaury-El•7mo ago

AI getting into scientific research is definitely impressive. But the more we use it, the more it feels like we're slowly getting too lazy to think on our own. Human judgment and intuition seem to be fading bit by bit.

caporaltito•7mo ago

"AI" is also the opposite of scientific research: it is a word-suggestion algorithm that guesses the most probable next part given a set of inputs. In the end, you'll still need to prove that your theory is right.

Raghavendra8008•7mo ago

Is there any internship opportunity for me?

BrtByte•7mo ago

I wonder how well these models will hold up in messy, interdisciplinary real-world projects

