> To generate what is called synthetic data, researchers train generative AI models using real human medical information, then ask the models to create data sets with statistical properties that represent, but do not include, human data.
famously, "garbage in, garbage out"
but thanks to AI, we now have the exciting innovation that you can inject garbage into the middle of the process.
you have data from actual humans. it has some statistical properties.
you could look at those statistical properties, and do research on them, looking for hidden correlations or whatever. that's been possible for decades, no need for LLMs.
or, you can take those statistical properties, ask a chatbot to generate synthetic data based on them, and then do research on that synthetic data. but...why?
any valid conclusions from the research will be based on statistical properties that were already present in the original data. the extra LLM step gains nothing, and adds the risk that the research is faulty because it latched onto some correlation the LLM made up.
this is like taking an image, saving it as a JPEG with 5% quality (or some other lossy process), and then asking an AI to upscale and enhance it for you. in the best-case, all you get is a reconstruction of the original. and realistically you'll almost certainly introduce misleading artifacts and noise.
or, scramble an egg, take a picture, and ask the chatbot to generate a picture for you of what the unbroken egg might have looked like. maybe it'll do a decent job of it...but 5 minutes ago you had the unbroken egg in your hand.
LLMs cannot reverse entropy. they cannot unscramble the egg. you can easily add randomness to a data set, but you cannot easily remove it.
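to make this concrete, here's a toy sketch (not any real synthetic-data pipeline — the "generator" is just an empirical mean and covariance, which is the simplest possible stand-in for a fitted generative model). the synthetic data's correlation can only echo the real data's correlation, plus sampling noise; anything in the synthetic set that isn't in the real set is an artifact:

```python
import numpy as np

rng = np.random.default_rng(0)

# "real" data: two correlated variables (a hypothetical stand-in for patient data)
n = 5000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)
real = np.column_stack([x, y])

# whatever the generator is, all it can learn are statistics of the real data.
# here the "model" is just the empirical mean and covariance.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# "synthetic" data: fresh samples from the fitted distribution
synthetic = rng.multivariate_normal(mean, cov, size=n)

real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]

# best case: the synthetic correlation tracks the real one, with extra noise.
# a correlation found in `synthetic` but not in `real` is pure generator artifact.
print(real_corr, synth_corr)
```

any research done on `synthetic` can, at best, rediscover what `mean` and `cov` already encoded — and you had those (and the real data) before the detour.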