
SimpleFold: Folding proteins is simpler than you think

https://github.com/apple/ml-simplefold
103•kevlened•1h ago
https://arxiv.org/abs/2509.18480

Comments

kylehotchkiss•1h ago
> Folding Proteins Is Simpler Than You Think

Then why do we need custom LLMs, two of which seemed to require the resources of two of the wealthiest companies on Earth (this and Google's AlphaFold) to do it?

wrs•1h ago
How simple did you think it was before?
kylehotchkiss•1h ago
Not simple! Wasn't/Isn't X-ray crystallography what it usually takes to determine the structure?
wrsh07•1h ago
Folding proteins is pretty valuable and this model is comparatively small.

This doesn't seem like particularly wasteful overinvestment.

Granted, I'm more excited about the research coming out of arc

jjtheblunt•1h ago
what are you referring to by arc?
ben_w•59m ago
Not op, but I presume the ARC prize/ARC-AGI series of tests: https://arcprize.org/
hirenj•46m ago
The Arc Institute, probably.
aDyslecticCrow•52m ago
It's not an LLM, it's a transformer. I know the terms are being butchered in the media, but if we're going to use the term LLM instead of AI, we'd better make sure it's actually a "large language model" being referred to. If you're unsure, call it a neural net, a machine learning algorithm, or AI.

It's indeed a large model. But given the history of the field, it's a massive improvement. The problem has progressed from something almost "NP", only barely approachable with distributed cluster compute, to something that can run on a single server with some pricey hardware. The smallest model here is only 100M parameters and the largest is 3B; that's very approachable to run locally with the right hardware, and easily within range for a small biotech lab (compared to the cost of other biotech equipment).
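
As a rough sanity check on "approachable to run locally", here is a back-of-envelope sketch of the weight memory for those two sizes (a sketch only, assuming fp16 storage at 2 bytes per parameter; activations and inference overhead come on top):

    # Back-of-envelope weight memory for the sizes mentioned above,
    # assuming fp16 (2 bytes/parameter); real usage is higher once
    # activations and intermediate tensors are counted.
    for name, params in [("smallest (100M)", 100e6), ("largest (3B)", 3e9)]:
        print(f"{name}: ~{params * 2 / 1e9:.1f} GB of weights")
    # smallest (100M): ~0.2 GB of weights
    # largest (3B): ~6.0 GB of weights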

It's also (I'd argue) one of the only truly economically and socially valuable AI technologies we've found over the past few years. Every simulated protein fold saves a biotech company weeks of work by highly skilled biotech engineers and very expensive chemicals (in a way that truly supplements rather than replaces the work). Any progress in the field is a huge win for society.

barbarr•1h ago
Why is apple doing protein folding?
mabedan•1h ago
Probably cuz Siri didn't work out
Forbo•1h ago
Reputation laundering?
jama211•54m ago
What’s there to launder? Perhaps they shouldn’t have as good a reputation as they do, but you can’t deny they do have a good reputation.
amelius•42m ago
Reputation of what? They are just an office appliance company. 100 years ago they would be the ones making luxury staplers and typewriters.
axoltl•27m ago
You're confusing your opinion of the company with the perception by the general public. Apple's definitely not perceived as 'an office appliance company' by your average person. It's considered a high-end luxury brand by many[1].

1: https://www.researchgate.net/publication/361238549_Consumer_...

robotresearcher•10m ago
I think their public sales data shows Apple sells mainly to consumers, and mainly iPhones at that.

Like 1980s Sony, they are the top-of-the-line consumer electronics giant of their time. The iPhone is even more successful than the Walkman or Trinitron TVs.

They also sell the most popular laptops, to consumers as well as corporate. Like Sony's VAIO, but more popular again.

IncreasePosts•1h ago
They're jealous they haven't won a Nobel prize
nextos•1h ago
Local inference. I imagine they have an interest in making this and other cutting-edge models small enough to allow quick inference on their desktop machines. The paper shows this, with Figure 1E demonstrating inference on an M2 Max with 64 GB.

Frankly, it's a great idea. If you are a small pharma company, being able to do quick local inference removes lots of barriers and gatekeeping. You can even afford to do some Bayesian optimization or RL with lab feedback on some generated sequences.
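
That lab-in-the-loop idea could be as simple as the following sketch (entirely hypothetical: propose_mutants and fold_and_score are stand-ins, the latter for a local folding call plus a confidence score; ml-simplefold's actual API may look nothing like this):

    import random

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def propose_mutants(seq, n=8):
        # One random point mutation per candidate.
        out = []
        for _ in range(n):
            i = random.randrange(len(seq))
            out.append(seq[:i] + random.choice(AMINO_ACIDS) + seq[i + 1:])
        return out

    def fold_and_score(seq):
        # Placeholder: in practice, run the local folding model and score
        # the predicted structure (e.g., a confidence metric). Toy stand-in:
        return sum(seq.count(a) for a in "AILMFWV") / len(seq)

    parent = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    for _ in range(5):  # each round: rank in silico, send the best to the lab
        parent = max(propose_mutants(parent), key=fold_and_score)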

In comparison, running AlphaFold requires significant resources. And IMHO, its use of multiple sequence alignments is a bit hacky: it makes performance worse on proteins without close homologs and requires tons of preprocessing.

A few years back, ESM from Meta already demonstrated that alignment-free approaches are possible and perform well. AlphaFold has no secret sauce, it's just a seq2seq problem, and many different approaches work well, including attention-free SSMs.

lovasoa•44m ago
What do you call the opposite of greenwashing? When you want to show that you are burning as much energy on training models as the others.
giancarlostoro•39m ago
No idea, but can I sign up for R&D jobs where you don't necessarily build something that generates revenue?

Maybe these are just projects they use to test and polish their AI chips? Not sure.

shpongled•35m ago
Probably because ByteDance and Facebook (spun out into EvolutionaryScale) are doing it
robotresearcher•15m ago
Apple has an ML research group. They do a mixture of obviously-Apple things, other applications, generally useful optimizations, and basic research.

https://machinelearning.apple.com/

wild_pointer•1h ago
Did you just assume what I think about protein folding simplicity?!
IAmBroom•1h ago
The link goes to the GitHub repository. The paper behind it, which you might want to read:

https://arxiv.org/abs/2509.18480

IAmBroom•1h ago
And the abstract alone says (if I'm reading it correctly), "It still takes AI; just not nearly as much as others are doing."
mentalgear•26m ago
Another form of AI: plain transformers for the task.
turblety•1h ago
I wonder why Apple can create a model to fold proteins, but still can't get Siri to control the phone competently? I'm not sure I agree with Apple's priorities. I guess these things aren't dependent on each other, and they can work on multiple things at a time.
tanelpoder•1h ago
I guess it's because SimpleFold came from a research lab with more autonomy and fewer competing interests and internal politics...
frenchie4111•1h ago
I am genuinely interested in where the strong negativity towards Siri has come from in recent culture. From what I gather, it's likely due to the high expectations we have for Apple. But what I don't really get is why there isn't a similar amount of negativity directed at Google or Samsung, who both have equally shit phone AI assistants (obviously this is just from my perspective, as a daily user of both iOS and a Samsung Android).

I am not trying to defend Apple or Siri by any means. I think the product absolutely should (and will) improve. I am just curious to explore why there is such negativity being directed specifically at Apple's AI assistant.

Invictus0•1h ago
For the last three major iOS versions, Siri has been unable to execute the simple command "shuffle the playlist 'Jams'", or any variation, like "play the playlist Jams on shuffle". I am upset for that reason.
samuelg123•1h ago
I think Siri has always been criticized, likely because it has never worked super well and it has the most eyes (or ears) on it (iPhones still have 50% market share in the US).

And now that we have ChatGPT with voice mode, Gemini Live, etc., which have incredible speech recognition and reasoning comparatively, it's harder to keep arguing that "every voice assistant is bad".

xp84•1h ago
As a vocal critic of Siri, I can give you a number of reasons we hate it:

1. It seems to be actively getting worse. On a daily basis, I see it responding to queries nonsensically, like when I say “play (song) by (artist)” (I have Apple Music) and it opens my Sirius app and puts on a random thing that isn’t even that artist. Other trivial commands are frequently just met with apologies or searching the web.

2. Over a year ago, Apple made a flashy announcement full of promises about how Siri would not only do the things it’s been marketed as being able to do for the last decade, but also things that no one has seen an assistant do. Many people believe that announcement was based on fantasy thinking, and those people look more and more correct every day that Apple ships no actual improvements to Siri.

3. Apple also shipped a visual overhaul of how Siri looks, which gives the impression that work has been done, leading people to be even more disappointed when Siri continues to be a pile of trash.

4. The only competitor that makes sense to compare is Google, since no one else has access to do useful things on your device with your data. At least Google has a clear path to an LLM-based assistant, since they’ve built an LLM. It seems believable that Android users will have access to a Gemini-based assistant, whereas it appears to most of us that Apple's internal dysfunction has rendered them unable to ship something of that caliber.

citizenpaul•49m ago
Is it just my rose-tinted glasses, or did Siri work much better in the first couple of years and decline continually since then? I actually used it a lot initially, then eventually disabled it as it never worked anymore.
devmor•42m ago
I feel like the same is true of a lot of products that moved from being programmatically connected ML workflows to multi-modal AI.

We, the consumer, have received inferior products because of the vague promise that the company might one day be able to make it cheaper if they invest now.

SoftTalker•26m ago
I've disabled Siri as much as I possibly can. I've never even tried to use it. I would do the same for any other AI assistant. I don't like that they are always listening, and I just don't like talking to computers. I find it unnatural, and I get irrationally angry when they don't understand what I want.

If I could buy a phone without an assistant I would see that as a desirable feature.

al_borland•29m ago
Something like this doesn’t actually have to work. There were no expectations at all in this space.

Meanwhile, people expect perfection from Siri. At this point a new version of Siri will never live up to people’s expectations. Had they released something on par with ChatGPT, people would hate it and probably file a class action lawsuit against Apple over it.

The entire company isn’t going to work on Siri. In a large company there are a lot of priorities, and some things that happen on the side as well. For all we know this was one person’s weekend project to help learn something new that will later be applied to the priorities.

I’ve made plenty of hobby projects related to work that weren’t important or priorities, but what I learned along the way proved extremely valuable to key deliverables down the road.

mapmeld•9m ago
As I understand it, Siri and Alexa could be plugged into an LLM, but changing it to an "open world" device that can tell your kid something disturbing, text all of your contacts, buy groceries, etc. comes with serious risk of reputational harm. While still falling short of people's expectations if it isn't ChatGPT-quality. OpenAI is new enough that they get to play by different rules.
stephenpontes•1h ago
I remember first hearing about protein folding with the Folding@home project (https://foldingathome.org) back when I had a spare media server and energy was cheap (free) in my college dorm. I'm not knowledgeable on this, but have we come a long way in terms of making protein folding simpler on today's hardware, or is this only applicable to certain types of problems?

It seems like the Folding@home project is still around!

nkjoep•1h ago
Team F@H forever!
_joel•53m ago
Yep, that and SETI@Home. I loved the eye candy, even if I didn't know what it fully meant.
seydor•35m ago
How come we don't have AI@Home?
throwup238•32m ago
The network bandwidth between nodes is a bigger limitation than compute. The newest Nvidia cards now come with 400 Gbit buses to communicate between them, even on a single motherboard.

Compared to SETI or Folding@home, this would be glacially slow for AI models.
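
To put rough numbers on that (a sketch only, assuming a 3B-parameter model with fp16 gradients and a naive full exchange per step):

    # Time to ship one full set of fp16 gradients for a 3B-parameter model.
    payload_bits = 3e9 * 2 * 8  # 3B params x 2 bytes (fp16) x 8 bits/byte
    for name, bits_per_sec in [("400 Gbit/s datacenter link", 400e9),
                               ("20 Mbit/s home uplink", 20e6)]:
        print(f"{name}: {payload_bits / bits_per_sec:,.1f} s per exchange")
    # 400 Gbit/s datacenter link: 0.1 s per exchange
    # 20 Mbit/s home uplink: 2,400.0 s per exchange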

gregsadetsky•33m ago
That and project RC5 from the same time period..! :-)

https://www.distributed.net/RC5

https://en.wikipedia.org/wiki/RSA_Secret-Key_Challenge

I wonder what kind of performance I would get on an M1 computer today... haha

EDIT: people are still participating in rc5-72...?? https://stats.distributed.net/projects.php?project_id=8

roughly•10m ago
As I understand it, Folding@home was a physics-based simulation solver, whereas AlphaFold and its progeny (including this) are statistical methods. The statistical methods are much, much cheaper computationally, but rely on existing protein folds and can’t generate strong predictions for proteins that don’t have some similarities to proteins in their training set.

In other words, it’s a different approach that trades off versatility for speed, but that trade-off is significant enough to make it viable to generate folds for essentially any protein you’re interested in. It moves folding from something that’s almost computationally infeasible for most projects to something you can just do for any protein as part of a normal workflow.
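
A toy illustration of the difference (purely illustrative; neither system's actual code, and the "force field" here is just a bond-length spring):

    import numpy as np

    rng = np.random.default_rng(0)
    coords = rng.normal(size=(50, 3))  # 50 "residues" in 3D

    def toy_forces(x):
        # Springs pulling consecutive residues toward ~3.8 apart.
        d = x[1:] - x[:-1]
        r = np.linalg.norm(d, axis=1, keepdims=True)
        f = (r - 3.8) * d / r
        out = np.zeros_like(x)
        out[:-1] += f
        out[1:] -= f
        return out

    # Physics-based (Folding@home style): many tiny relaxation steps.
    for _ in range(10_000):  # real simulations need vastly more steps
        coords += 0.01 * toy_forces(coords)

    # Statistical (AlphaFold/SimpleFold style): one learned forward pass.
    # coords = trained_model(sequence)  # hypothetical call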

foodevl•1h ago
I was curious what the protein picture was showing: "Figure 1 Example predictions of SimpleFold on targets ... with ground truth shown in light aqua and prediction in deep teal."

and now I'm even more curious why they thought "light aqua" vs "deep teal" would be a good choice

gilleain•1h ago
Well, figure a) shows a ribbon representation of the fold (as helices and strands) of the protein 7QSW (https://www.ebi.ac.uk/pdbe/entry/pdb/7qsw), which is RuBisCO (https://en.wikipedia.org/wiki/RuBisCO), a plant protein that plays a key role in photosynthesis.

The different colours are for the predicted and 'real' (ground truth) models. The fact that they are hard to distinguish is partly the, as you point out, weird colour choice, but also because they are so close together. An inaccurate prediction would have parts that stand out more, as they would not align well in 3D space.
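
The standard way to quantify that alignment is an RMSD after optimal superposition. A minimal numpy sketch of the Kabsch algorithm (the metrics folding papers actually report, such as TM-score, are more involved):

    import numpy as np

    def rmsd_after_alignment(pred, truth):
        # Kabsch superposition: center both point sets, find the optimal
        # rotation, then take the root-mean-square deviation.
        p = pred - pred.mean(axis=0)
        q = truth - truth.mean(axis=0)
        u, _, vt = np.linalg.svd(p.T @ q)
        d = np.sign(np.linalg.det(u @ vt))      # guard against reflection
        rot = u @ np.diag([1.0, 1.0, d]) @ vt   # optimal rotation
        return np.sqrt(((p @ rot - q) ** 2).sum(axis=1).mean())

    x = np.random.default_rng(0).normal(size=(50, 3))
    print(rmsd_after_alignment(x, x))  # identical structures -> ~0.0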

underdeserver•1h ago
So, how does this compare to AlphaFold?
mentalgear•27m ago
Seems like they use the standard transformer architecture, versus AlphaFold's more specialised machine-learning approaches.
Invictus0•1h ago
They'll do anything but fix Siri
mentalgear•26m ago
They can keep on doing stuff like this that's open-source and beneficial to society.
frenchie4111•1h ago
I am curious to hear an expert weigh in on this approach's implications for protein folding research. This sounds cool but it's really unclear to me what the implications are
geremiiah•30m ago
Their representation is simpler, just a transformer. That means you can plug in all the theory and tools that have been developed specifically for transformers; most importantly, you can scale the model more easily. But more than that, I think it shows that there was no magic to AlphaFold. The details of the architecture and training method didn't matter much; all that was needed was training a big enough model on a large enough dataset. Indeed, lots of people who have experimented with AlphaFold have found it to behave similarly to LLMs, i.e. it performs well on inputs close to the training dataset but doesn't generalize well at all.
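
For a sense of what "just a transformer" means here, a toy skeleton (a sketch only: tokens in, per-residue coordinates out; the real SimpleFold is a flow-matching generative model, not a direct regressor like this):

    import torch
    import torch.nn as nn

    class ToyFolder(nn.Module):
        # Plain transformer trunk: sequence tokens in, 3D coordinates out.
        def __init__(self, vocab=21, dim=128, layers=4, heads=8):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
            self.trunk = nn.TransformerEncoder(block, num_layers=layers)
            self.to_xyz = nn.Linear(dim, 3)  # one (x, y, z) per residue

        def forward(self, tokens):
            return self.to_xyz(self.trunk(self.embed(tokens)))

    model = ToyFolder()
    seq = torch.randint(0, 21, (1, 50))  # a 50-residue toy "protein"
    coords = model(seq)                  # shape (1, 50, 3)
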
331c8c71•1h ago
It is for structure prediction, not folding (rolleyes).
jandom•34m ago
Pssst, they'll realise scientists hang out here too
barbazoo•56m ago
No folding here. Proteins go on the hanger or in the drawer.
kazinator•41m ago
I'm satisfied with folding roast beef onto a sandwich, or folding egg whites into batter. All the protein folding action I could ever want.
vbarrielle•36m ago
A paper that says "our approach is simpler than the state of the art", but does not loudly say "our approach is significantly behind the state of the art on all metrics". Not easy to get published, but I guess putting it out as a preprint with a big company's name on it will help...
shpongled•34m ago
It's not totally novel, but it's very cool to see the continued simplification of protein folding models. AF2 -> AF3 was a reduction in model architecture complexity, and this is another step in the direction of the bitter lesson.
hashta•15m ago
I’m not sure AF3’s performance would hold up if it hadn’t been trained on data from AF2, which itself bakes in a lot of inductive bias, like equivariance
nextworddev•33m ago
In industry Google practically dominates this field
hashta•24m ago
One caveat that’s easy to miss: the "simple" model here didn’t just learn folding from raw experimental structures. Most of its training data comes from AlphaFold-style predictions: millions of protein structures that were themselves generated by big, MSA-based, highly engineered models.

It’s not like we can throw away all the inductive biases and MSA machinery; someone upstream still had to build and run those models to create the training corpus.

mapmeld•12m ago
And AlphaFold was validated with experimental observation of folded proteins using X-rays
godelski•5m ago
Is this so unusual? Almost everything that is simple was once considered complex. That's the thing about emergence, you have to go through all the complexities first to find the generalized and simpler formulations. It should be obvious that things in nature run off of relatively simple rulesets, but it's like looking at a Game of Life and trying to reverse engineer those rules AND the starting parameters. Anyone telling you such a task is easy is full of themselves. But then again, who seriously believes that P=NP?
dyauspitr•24m ago
Isn’t this a largely solved problem after Alphafold?
samfriedman•11m ago
Maybe they've been working on it, but got scooped?
