> By 2021, these engineered bacteria could be simulated in unprecedented detail. Every gene, every major protein, and nearly every metabolic reaction in JCVI-syn3A.
I think the crux is here:
> Even after years of study, 91 of JCVI-syn3A's genes remain unannotated, of which roughly one-third are essential. Deleting any single one kills the cell, yet we have no idea what they do – representing some of biology's most fundamental unsolved puzzles.
---
I think minimal cells and virtual cells are especially exciting as they open up a path to create fully controlled experimental environments for biochemistry from the ground up.
Right now so much time in biochemistry goes into working around the limitations of whatever already happens to be present in an organism. E.g. we may know 5% of the mechanisms at work in a cell, but the remaining 95% can still brick your experiment, and without knowing about them you essentially have to shrug and trial-and-error your way through.
In contrast, in a synthetic minimal cell we could start with an organism where we know 95% of the mechanisms that are going on, and then study new mechanisms one gene at a time, steadily building up to bigger and bigger ones.
Strangely, it seems to me that a lot of effort is going into simulating full cells that contain unknown mechanisms, rather than into using these capabilities to generate hypotheses that uncover the unknown mechanisms. Yes, that probably expedites the path towards simulating much bigger human cells, but it ultimately still leaves us in the dark on most fronts.
It seems like the result of a general trend in science towards brute prediction, abandoning the goal of explanation or understanding.
I imagine it's much easier to create and test hypotheses about the unknown mechanisms when you can view them in the context of a larger system, with reasonable performance, allowing you to metaphorically "grab them in your palm" and tweak them on the fly. We work better when we can explore things, instead of immediately taking on problems at the limit of our computational tools, requiring individual brains (and tons of paperwork) to make up the difference.
In this sense, researching the nano-scale basics and aiming to simulate micro-scale cellular systems are actually aligned: as long as the latter isn't cutting too many corners, it creates the space for the former to be done efficiently.
This is exactly what I'm an expert at; I even coined a term in the field [1] :).
Since I started doing this 15 years ago (and I know the field predates me by a lot), there has always been this feeling that we are so close to a big breakthrough in biological simulation, yet at the same time progress has been kind of "slow". I think the reason is that pushing the envelope in this field requires mastering three (maybe four) different disciplines, your pick of [Bio, Chem, CS, Math, Physics]. Very few people reach that level of simultaneous understanding of all the pieces.
I'm not trying to gatekeep the field, though; much of the progress here (including many of the papers mentioned in TFA) is work coming from PhD students. Anyone could jump into this, but you really need to sit down and try to make sense of it for a while, years even. A PhD gives one the perfect opportunity for that.
Anyway, I hope this keeps moving forward. It's one of the ultimate goals of biology and it would be extremely beneficial to the world.
1: https://www.frontiersin.org/journals/plant-science/articles/...
Are there any good local (open-source, ideally) tools and/or libraries one can experiment with? I have access to a couple of HPC clusters and would love to learn more.
Take a look at SimTK [1].
And I would try to reproduce Karr's model [2] (paper here [3]); it's also mentioned in the linked page.
This is the study that made me, and many others at the time, actually take this seriously, lol. I was a student doing this as a hobby project, and Karr's paper made me think "wait, this is actually possible, and today". It's really good if you want to learn and get your feet wet.
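If it helps to have a mental model before diving into the paper: Karr's model partitions the cell into submodels that each advance a shared cell state over short timesteps (on the order of a second), assuming the submodels are independent within a step. A minimal Python sketch of just that integration pattern (the submodels, rates, and numbers below are invented for illustration; nothing here comes from the actual model):

```python
import numpy as np

# Shared cell state: molecule pools, updated in place by every submodel.
state = {"atp": 1.0e6, "ribosomes": 500.0, "protein": 2.0e5}

def metabolism(state, dt):
    # Toy submodel: metabolism replenishes ATP at a fixed rate (made up).
    state["atp"] += 1.0e4 * dt

def translation(state, dt):
    # Toy submodel: ribosomes make protein, consuming ATP (made-up numbers).
    rate = 0.1 * state["ribosomes"]   # proteins per second
    cost = 4.0 * rate * dt            # ATP spent per step
    if state["atp"] >= cost:
        state["atp"] -= cost
        state["protein"] += rate * dt

submodels = [metabolism, translation]

dt = 1.0                              # short timestep, as in Karr et al.
for step in range(3600):              # simulate one hour of cell time
    for submodel in submodels:
        submodel(state, dt)           # each sees the state left by the others

print(state)
```

The real thing has a couple dozen submodels (metabolism as an FBA problem, transcription, translation, replication, ...) and tracks far more state, but the loop has this shape.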
If you want, you can reach out to me at hn @ moralestapia . com, and I'll be happy to recommend some more stuff!
Most of them are built around one specific, measurable phenotype that they want to reproduce, like estimating metabolite input/output over time.
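For the metabolite input/output case specifically, the usual workhorse is flux balance analysis: choose reaction fluxes v that maximize an objective subject to steady-state mass balance (Sv = 0) and flux bounds. A toy sketch with scipy; the three-reaction network here is invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Toy network: uptake -> A, A -> biomass ("growth"), A -> waste.
# Rows = metabolites (just A), columns = reactions (uptake, growth, waste).
S = np.array([[1.0, -1.0, -1.0]])    # steady state requires S @ v = 0

c = np.array([0.0, -1.0, 0.0])       # maximize growth flux (linprog minimizes)
bounds = [(0, 10.0),                 # uptake capped by the environment
          (0, None),                 # growth flux unbounded above
          (0, None)]                 # waste flux unbounded above

res = linprog(c, A_eq=S, b_eq=np.zeros(1), bounds=bounds)
print("fluxes (uptake, growth, waste):", res.x)  # growth hits the uptake cap
```

In practice you'd use something like COBRApy with a genome-scale model, but the core computation really is just this linear program.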
Some others attempt to model the behavior of these cells when interacting with others, as in a colony or tissue. This is quite important because most of the phenomena that enable development, healing, regeneration, etc. are emergent processes that only make sense when you study the whole tissue. One concrete thing you can measure/simulate here is "if I drop this hormone here, where is it going to be at time X and at what concentration" [1], which is super useful to do in silico because measuring that in real tissue, without markers or even with them, is much more complicated, expensive and time-consuming.
1: I wrote one of the first models able to do this in realistic plant tissue. Realistic here means bounded by the chemical/physical constraints found in real plants, and using a structural scaffold that resembles them as well.
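To make the hormone question concrete: at its crudest it's a reaction-diffusion problem. A toy explicit finite-difference sketch on a uniform 2D grid (diffusion and decay constants are made up, and real plant tissue is nothing like this uniform, which is exactly what realistic scaffolds are for):

```python
import numpy as np

# Toy parameters; real tissue is heterogeneous and wall-bounded.
n = 100                      # 100x100 grid of "cells"
D = 0.1                      # diffusion coefficient, grid units^2 per step
k = 0.001                    # first-order decay rate per step
c = np.zeros((n, n))
c[n // 2, n // 2] = 1.0      # drop a pulse of hormone in the middle

for step in range(5000):     # explicit Euler; stable since D <= 0.25 here
    # Discrete Laplacian with periodic boundaries via np.roll.
    lap = (np.roll(c, 1, 0) + np.roll(c, -1, 0) +
           np.roll(c, 1, 1) + np.roll(c, -1, 1) - 4 * c)
    c += D * lap - k * c

# Concentration at a point of interest after "time X".
print(c[n // 2, n // 2 + 10])
```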
nextos•4h ago
It's interesting how high-throughput perturbation assays have led to data-driven whole-cell models. But these are not yet good at making robust predictions.
Probably the future is hybrid neuro-symbolic models.
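For concreteness, "hybrid" here typically means keeping a mechanistic model for the parts we understand and letting a learned component fill in the kinetics we don't. A toy sketch of the idea (pure illustration with untrained random weights, not any published architecture):

```python
import numpy as np

# Mechanistic part: known first-order degradation of a metabolite.
def known_kinetics(x):
    return -0.1 * x

# Learned part: a tiny network standing in for the unknown kinetics.
# Random weights here; in practice it would be trained on perturbation data.
W1 = np.random.randn(8, 1) * 0.01
W2 = np.random.randn(1, 8) * 0.01
def learned_correction(x):
    return (W2 @ np.tanh(W1 @ np.atleast_1d(x))).item()

# Hybrid ODE, dx/dt = known mechanism + learned residual, via explicit Euler.
x, dt = 1.0, 0.01
for _ in range(1000):
    x += dt * (known_kinetics(x) + learned_correction(x))
print(x)
```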
donovanr•1h ago
It's nice to see the idea of virtual cells make a comeback, though the meaning seems to have shifted to transcriptomics-based, transformer/GPU-powered models (which have issues [0]). It's a fun field/problem, but I think it will make better progress if we take advantage of all the varied computational work that has come before.
[0] Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all https://arxiv.org/abs/2410.13956