Machine Learning: The Native Language of Biology

https://decodingbiology.substack.com/p/machine-learning-the-native-language

53•us-merul•17h ago

Comments

bigyabai•16h ago

Look, we're all going to sit around cringing until someone says it; machine learning is explicitly the natural language of computers. In nature, neurons are not arranging themselves into neat unsigned 8-bit integers to quantize themselves for recollection. They're also networked by synapses and reactive biology, not feedforward algorithms scanning static, hereditary weights.

This whole thing feels like the author is familiar with one set of abstractions but not the other. It's very reminiscent of the (intensely fallible) Chomsky logic that leads to insane extrapolations about what biology is or isn't. Machine learning is a model, and all models are wrong.

suddenlybananas•9h ago

What do you mean by Chomsky logic?

meepmorp•6h ago

Nah, they mean UG and his theorizing about the inborn language faculties of the human brain.

suddenlybananas•6h ago

But there's nothing intrinsically fallacious about positing UG, nor does it lead to crazy extrapolations.

meepmorp•3h ago

I agree with you, I'm just pointing out what (imo) OP was referring to.

dmacfour•15h ago

"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools."

-Leo Breiman, like 24 years ago

Machine learning isn't the native language of biology; the author just realized that there's more than one approach to modeling. I'm a statistician working in an ML role and most of the issues I run into (from a modeling perspective) are the reverse of what this article describes: people trying to use ML for the precise things inferential statistics and mechanistic models are designed for. Not that the distinction is that clear to begin with.

Fomite•9h ago

This is largely my feeling as well.

bglazer•15h ago

The problem with this machine-learned “predictive biology” framework is that it doesn’t have any prescription for what to do when your predictions fail. Just collect more data! What kind of data? As the author notes, the configuration space of biology is effectively infinite so it matters a great deal what you measure and how you measure it. If you don’t think about this (or your model can’t help you think about it) you’re unlikely to observe the conditions where your predictions are incorrect. That’s why other modeling approaches care about tedious things like physics and causality. They let you constrain the model to conditions you’ve observed and hypothesize what missing, unobserved factors might be influencing your system.

It’s also a bit arrogant in presuming that no other approaches to modeling cells cared about “prediction”. Of course systems and mathematical biologists care about making accurate predictions; they just also care about other things, like understanding molecular interactions, *because that lets you make better predictions*.

Not to be cynical, but this seems like an attempt to export benchmark culture from ML into bio. I think that blindly maximizing test set accuracy is likely to lead down a lot of dead-end paths. I say this as someone actively doing ML for bio research.

j7ake•13h ago

Also, predictions in biology take months or years to validate, so they lack the fast feedback loop of the vision and NLP world, where the feedback is almost instant.

Combine this with the fact that in vivo data in biology is extremely limited, and we see that copying the NLP and vision playbook into biology is challenging.

Fomite•9h ago

This. Many of the predictions we're talking about are potentially years in the making, involve expensive data collection to validate, suffer from a lot of stochastic noise, etc.

piombisallow•14h ago

That's a lot of words, including a sentence in which the author almost compares himself with Galileo. The proof is in the pudding, no? What did you predict with it?

barbarr•13h ago

The author claims that "machine learning methods better describe many biological systems than traditional mathematical formulations", but I see very little concrete evidence in the article to support it.

Perenti•13h ago

In the third paragraph the authors state:

"For example, the Lotka-Volterra model accurately captures predator-prey dynamics using systems of differential equations."

This is incorrect. The L-V predator/prey model was long considered validated by the population dynamics of the snowshoe hare and Canada lynx as seen in Hudson's Bay Company records. Those data actually track fashion cycles in Europe: prices and demand from Europe drove the efforts of the Company and the trappers. This has been in the standard texts since at least the mid-90s, AFAIK.
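
For anyone who hasn't seen the model being argued about: Lotka-Volterra is a pair of coupled differential equations, dx/dt = alpha*x - beta*x*y for prey and dy/dt = delta*x*y - gamma*y for predators. Below is a minimal sketch that integrates it with a hand-rolled fourth-order Runge-Kutta step. The parameter values and function names are illustrative assumptions, not fitted to the hare/lynx records or taken from the article.

```python
import math

def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the Lotka-Volterra equations."""
    prey, predators = state
    d_prey = alpha * prey - beta * prey * predators
    d_predators = delta * prey * predators - gamma * predators
    return (d_prey, d_predators)

def rk4_step(state, dt, params):
    """Advance the system by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lotka_volterra(state, *params)
    k2 = lotka_volterra(shifted(state, k1, dt / 2), *params)
    k3 = lotka_volterra(shifted(state, k2, dt / 2), *params)
    k4 = lotka_volterra(shifted(state, k3, dt), *params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(initial, params, dt, steps):
    """Return the trajectory as a list of (prey, predators) tuples."""
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, params))
    return trajectory

def invariant(state, alpha, beta, delta, gamma):
    """Quantity conserved along exact solutions; useful as a sanity check."""
    prey, predators = state
    return (delta * prey - gamma * math.log(prey)
            + beta * predators - alpha * math.log(predators))

# Illustrative (assumed) parameters: alpha, beta, delta, gamma
PARAMS = (1.0, 0.1, 0.075, 1.5)

if __name__ == "__main__":
    for prey, predators in simulate((10.0, 5.0), PARAMS, dt=0.001, steps=20000)[::2000]:
        print(f"prey={prey:8.3f}  predators={predators:8.3f}")
```

The model only ever produces closed cycles around the equilibrium (gamma/delta, alpha/beta), which is part of why a cyclic dataset can look like confirmation regardless of what actually drives the cycle.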

seydor•11h ago

Biological systems can be described via differential equations, e.g. neural cells can be analyzed with Hodgkin-Huxley type models, and this can lead to bottom-up theories of biological neural networks. ML is used to approximate other, more complex processes, but that doesn't mean that it's impossible.

suddenlybananas•9h ago

Science isn't about making predictions primarily, it's about explanations.

HappMacDonald•9h ago

Explanations in turn are tools whose only purpose is to make predictions.

jltsiren•9h ago

Explanations are also useful, because people often find them interesting.

Some things are valuable, because they keep us alive and healthy in the short term. Some things are valuable, because we find them interesting, enjoyable, or something like that. And some things are indirectly valuable, because they enable other things that are more directly valuable.

dtj1123•7h ago

This is an inaccurate statement. Geocentrism makes identical predictions to heliocentrism, but clearly the two models offer differing explanations of the dynamics of the solar system.

From an engineering perspective, yes, predictions are all that you care about. From a scientific perspective, the end goal is the simplest and most general set of explanations possible.

suddenlybananas•6h ago

In fact, geocentric models made better predictions than early heliocentric ones because epicycles allowed a better fit to the data.

randcraw•8h ago

IMHO, this article makes grand claims but doesn't substantiate them.

In what way is ML-based biology any different from the myriad statistics-based mechanistic models that systems or computational biology has employed for 50 years to model biological mechanisms and processes? Does the author claim that theory-less parameterless ML models like those in deep NNs are superior because theory-based explicitly parameterized models are doomed to fail? If so, then some specific examples / illustrations would go a long way toward making your case.

LeonardoTolstoy•8h ago

This person seems to work in a field (exercise / athletics) with an abundance of data, low-stakes outcomes, reasonably well-established biomarkers, etc. In other words, a field perfectly suited for a top-down, outcome-driven analysis.

IMO the post is merely stating "man, everyone should be doing this!" without realizing that (1) everyone is doing this, and (2) it doesn't seem like it because many (most?) fields in biology don't work in the top-down approach being suggested. Determining mechanism and function is vital in biology because in a lot of cases there just isn't the data to perform a fuzzy, outcome-driven analysis.

mfld•8h ago

I generally enjoyed the article. Maybe it's because the classical functional categorization/cataloging approaches in molecular biology are rarely sufficient to explain experimental data unless you are an expert and know all the exceptions and special cases. So the Predictive Biology approach seems a promising path, particularly since a lot of data for ML training is available.

That said, the formulation "machine learning is the native language of biology" seems odd.
