Is there a balance to be struck between simple hierarchical models and

https://statmodeling.stat.columbia.edu/2024/05/26/is-there-a-balance-to-be-struck-between-simple-hierarchical-models-and-more-complex-hierarchical-models-that-augment-the-simple-frameworks-with-more-modeled-interactions-when-analyzing-real-data/

39•luu•4d ago

Comments

Onawa•21h ago

Full Title: Is there a balance to be struck between simple hierarchical models and more complex hierarchical models that augment the simple frameworks with more modeled interactions when analyzing real data?

a-dub•20h ago

"When working on your particular problem, start with simple comparisons and then fit more and more complicated models until you have what you want."

sounds algorithmic...

mnky9800n•19h ago

Yes, and you can even build symbolic engines that do this for you. I think the real question we must ask ourselves as data scientists or statisticians or whatever is whether we believe these data models represent the space of data fully or by happenstance. And if by happenstance, is it because the data doesn’t capture the underlying processes that produced it, or are those processes uncapturable in this way, so that function approximators like neural networks or gradient boosting machines are better? And is that because those function approximators capture interactions between the driving processes that otherwise go unseen, or is it because those processes have fractional dimensions that control their impact and are not captured by data models?

This is all summed up well by Leo Breiman’s “two cultures” paper, in my opinion. I have gone back and forth on which “culture” is the correct representation of how processes produce data. If you buy that only function approximators truly capture the complexity of whatever processes you are observing, then you have to wonder why physics works so well. In my opinion, from the statistical point of view, that’s because physics has spent centuries developing equations that are linear combinations of variables, which are essentially data models in Breiman’s sense.

I hope this opinion generates discussion, because I don’t know what the answer is or if it matters that there is one.

a-dub•16h ago

seems to me that one approach is fueled by data and the other is fueled by understanding. in the former, the observations form a view of behavior which is then modeled with high fidelity. in the latter, active inquiry, adversarial data collection and careful reasoning produce simpler models of hypothesized underlying processes that often prove to have nearly perfect generalization.

the interesting future is probably the one where the former produces new building blocks for the latter. (ie, the computer generates new simple and easy to understand constructs from which it explains previously not understood or well modeled phenomena.)

joe_the_user•19h ago

Well, my impression is that the statistical paradigm itself limits the complexity of a model through its basic aims and measures. In particular, a statistical model aims to be an unbiased predictor of a variable, whereas machine learning/"AI" just aims for prediction and doesn't care about bias in the statistical sense.

klysm•17h ago

I think they typically have totally different goals. For example, let’s say we are doing a sampling procedure. How do you estimate the sampling error? I’m not aware of a machine learning technique that will help, but you can use Bayesian and MCMC techniques.
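
To make the sampling-error point concrete, here is a minimal sketch, not anything from the linked post. It assumes the simplest possible setup: a yes/no survey with a flat prior on the unknown proportion, and a hand-rolled random-walk Metropolis sampler. The function names, the step size, and the 37-out-of-100 figures are all hypothetical choices for illustration. The spread of the posterior draws plays the role of the sampling error.

```python
import math
import random
import statistics


def log_posterior(p, successes, trials):
    """Log of Binomial likelihood times a flat prior on p, up to a constant."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(successes, trials, n_draws=20000, burn_in=2000, step=0.05, seed=0):
    """Random-walk Metropolis draws from the posterior of the proportion p."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, successes, trials)
    draws = []
    for i in range(n_draws + burn_in):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        if i >= burn_in:
            draws.append(p)
    return draws


def summarize(draws):
    """Posterior mean and standard deviation of the draws."""
    return statistics.fmean(draws), statistics.stdev(draws)


if __name__ == "__main__":
    # Hypothetical survey: 37 "yes" answers out of 100 respondents.
    mean, sd = summarize(metropolis(37, 100))
    print(f"posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

For this toy case the posterior is a known Beta distribution, so MCMC is overkill; the same loop becomes useful once the model has no closed form, which is the usual situation with hierarchical models.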

usgroup•17h ago

I think this is accurate but mostly because statistical modelling aims for interpretable parameters. That very strongly regularises complexity.
