Is this really not the case? I've read some of the AI papers in my field, and I know many other domain experts have as well. That said I do think that CS/software based work is generally easier to check than biology (or it may just be because I know very little bio).
Time-wise it depends where in the process you start!
Do you know what your target protein even is? I've seen entire PhDs spent trying to purify a single protein - every protein is purified differently, there are dozens of methods, and some work and some don't. If you can purify it, you can run a barrage of tests on the protein - is it a kinase, how does it behave in different binding assays, etc. That gives you a fairly broad idea of which area of activity your protein sits in.
If you know what it is, you can clone the gene into a vector and put it into an expression host like E. coli. Then E. coli will hopefully express it. That's a few weeks/months of work, depending on how much you want to double-check.
You can then use fluorescent tags like GFP to show you where in the cell your protein is located. Is it in the cell wall? Is it in the nucleus? That might give you an indication of function. But at this point you only have the location.
If your protein is in an easily kept organism like mice, you can run knock-out experiments, where you use different approaches to either turn off or delete the gene that produces the protein. That takes a few months too - and chances are nothing in your phenotype will change once the gene is knocked out; protein-protein networks are resilient, and another protein might jump in to do the job.
if you have an idea of what your protein does, you can confirm using protein-protein binding studies - I think yeast two-hybrid is still very popular for this? It tests whether two specific proteins - your candidate and another protein - interact or bind.
None of those tests will tell you 'this is definitely a nicotinamide adenine dinucleotide binding protein', every test (and there are many more!) will add one piece to the puzzle.
Edit: of course it gets extra-annoying when all these puzzle pieces contradict each other. In my past life I've done some work with the ndh genes that sit in the chloroplast genome of plants and are lost in orchids and some other groups of plants (including my babies), so it's interesting to see what they actually do and why they can be lost. It's called ndh because it was initially named NADH-dehydrogenase-like, because by sequence it kind of looks like an NADH dehydrogenase.
There's a guy in Japan (Toshiharu Shikanai) who worked on it most of his career and worked out that it certainly is NOT a NADH dehydrogenase and is instead a Fd-dependent plastoquinone reductase. https://www.sciencedirect.com/science/article/pii/S000527281...
Knockout experiments with ndh are annoying because it seems to be only really important in stress conditions - under regular conditions our ndh- plants behaved the same.
Again, this is only one protein, and since it's in chloroplasts it's ultra-common - most likely one of the more abundant proteins on earth (it's not in algae either). And we still call it ndh even though it is a Ferredoxin-plastoquinone reductase.
This is basically another argument that deep learning works only as [generative] information retrieval - i.e. a stochastic parrot - because the training data is a very lossy representation of the underlying domain.
Because the data/labels of genes do not always represent the underlying domain (biology) perfectly, the output can be false/invalid/nonsensical.
In cases where it works very well, there is data leakage, because by design LLMs are information retrieval tools. From an information-theory standpoint this is a fundamental "unknown unknown" for any model.
My takeaway is that it's not a fault of the algorithm; it's more the fault of the training dataset.
We humans operate fluidly in the domain of natural language, and even a kid can read and evaluate whether text makes sense or not - this explains the success of models trained on NLP.
But in domains where the training data represents the underlying domain lossily, the model will be imperfect.
The embedding space can represent relationships between words, sentences and paragraphs, and since those things can encode information about the underlying domain, you can query those relationships with text and get reasonable responses. The problem is it's not always clear what is being represented in those relationships as text is a messy encoding scheme.
But another weakness is that, as you say, it is generative: instead of hardcoding all possible questions and all possible answers into a database, we offload some of the data to an algorithm (next-token prediction) in order to allow an imprecise, probabilistic question/prompt (which is useful, because then you can ask anything).
But the problem is that no single algorithm can encode all possible answers to all possible questions in a domain-accurate way, so you lose some precision in the information. Or at least this is how I see LLMs at the moment.
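To make the "querying relationships in the embedding space" point concrete, here's a minimal sketch. It assumes the sentence-transformers package and the public all-MiniLM-L6-v2 model (any text encoder would do): related sentences land close together, even though nothing tells you what relationship is actually being captured.

    # Toy illustration of querying relationships in an embedding space.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    sentences = [
        "The kinase phosphorylates its substrate.",
        "This enzyme transfers a phosphate group onto a target protein.",
        "The stock market fell sharply on Tuesday.",
    ]
    emb = model.encode(sentences)

    # Cosine similarity: the first two sentences score far higher with each
    # other than either does with the third - but the vectors don't tell you
    # why, which is the messiness of text as an encoding scheme.
    print(util.cos_sim(emb[0], emb[1]))
    print(util.cos_sim(emb[0], emb[2]))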
But even if, for the sake of argument, we assume that is true without question, LLMs are still here to stay.
Like, think about how junior devs with average or less (programming) skill work: they "retrieve" the information about how to solve the problem from Stack Overflow, tutorials, etc.
So giving all your devs some reasonably well-done AI automation tools (not just a chat prompt!!) is like giving each of them a junior dev to delegate all the tedious simple tasks to, too - without having to worry about whether that task lets the junior dev grow and learn. And to top it off, if there is enough tooling (static code analysis, tests, etc.) in place, the AI tooling will handle the write code -> run tools -> fix issues loop just fine. And the price for that tool is what, 1/30th of that of a junior dev? That means more time to focus on the things which matter, including teaching your actual junior devs ;)
And while I would argue AI isn't fully there yet, I think the current foundation models _might_ already be good enough to get there with the right ways of wiring them up and combining them.
Whereas in biology, the natural domain is the physical/chemical/biological reactions occurring between organisms and molecules. The laws of interaction are not created by humans, but by the Creator(tm), and so the training dataset barely captures a tiny fraction of the richness of the domain and its interactions. Because of this, any model will be inadequate.
The worse science - publish-or-perish pulp - got more academic karma: Altmetric/citations -> $$$.
AI is the perfect academic: the science and curiosity are gone, and the ability to push out science-looking text is supermaxxed.
The tragic end solution: do the same and throw even more money at it.
> At a time when funding is being slashed, I believe we should be doing the opposite
AI has shown the world that academia is beyond broken in a way that can't be ignored, and academia won't get its head out of the granular sediments between 0.0625 mm and 2 mm in diameter.
Defund academia now.
I would have thought that the common catastrophic-failure stories about rewriting systems all at once, instead of fixing them bit by bit, would have helped IT people know better.
Except that we can't compare Twitter to a journal like Nature. Science is supposed to be immune to this kind of bullshit thanks to reputable journals and peer review blocking a publication before it does any harm.
Was that a failure of Nature?
We've taken this all too far. It is bad enough to lie to the masses in Pop-Sci articles. But we're straight up doing it in top tier journals. Some are good faith mistakes, but a lot more often they seem like due diligence just wasn't ever done. Both by researchers and reviewers.
I at least have to thank the journals. I've hated them for a long time and wanted to see their end - free up publishing from the bullshit of novelty-chasing and the narrowing of research. I just never thought they'd be the ones to put the knife through their own heart.
But I'm still not happy about that tbh. The only result of this is that the public grows to distrust science more and more. In a time where we need that trust more than ever. We can't expect the public to differentiate nuanced takes about internal quibbling. And we sure as hell shouldn't be giving ammunition to the anti-science crowds, like junk science does...
obviously this is hyperbole of two extremes, but i certainly trust a journal far more if it actively and loudly looks to correct mistakes over one that never corrects anything or buries its retractions.
a rather important piece of science is correcting mistakes by gathering and testing new information. we should absolutely be applauding when a journal loudly and proactively says “oh, it turns out we were wrong when we declared burying a chestnut under the oak tree on the third thursday of a full moon would cure your brother's infected toenail.”
But I think the problem is that "high quality" is seen as equal to "high impact", which means prestige and visibility are what matter. That likely lowers the threshold quite a lot, since being first to publish something possibly valid is seen as important.
> shouldn’t we expect a high quality journal to retract often as we gather more information?
This is complicated, and kinda sad tbh. But no. You need to carefully think about what "high quality journal" means. Typically it is based on something called Impact Factor[0], which is essentially the number of citations a journal's recent papers received over the last 2 years, per paper published. It sounds good on paper, but if you think about it for a second you'll notice there's a positive feedback loop. There's also no incentive for the cited work to actually be correct.
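For reference, a rough sketch of that calculation (per the Wikipedia definition in [0]); the numbers here are made up.

    # Impact factor for year Y = citations received in Y to items published
    # in Y-1 and Y-2, divided by the number of citable items from Y-1 and Y-2.
    def impact_factor(citations_to_prev_two_years: int,
                      citable_items_prev_two_years: int) -> float:
        return citations_to_prev_two_years / citable_items_prev_two_years

    # e.g. 2400 citations in 2024 to papers from 2022-2023,
    # and 600 citable items published in 2022-2023:
    print(impact_factor(2400, 600))   # -> 4.0

Note that nothing in the formula cares whether a citation means "this was great" or "this was wrong, here's why" - which is exactly the problem described next.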
For example, a false paper can often get cited far more than a true paper. This is because when you write the academic version of "XYZ is a fucking idiot, and here's why", you cite their paper. It's good to put their bullshit down, but it can also just end up being Streisand-effect-like. The journal is happy with its citations. Both people published in it, so it benefits from both directions. You keep the bad paper up for the record, and because, as long as the authors were actually acting in good faith, you don't actually want to take it down. The problem is... how do you know?
Another weird factor used is acceptance rates. This again sounds nice at first. You don't want a journal publishing just anything, right?[1] The problem comes when these actually become targets (which they are). Many of the ML conferences target about a 25% acceptance rate[2]. It fluctuates year to year. It should, right? Some years are just better science than other years. A good paper hits that changes things, and the next year should have a boom! But that's not the level of fluctuation we're talking about. If you look at the actual number of papers accepted in that repo, you'll see a disproportionate number of accepted-paper counts ending in a 0 or 5. Then you see the 1s and 6s, which are papers being squeezed in, often for political reasons. Here, I tallied the first 2 tables for you: you'll see the first has a very disproportionate share ending in 1 and 6, and CV loves 0, 1, and 3. These numbers should convince you that this is not a random process, though they should not convince you it is all funny business (much harder to prove). But it is at least enough to make you suspicious and encourage you to dig in more.
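If you want to run that kind of check yourself, here's a hedged sketch of the digit tally. The counts below are invented - the real ones are in the linked Conference-Acceptance-Rate repo [2] - and with only a handful of venues you'd want far more data points for any real statistical power.

    # Tally the last digits of accepted-paper counts and test against a
    # uniform distribution (assumes scipy; counts here are made up).
    from collections import Counter
    from scipy.stats import chisquare

    accepted_counts = [1591, 2580, 1305, 3496, 1966, 2145, 1675, 3340, 1050, 2665]
    last_digits = [n % 10 for n in accepted_counts]
    observed = [Counter(last_digits).get(d, 0) for d in range(10)]

    # Under "no target massaging", each last digit should show up roughly
    # equally often; a tiny p-value says the digits don't look random.
    stat, p = chisquare(observed)
    print(observed, p)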
There's a lot that's fucked up about the publishing system and academia. Lots of politics, lots of restricted research directions, lots of stupid. But also don't confuse this for people acting in bad faith or lying. Sure, that happens. But most people are trying to do good, and very few people in academia are blatantly publishing bullshit. It's just that everything gets political. And by political I don't mean government politics, I mean the same bullshit office politics. We're not immune from that same bullshit and it happens for exactly the same reasons. It just gets messier, because if you think it is hard to measure the output of an employee, try to measure the output of people whose entire job is to create things that no one has ever thought of before. It's sure going to look like they're doing a whole lot of nothing.
So I'll just leave you with this (it'll explain [1])
As a working scientist, Mervin Kelly (Director of Bell Labs (1925-1959)) understood the golden rule,
"How do you manage genius? You don't."
https://1517.substack.com/p/why-bell-labs-worked
There's more complexity like how we aren't good at pushing out frauds and stuff but if you want that I'll save it for another comment.
[0] https://en.wikipedia.org/wiki/Impact_factor
[1] Actually I do. As long as it isn't obviously wrong, plagiarized, or falsified, then I want that published. You did work, you communicated it, now I want it to get out into the public so that it can be peer reviewed. I don't mean a journal's laughable version of peer review (3-4 unpaid people who don't study your niche, are more concerned with whether it is "novel" or "impactful", and are quickly reading your paper as one of 4 on their desk they need to get through this week; it's incredibly subjective, and high-impact papers (like Nobel Prize winning papers) routinely get rejected). Peer review is the process of other researchers replicating your work, building on it, and/or countering it. Those are just new papers...
[2] https://github.com/lixin4ever/Conference-Acceptance-Rate
It has nothing to do with science, but rather people not finding that a sufficient justification for unpopular actions. For instance it's 100% certain that banning sugary drinks would dramatically improve public health, reduce healthcare costs, increase life expectancy, and just generally make society better in every single way.
So should we ban sugary drinks? It'd be akin to me trying to claim that if you say no then you're anti-science, anti-health, or whatever else. It's just a dumb, divisive, and meaningless label - exactly the sort politicians love so much nowadays.
Of course there's some irony in that it will become a self-fulfilling prophecy. The more unpopular things done in the name of 'the science', the more negative public sentiment towards 'the science' will become. Probably somewhat similar to how societies gradually became secular over time, as it became quite clear that actions done in the name of God were often not exactly pious.
---
Also the flat earth people actually aren't trying to argue against science (the process). They're arguing that everyone except them made either observational errors or reasoning errors.
https://plato.stanford.edu/entries/scientific-realism/#WhatS...
> Almost nobody is "anti-science".
Last I checked:
- 15% of Americans don't believe in Climate Change[0]
- 37% believe God created man in our current form within the last ~10k years (i.e. don't believe in evolution)[1]
I don't think these are just rounding errors. They're large enough numbers that you should know multiple people who hold these beliefs unless you're in a strong bubble.
I'm obviously with you in news and pop-sci being terrible. I hate IFuckingLoveScience. They're actually just IFuckingLoveClickbait. My point was literally about this bullshit.
90% of the time it is news and pop-sci miscommunicating papers. Where they clearly didn't bother to talk to authors and likely didn't even read the paper. "Scientists say <something scientists didn't actually say>". You see this from eating chocolate, drinking a glass of red wine, to eating red meat or processed meat. There are nuggets of truth in those things but they're about just as accurate as the grandma that sued McDonalds over coffee that was too hot. You sure bet this stuff creates distrust in science
[0] https://record.umich.edu/articles/nearly-15-of-americans-den...
[1] https://news.gallup.com/poll/647594/majority-credits-god-hum...
Me? I barely believe in the results of my experiments. But I also know what this poll is intending to ask and yeah, I read enough papers, processed enough data, did enough math, and tracked enough predictions that ended up coming true. That's enough to convince me it's pretty likely that those spending a fuck-ton more time on it (and who are the ones making those predictions that came true!) probably know what they're talking about.
That's almost weirder than declaring that 15% of people not believing in anthropogenic global warming is some sort of crisis. It's a theory that seems to fit the data (with caveats), not an Axiom of Science.
It's actually bizarre that 85% of people trust Science so much that they would believe in something that they have never seen any direct evidence of. That's a result of marketing. The public don't believe in global warming because it's "correct"; they have no idea if it's correct, and they often believe in things that are wrong that people in white coats on television tell them.
> According to your model, scientists who believe in God are anti-science.
In a way, yes. But every scientist I know that also believes in God is not shy about admitting their belief is unscientific. The reason I'm giving this a bit of a pass is because in science we need things that are falsifiable. The burden of proof should be on those believing in God. But such a belief is not falsifiable. You can't prove or disprove God. If they aren't pushy, are okay with admitting that, and don't make a big deal out of it, then I don't really care. That's just being a decent person.
But that's a very different thing from not believing in things we have strong physical evidence for, strong mathematical theories, and a long record of counterfactual predictions. The great thing about science is it makes predictions. Climate science has been making pretty good ones since the 80's. Every prediction comes with error bounds. Those are tightening, but the climate today matches those predictions within error. That's falsifiable.
Disagreeing with some consensus is not "anti-science". The term doesn't even make any sense, because it's a political and not a scientific term. I mean, imagine if we claimed that everybody who happens to believe MOND is more likely than WIMPs as an explanation for dark matter is "anti-science". It's just absolutely stupid. Yet we do exactly that on other topics, where suddenly you must agree with the consensus or you're just "anti-science"? I mean again, it makes no sense at all.
Anti-science means making claims that have no basis in that process, or categorically rejecting the body of work that was based on that process.
For instance none other than Einstein rejected a probabilistic interpretation of quantum physics, the Copenhagen Interpretation, all the way to his death. Many of his most famous quotes like 'God does not play dice with the universe.' or 'Spooky action at a distance.' were essentially sardonic mocking of such an interpretation, the exact one that we hold as the standard today. It was none other than Max Planck that remarked, 'Science advances one funeral at a time' [1], precisely because of this issue.
And so freedom to express, debate, and have 'wrong ideas' in the public mindshare is quite critical, because it may very well be that those wrong ideas are simply the standard of truth tomorrow. But most societies naturally turn against this, because they believe they already know the truth, and fear the possibility of society being misled away from that truth. And so it's quite natural to try to clamp down, implicitly or explicitly, on public dissenting views, especially if they start to gain traction.
> none other than Einstein rejected a probabilistic interpretation of quantum physics
That has been communicated to you wrong, and a subtle distinction makes a world of difference. Plenty of physicists then and now still work hard on trying to figure out how to remove uncertainty from quantum mechanics. It's important to remember that randomness is a measurement of uncertainty.
We can't move forward if the current paradigm isn't challenged. But the way it is challenged is important. Einstein wasn't going around telling everyone they were wrong, but he was trying to get help in the ways he was trying to solve it. You still have to explain the rest of physics to propose something new.
Challenging ideas is fine, it's even necessary, but at the end of the day you have to pony up.
The public isn't forming opinions about things like Einstein. They just parrot authority. Most HN users don't even understand Schrödinger's cat and think there's a multiverse.
For instance this is the complete context of his spooky action at a distance quote: "I cannot seriously believe in [the Copenhagen Interpretation] because the theory cannot be reconciled with the idea that physics should represent a reality in time and space, free from spooky action at a distance." Framing things like entanglement as "spooky action at a distance" was obviously being intentionally antagonistic on top of it all as well.
---
And yes, if it wasn't clear by my tone - I believe the West has gradually entered the exact sort of death-of-science phase I am speaking about. A century ago you had uneducated (formally at least) brothers working as bicycle repairmen pushing forward aerodynamics and building planes in their spare time. Today, as you observe, even people with excessive formal education, access to [relatively] endless resources, endless information, and more seem to have little ambition to exploit any of it, rather than passively consume it. It goes some way to explaining why some think LLMs might lead to AGI.
> Disagreeing with some consensus is not "anti-science".
Be careful of gymnastics. Yes, science requires the ability to disagree. You can even see me saying, in my comment history, that a scientist needs to be a bit anti-authoritarian!
But HOW one goes about disagreeing is critical.
Sometimes I only have a hunch that what others believe is wrong. They have every right to call me stupid for that. Occasionally I'll be able to gather the evidence and prove my hunch. Then they are stupid for not believing like I do, but only after evidenced. Most of the time I'm wrong though. Trying to gather evidence I fail and just support the status quo. So I change my mind.
Most importantly, I just don't have strong opinions about most things. Opinions are unavoidable, strong ones aren't. If I care about my opinion, I must care at least as much about the evidence surrounding my opinion. That's required for science.
Look at it this way. When arguing with someone are you willing to tell them how to change your mind? I will! If you're right, I want to know! But frankly, I find most people are arguing to defend their ego. As if being wrong is something to be embarrassed about. But guess what, we're all wrong. It's all about a matter of degree though. It's less wrong to think the earth is a sphere than flat because a sphere is much closer to an oblate spheroid.
If you can't support your beliefs and if you can't change your mind, I don't care who you listen to, you're not listening to science
The root causes can be argued...but keep that in mind.
No single paper is proof. Bodies of work across many labs, independent verification, etc is the actual gold standard.
[1] - https://search.brave.com/search?q=site%3Anytimes.com+Journal...
[2] - https://en.wikipedia.org/wiki/Replication_crisis#In_psycholo...
> higher retraction/unverified
Scientific consensus doesn't advance because a single new ground-breaking claim is made in a prestigious journal. It advances when enough other scientists have built on top of that work.
The current state of science is not 'bleeding edge stuff published in a journal last week'. That bleeding edge stuff might become part of scientific consensus in a month, or year or three, or five - when enough other people build on that work.
Anybody who actually does science understands this.
Unfortunately, people with poor media literacy who only read the headlines don't understand this, and assume that the whole process is all a crock.
I didn't think this was new? Like, it's been a few years since that whole replication crisis thing kicked off.
You have misplaced confidence in the scientific method. It was never immune to corruption, either by those deliberately manipulating it for their personal gain, or simply due to ignorance and bad methodology. We have examples of both throughout history. In either case, peer review is not infallible.
The new problem introduced by modern AI tools is that they drastically lower the skill requirement for anyone remotely capable in the field to generate data that appears correct on the surface, with relatively little effort and very quickly, while errors can only be discovered by actual experts in the field investing considerable amounts of time and resources. In some fields like programming the required resources to review code are relatively minor, but in fields like biology this (from what I've read) is much more difficult and expensive.
But, yes, science is being flooded with (m|d)isinformation, just like all our other media channels.
Partially due to the legacy of science historically being rooted in "it matters more who you (or your parents) are" societies (because that's who had the money in somewhat modern history) - or, as some would say, the "old white man problem", except it has nothing to do with skin color or being a man, and only a limited amount to do with being old.
Partially due to how much more "science (output)" is produced today - the QA approaches that once worked reasonably well don't work that well at today's scale anymore.
Partially due to how many flaws there are in the process.
Partially due to human nature (as in, people tend to care more about "exciting", "visible" things, etc.).
People have been pushing for change in a lot of ways like:
- pushing to make full reproducibility a must-have (but that is hard, especially for statistics-based work that only a few companies can even afford to try to run; it's also hard because it requires a lot of transparency and open data access, and the latter especially is something many owners of datasets are not okay with)
- pushing for more appreciation of null results and failures (to be clear, I mean appreciation both in the form of monetary support and in the traditional sense of the word - people (colleagues) appreciating it)
- pushing for more verification of papers by trying to reproduce them (both as in more money/time resources for it, and in changing the mindset from it being a daunting, unappreciated task to it being a nice thing to do)
But too little change happened in the end before modern LLM AI hit the scene, and now it has made things so much harder, as it's now easy to mass-produce sloppy but reasonable-looking (non)science.
> although later investigation suggests there may have been data leakage
I think this point is often forgotten. Everyone should assume data leakage until it is strongly evidenced otherwise. It is not on the reader/skeptic to prove that there is data leakage; it is the authors who have the burden of proof. It is easy to have data leakage on small datasets - datasets where you can look at everything. Data leakage is really easy to introduce and you often do it unknowingly. Subtle things easily spoil data.
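A minimal sketch of how "unknowingly" it can happen (assuming scikit-learn and numpy): selecting "informative" features on the full dataset before cross-validating. The features and labels below are pure noise, so honest accuracy should be about 50%.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10_000))       # 100 samples of pure noise
    y = rng.integers(0, 2, size=100)         # random labels

    # Leaky: pick the 20 "best" features using ALL rows (including the rows
    # that will later act as test rows), then cross-validate.
    X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
    print(cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y).mean())  # well above 0.5

    # Honest: do the feature selection inside each training fold only.
    pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
    print(cross_val_score(pipe, X, y).mean())                                     # ~0.5

Nothing in the leaky version looks wrong at a glance, which is the point: if this is hard to catch on 100 rows you can inspect, claims of "no leakage" over trillions of tokens deserve a lot of skepticism.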
Now, we're talking about gigantic datasets where there's no chance anyone can manually look through it all. We know the filter methods are imperfect, so how do we come to believe that there is no leakage? You can say you filtered it, but you cannot say there's no leakage.
Beyond that, we are constantly finding spoilage in the datasets we do have access to. So there's frequent evidence that it is happening.
So why do we continue to assume there's no spoilage? Hype? Honestly, it just sounds like a lie we tell ourselves because we want to believe. But we can't fix these problems if we lie to ourselves about them.
They tacked their actual point onto the end of a copy-paste of the OP comment's context, and ended up writing something barely grammatically correct.
In doing so they prove exactly why not to listen to the internet. So they have that going for them.
For example, Medicare and Medicaid had a fraud rate of 7.66%[1]. Yes, that is a lot of billions, and there is room for improvement, but that doesn't mean the entire system is failing: over 92% of cases are being covered as intended.
The same could be said with these models. If the spoilage rate is 10%, does that mean the whole system is bad? Or is it at a tolerable threshold?
[1]: https://www.cms.gov/newsroom/fact-sheets/fiscal-year-2024-im...
"Acceptable" thresholds are problem specific. For AI to make a meaningful contribution to protein function prediction, it must do substantially better than current methods, not just better than some arbitrary threshold.
There's also the problem of false negatives vs positives. If your goal is to cover 100% of true cases you can achieve that easily by just never denying a claim. That would of course yield stratospheric false positive rates (fraud). You have to understand both the FN rate (cost of missed fraud) vs the FP rate (cost of fraud fighting) and then balance them.
The same applies with using models in science to make predictions.
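As a hedged sketch of what "balancing them" can look like in practice: pick the decision threshold that minimizes expected cost, given per-error costs. Everything below (the fake model scores, the 8% base rate, the dollar figures) is invented for illustration; assumes numpy.

    import numpy as np

    rng = np.random.default_rng(1)
    is_fraud = rng.random(10_000) < 0.08                   # ~8% of claims are fraudulent
    # A fake model score: fraudulent claims tend to score higher.
    score = np.clip(np.where(is_fraud, 0.7, 0.3) + rng.normal(0, 0.2, 10_000), 0, 1)

    COST_FN = 5_000   # dollars lost per missed fraud case (false negative)
    COST_FP = 300     # dollars spent investigating a legitimate claim (false positive)

    def expected_cost(threshold):
        flagged = score >= threshold
        fn = np.sum(is_fraud & ~flagged)                   # fraud we let through
        fp = np.sum(~is_fraud & flagged)                   # legit claims we flag
        return fn * COST_FN + fp * COST_FP

    best = min(np.linspace(0, 1, 101), key=expected_cost)
    print(best, expected_cost(best))

Flagging nothing corresponds to a threshold of 1.0 and flagging everything to 0.0; the point is that the "acceptable" threshold falls out of the costs, not out of some universal percentage.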
CERT’s annual assessments do seem to involve a large-scale, rigorous analysis of an independent sample of 50,000 cases, though. And those case audits seem, at least on paper and to a layperson, to apply rather more thorough scrutiny than Medicare’s day-to-day policies and procedures.
As @patio11 says, and to your point, “the optimal amount of fraud is non-zero”… [2]
[0] https://www.cms.gov/data-research/monitoring-programs/improp...
[1] https://www.cms.gov/research-statistics-data-and-systems/mon...
[2] https://www.bitsaboutmoney.com/archive/optimal-amount-of-fra...
> The better question is: what is the acceptable threshold?
Currently we are unable to answer that question. AND THAT'S THE PROBLEM. I'd be fine if we could. Well, at least far less annoyed. I'm not sure what the threshold should be, but we should always try to minimize it. At least having error bounds would do a lot of good at making this happen. But right now we have no clue, and that's why this is such a big question that people keep bringing up. It's not that we don't point out specific levels of error because they are small and we don't want you looking at them; rather, we don't point them out because nobody has a fucking clue.
And until someone has a clue, you shouldn't trust that the error rate is low. The burden of proof is on the one making the claim of performance, not the one asking for evidence of that claim (i.e. skeptics).
Btw, I'd be careful with percentages. Especially when numbers are very high. e.g. LLMs are being trained on trillions of tokens. 10% of 1 trillion is 100 bn. The entire work of Shakespeare is 1.2M tokens... Our 10% error rate would be big enough to spoil any dataset. The bitter truth is that as the absolute number increases, the threshold for acceptable spoilage (in terms of percentage) needs to decrease.
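Just to spell out the arithmetic with the comment's own figures:

    training_tokens = 1_000_000_000_000     # ~1 trillion tokens of training data
    shakespeare_tokens = 1_200_000          # complete works of Shakespeare, ~1.2M tokens
    spoiled = 0.10 * training_tokens        # a "mere" 10% error rate
    print(spoiled / shakespeare_tokens)     # ~83,000 Shakespeare-sized corpora of spoilage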
I'm fine with 5% failure if my soup is a bit too salty. Not fine with 0.1% failure if it contains poison.
(We can even get more nuanced. What kind of poison?)
That is, the problem is not that the AI is wrong X% of the time. The problem is that, in the presence of a data leak, there is no way of knowing what the value of X even is.
This problem is recursive - in the presence of a data leak, you also cannot know for sure the quantity of data that has leaked.
And then I asked it for [ad lib cocktail request] and got back thorough instructions.
We did that with sand. That we got from the ground. And taught it to talk. And write C programs.
Never mind what? That I had to ask twice? Or five times?
What maximum number of requests do you feel like the talking sand needs to adequately answer your question in before you are impressed by the talking sand?
But I think people aren't arguing about how amazing it is, but about specific applicability. There's also a lot of toxic hype and FUD going around, which can be tiring and frustrating.
The disconnect here is that the cost of iteration is low and it's relatively easy to verify the quality of a generated C program (does the compiler issue warnings or errors? Does it pass a test suite?) or a recipe (basic experience is probably enough to tell if an ingredient seems out of place or proportions are wildly off).
In science, verifying a prediction is often super difficult and/or expensive, because at prediction time we're trying to shortcut around an expensive or intractable measurement or simulation. Unreliable models can really change the tradeoff point of whether AI accelerates science or just massively inflates the burn rate.
Let's suppose you read a paper that does "X with Y", but you are interested in "Z", so the brilliant idea is to do "Z with Y" and publish the new combination, citing the original.
Sometimes you cross your fingers and just try "Z with Y", but if the initial attempt fails or you are too cautious you try "X with Y" to ensure you understand the details of the original paper.
If the reproduction of "X with Y" is a success, you now try "Z with Y" and if it works you publish it.
If the reproduction of "X with Y" is a failure, you may email the authors of just drop the original paper in the recycle bin. Publishing a failure of a reproduction is too difficult. This is a bad incentive, but it's also too easy to make horrible mistakes and fail.
However, it rarely takes the form of explicit replication of the published findings. More commonly, the published work makes a claim, and such a claim leads to further hypotheses (predictions), which others may attempt to demonstrate/verify.
During this second demonstration/study, the claims of the first study are verified.
I believe that if we’re not even willing to carefully confirm whether our predictions match reality, then no matter how impressive the technology looks, it’s only a fleeting illusion.
That is true for any organisation or any person that's different from you. Companies ain't special here.
> The reason it isn't paying you is that its main goal is to make more money than it spends.
Making money is the main goal of many companies, but not all.
Almost any goals of any organisation or any person can be furthered by having more money rather than less. So everyone has a similar incentive to pay you less. (This includes charities. All else being equal, if you can pay your workers less, you can hand out more free malaria nets.) But as https://news.ycombinator.com/item?id=44179846 points out, they pay you, so that you work for them.
See also https://en.wikipedia.org/wiki/Instrumental_convergence
Why are people using transformers? Do they have any intuition that they could solve the challenge, let alone efficiently?
It's the same as "AI can code". It gets caught with failing spectacularly when the problem isn't in the training set over and over again, and people are surprised every time.
But yes, unmanaged and unchecked it absolutely cannot to the full job of really any human. It's not close.
Honestly, for straight-up classification? I’d pick SVM or logistic any day. Transformers are cool, but unless your data’s super clean, they just hallucinate confidently. Like giving GPT a multiple-choice test on gibberish—it will pick something, and say it with its chest.
Lately, I just steal embeddings from big models and slap a dumb classifier on top. Works better, runs faster, less drama.
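For what it's worth, that recipe in sketch form (assuming the sentence-transformers and scikit-learn packages; the model name is just one common public choice, and the data is a toy stand-in):

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    texts = ["great product", "terrible service", "loved it",
             "waste of money", "works as advertised", "broke after a day"]
    labels = [1, 0, 1, 0, 1, 0]

    # The big model does the heavy lifting once, as frozen embeddings.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X = encoder.encode(texts)

    # The "dumb classifier on top": cheap, fast, and it can't answer
    # outside the label set.
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=1/3,
                                              stratify=labels, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(clf.score(X_te, y_te))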
Appreciate this post. Needed that reality check before I fine-tune something stupid again.
Source: bitter, bitter experience. I once predicted the placebo effect perfectly using a random forest (just got lucky with the train/test split). Although I'd left academia at that point, I often wonder if I'd have dug in deeper if I'd needed a high impact paper to keep my job.
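For anyone who hasn't been bitten by this yet, here's a hedged toy version of "got lucky with the split" (assumes scikit-learn and numpy). Both features and labels are pure noise, yet the best of twenty random splits usually looks respectable.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(7)
    X = rng.normal(size=(40, 50))        # 40 subjects, 50 features of pure noise
    y = rng.integers(0, 2, size=40)      # "responder" labels, also pure noise

    scores = []
    for seed in range(20):               # twenty different random train/test splits
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
        clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))

    # The lucky split vs. the typical one - only one of them gets written up.
    print(max(scores), float(np.median(scores)))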
but that’s how science advances
there should be an arxiv for rebuttals maybe
Yeah, me too. There was a paper doing the rounds a few years back (computer programming is more related to language skill than maths), so I downloaded the data and looked at their approach, and it was garbage. Like, polynomial-regression-on-30-datapoints kind of bad.
And based on my experience during the PhD this is very common. It's not surprising though, given the incentive structure in science.
If I gave a classroom of under grad students a multiple choice test where no answers were correct, I can almost guarantee almost all the tests would be filled out.
Should GPT and other LLMs refuse to take a test?
In my experience it will answer with the closest answer, even if none of the options are even remotely correct.
1: People who have a financial stake in the AI hype
As such I would expect students to put in something. However, after class they would talk about how bad they think they did, because they are all self-aware enough to know where they guessed.
Humans have made progress by admitting when they don’t know something.
Believing an LLM should be exempt from this boundary of “responsible knowledge” is an untenable path.
As in, if you trust an ignorant LLM then by proxy you must trust a heart surgeon to perform your hip replacement.
A good analogy would be if someone claimed to be a doctor and when I asked if I should eat lead or tin for my health they said “Tin because it’s good for your complexion”.
Sure but this is still indirectly using transformers.
You may know this but many don't -- this is broadly known as "transfer learning".
I feel that we're wrong to be focusing so much on the conversational/inference aspect of LLMs. The way I see it, the true "magic" hides in the model itself. It's effectively a computational representation of understanding. I feel there's a lot of unrealized value hidden in the structure of the latent space itself. We need to spend more time studying it, make more diverse and hands-on tools to explore it, and mine it for all kinds of insights.
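In that spirit, a small hedged sketch of what "hands-on tools to explore it" can look like, assuming you've already dumped embeddings for a set of items (the file names are hypothetical; scikit-learn and numpy assumed):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    emb = np.load("embeddings.npy")                    # hypothetical (n_items, dim) matrix
    labels = open("labels.txt").read().splitlines()    # hypothetical item names, one per row

    # Local structure: who lives near whom in the latent space?
    nn = NearestNeighbors(n_neighbors=6).fit(emb)
    _, idx = nn.kneighbors(emb[:1])
    print([labels[i] for i in idx[0]])                 # item 0 and its 5 closest neighbours

    # Global structure: squash to 2-D and eyeball the clusters.
    coords = PCA(n_components=2).fit_transform(emb)
    print(coords[:5])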
if you wanna peek where their heads at, start here https://www.anthropic.com/research/mapping-mind-language-mod... not just another ai blog. actual systems brain behind it.
[meta] Here’s where I wish I could personally flag HN accounts.
and a bunch of phone/tablet keyboards do so, too
I like em dashes. In the past I had considered installing a plugin to reliably turn -- into an em dash; if I hadn't discarded that idea you would have seen some in this post ;)
And I think I have seen at least one spell-checking browser plugin which does stuff like that.
Oh, and some people use 3rd-party interfaces to interact with HN, some of which auto-convert consecutive dashes to em dashes.
In the places where I have been using AI from time to time, it's also not super common to use em dashes.
So IMHO "em dash" isn't a tall tell sign for something being AI written.
But then, wrt. the OP comment, I think you might be right anyway. Its writing style is ... strange. Like taking a writing style from a novel - and not just any writing style, but one that over-exaggerates that a story is currently being told inside a story - and then filling it with the semantics of an HN comment. Like what you might get if you ask an LLM to "tell a story" from your set of bullet points.
But this opens a question, if the story still comes from a human isn't it fine? Or is it offensive that they didn't just give us compact bullet points?
Putting that aside, there is always the option that the author is just very well read/written - maybe a book author, maybe a hobby author - and picked up such a writing style.
I have endash bound to ⇧⌥⌘0, and emdash bound to ⇧⌥⌘=.
This seems to be exactly the kind of results we would expect from a system that hallucinates, has no semantic understanding of the content, and is little more than a probabilistic text generator. This doesn't mean that it can't be useful when placed in the right hands, but it's also unsurprising that human non-experts would use it to cut corners in search of money, power, and glory, or worse—actively delude, scam, and harm others. Considering that the latter group is much larger, it's concerning how little thought and resources are put into implementing _actual_ safety measures, and not just ones that look good in PR statements.
Or even of the Internet in general.
I guess it's a common pitfall with information or communication technologies ?
(Heck, or with technologies in general, but non-information or communication ones rarely scale as explosively...)
This doesn't mean that there aren't very valid use cases for these technologies that can benefit humanity in many ways (and I mean this for both digital currencies and machine learning), but unfortunately those get drowned out by the opportunity seekers and charlatans that give the others the same bad reputation.
As usual, it's best to be highly critical of opinions on both extreme sides of the spectrum until (and if) we start climbing the Slope of Enlightenment.
(Not a binary -- ground truth is available enough for AI to be useful to lots of programmers.)
That's many times not easy to verify at all ...
- correct syntax
- passes lints
- type checking passes
- fast test suite passes
- full test suite passes
and every time it doesn't you feed it back into the LLM, automatically, in a loop, without your involvement.
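A hedged sketch of that loop (the check commands are examples - ruff/mypy/pytest here - and ask_llm is a placeholder for whatever model API and patch-application step you actually use):

    import subprocess

    CHECKS = [
        ["ruff", "check", "."],      # lints
        ["mypy", "."],               # type checking
        ["pytest", "-x", "-q"],      # fast test suite
    ]

    def run_checks():
        """Return the first failing command and its output, or (None, '') if all pass."""
        for cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                return cmd, result.stdout + result.stderr
        return None, ""

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("call your model of choice here")

    for _ in range(10):              # bounded retries, so it can't loop forever
        failed_cmd, output = run_checks()
        if failed_cmd is None:
            break                    # everything green - human review comes next
        patch = ask_llm(f"`{' '.join(failed_cmd)}` failed:\n{output}\nPropose a fix.")
        # ...apply `patch` to the working tree here (omitted)...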
The results are often -- sadly -- too good to not slowly start using AI.
I say sadly because IMHO the IT industry has gone somewhere very wrong due to growing too fast, moving too fast, and getting so much money that the companies spearheading it could just throw more people at problems instead of fixing underlying issues. There is also a huge divergence between the science of development, programming, application composition, etc. (not to be confused with the science of, say, data structures and fundamental algorithms) and what the industry actually uses and how it advances.
Now I think normally the industry would auto correct at some point, but I fear with LLMs we might get even further away from any fundamental improvements, as we find even more ways to still go along and continue the mess we have.
Worse, LLM coding performance is highly dependent on how well very similar languages are represented in the training dataset, so new languages with breakthroughs or huge improvements will work less well with LLMs. If that trend continues, it would lock us in to very mid solutions long term.
Until the concept of consequences and punishment are part of AI systems, they are missing the biggest real world component of human decision making. If the AI models aren’t held responsible, and the creators / maintainers / investors are not held accountable, then we’re heading for a new Dark Age. Of course this is a disagreeable position because humans reading this don’t want to have negative repercussions - financially, reputationally, or regarding incarceration - so they will protest this perspective.
That only emphasizes how I’m right. AI doesn’t give a fuck about human life or its freedom because it has neither. Grow up and start having real conversations about this flaw, or make peace that eventually society will have an epiphany about this and react accordingly.
Deep, accurate, real-time code review could be of huge assistance in improving quality of both human- and AI-generated code. But all the hype is focused on LLMs spewing out more and more code.
The danger behind usage of LLMs is that managers do not see the diligent work needed to ensure whatever they come up with is correct. They just see a slab of text that is a mixture of reality and confabulation, though mostly the latter, and it looks reasonable enough, so they think it is magic.
Executives who peddle this nonsense don't realize that the proper usage requires a huge amount of patience and careful checking. Not glamorous work, as the author states, but absolutely essential to get good results. Without it, you are just trusting a bullshit artist with whatever that person comes up with.
Practically speaking, I think there are roles for current LLMs in research. One is in the peer review process. LLMs can assist in evaluating the data-processing code used by scientists. Another is for brainstorming and the first pass at lit reviews.
A bit like how you might write a paper yourself - starting with the data.
As it turned out, I thought the figures looked like data that might be from a paper referenced in a different lecturer's set of lectures (just on the conclusion - he hadn't shown the figures), so I went down to the library (this was in the days of non-digitized content - you had to physically walk the stacks) and looked it up, found the original paper, and then a follow-up paper by the same authors....
I like to think I was just doing my background research properly.
I told a friend about the paper and before you know it the whole class knew - and I had to admit to the lecturer that I'd found the original paper when he wondered why the whole class had done so well.
Obviously this would be trivial today with an electronic search.
We have rare but not unheard of issues with academic fraud. LLMs fake data and lie at the drop of a hat
We can do both known and novel reproductions. Like with both LLM training process and human learning, it's valuable to take it in two broad steps:
1) Internalize fully-worked examples, then learn to reproduce them from memory;
2) Train on solving problems for which you know the results but have to work out intermediate steps yourself (looking at the solution before solving the task)
And eventually:
3) Train on solving problems you don't know the answer to, have your solution evaluated by a teacher/judge (that knows the actual answers).
Even parroting existing papers is very valuable, especially early on, when the model is learning what papers and research look like.
Or maybe give it a paper full of statistics about some experimental observations, and have it reproduce the raw data?
The main reason people don't do it is because incentives are everything, and university/government management set bad incentives. The article points this out too. They judge academics entirely by some function of paper citations, so academics are incentivized to do the least possible work to maximize that metric. There's no positive incentive to publish more than necessary, and doing so can be risky because people might find flaws in your work by checking it. So a lot of researchers hide their raw data or code for as long as possible. They know this is wrong and will typically claim they'll publish it but there's a lot of foot dragging, and whatever gets released might not be what they used to make the paper.
In the commercial world the incentives are obviously different, but the outcomes are the same. Sometimes companies want the ideas to be used because they complement the core business; other times the ideas need to be protected to be turned into a core business. People like to think academic and industrial research are very different, but everyone is optimizing for some metric, whether they like it or not.
Producing novel ideas is the most famous trait of current LLMs, the thing people are spending all their time trying to prevent.
Could you please explain what you mean or give a simple example?
After ChatGPT, big corporations stopped sharing their main research, but it still happens in academia.
It would be the biggest boon to science since sci-hub though.
And since a large set of studies won't be reproducible, you need human supervision as well, at least at first.