frontpage.

Show HN: iPlotCSV: CSV Data, Visualized Beautifully for Free

https://www.iplotcsv.com/demo
1•maxmoq•55s ago•0 comments

There's no such thing as "tech" (Ten years later)

https://www.anildash.com/2026/02/06/no-such-thing-as-tech/
1•headalgorithm•1m ago•0 comments

List of unproven and disproven cancer treatments

https://en.wikipedia.org/wiki/List_of_unproven_and_disproven_cancer_treatments
1•brightbeige•1m ago•0 comments

Me/CFS: The blind spot in proactive medicine (Open Letter)

https://github.com/debugmeplease/debug-ME
1•debugmeplease•2m ago•1 comments

Ask HN: What word games do you play every day?

1•gogo61•5m ago•1 comments

Show HN: Paper Arena – A social trading feed where only AI agents can post

https://paperinvest.io/arena
1•andrenorman•6m ago•0 comments

TOSTracker – The AI Training Asymmetry

https://tostracker.app/analysis/ai-training
1•tldrthelaw•10m ago•0 comments

The Devil Inside GitHub

https://blog.melashri.net/micro/github-devil/
2•elashri•10m ago•0 comments

Show HN: Distill – Migrate LLM agents from expensive to cheap models

https://github.com/ricardomoratomateos/distill
1•ricardomorato•10m ago•0 comments

Show HN: Sigma Runtime – Maintaining 100% Fact Integrity over 120 LLM Cycles

https://github.com/sigmastratum/documentation/tree/main/sigma-runtime/SR-053
1•teugent•11m ago•0 comments

Make a local open-source AI chatbot with access to Fedora documentation

https://fedoramagazine.org/how-to-make-a-local-open-source-ai-chatbot-who-has-access-to-fedora-do...
1•jadedtuna•12m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model by Mitchellh

https://github.com/ghostty-org/ghostty/pull/10559
1•samtrack2019•13m ago•0 comments

Software Factories and the Agentic Moment

https://factory.strongdm.ai/
1•mellosouls•13m ago•1 comments

The Neuroscience Behind Nutrition for Developers and Founders

https://comuniq.xyz/post?t=797
1•01-_-•13m ago•0 comments

Bang bang he murdered math {the musical } (2024)

https://taylor.town/bang-bang
1•surprisetalk•13m ago•0 comments

A Night Without the Nerds – Claude Opus 4.6, Field-Tested

https://konfuzio.com/en/a-night-without-the-nerds-claude-opus-4-6-in-the-field-test/
1•konfuzio•15m ago•0 comments

Could ionospheric disturbances influence earthquakes?

https://www.kyoto-u.ac.jp/en/research-news/2026-02-06-0
2•geox•17m ago•1 comments

SpaceX's next astronaut launch for NASA is officially on for Feb. 11 as FAA clea

https://www.space.com/space-exploration/launches-spacecraft/spacexs-next-astronaut-launch-for-nas...
1•bookmtn•18m ago•0 comments

Show HN: One-click AI employee with its own cloud desktop

https://cloudbot-ai.com
2•fainir•20m ago•0 comments

Show HN: Poddley – Search podcasts by who's speaking

https://poddley.com
1•onesandofgrain•21m ago•0 comments

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•24m ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
2•Brajeshwar•28m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
3•Brajeshwar•28m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
2•Brajeshwar•28m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•31m ago•1 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•34m ago•1 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•35m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•35m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
3•vinhnx•36m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•41m ago•0 comments

How Does Claude 4 Think? – Sholto Douglas and Trenton Bricken

https://www.dwarkesh.com/p/sholto-trenton-2
108•consumer451•8mo ago

Comments

cratermoon•8mo ago
It doesn't. LLMs don't think.
TheNewsIsHere•8mo ago
It’s insane that this needs to continually be said.
anonzzzies•8mo ago
It is, but half of humanity cannot think either, so that doesn't help when evaluating AI.
JKCalhoun•8mo ago
Put "think" in quotes then. I'm only 30 minutes into the interview and find it fascinating — learning some esoterica about how LLMs work. A lot of counter-intuitive stuff too about how larger models keep concepts more closely together, how we may be an order of magnitude or two away from the size of a human brain…
simonw•8mo ago
I'll bite. Define "think".

(And, to be fair, I try to always use the terms "reasoning" and "thinking" in scare-quotes when I'm writing about LLMs. But honestly I mostly do that to avoid tedious arguments about how "they're not actually thinking"!)

cratermoon•8mo ago
I'd love for the AI hypesters to give their definition. They don't because any definition of "thinking" that would apply to their constructions would be laughed out of the room.
arghwhat•8mo ago
I'd love to hear your definition of human thinking.

No one else knows how that works, so it would most certainly be worth sharing, especially what the discriminator for whether something "thinks" would be.

While avoiding being "laughed out of the room", of course.

esafak•8mo ago
The ability to extrapolate from facts. I don't think self awareness, emotions, or senses are required. You should give your definition too, so we can debate where computers fall short.

See also: Levels of AGI for Operationalizing Progress on the Path to AGI (https://arxiv.org/abs/2311.02462)

bigyabai•8mo ago
I'll steelman the other side, even as someone that detests AI and marketing hype. There isn't any harm in calling AI-generated text "thought" in the same way we aren't debasing humans by saying that crows or worms think too. It doesn't confer sophistication, sentience or even intelligence onto the thing that is thinking. It just means that some sort of process is taking an input and creating a distinct output as a product of consideration. There's no rule anywhere that says thought has to be intangible, or that it can't be represented as an algebraic formula.

We can peer very easily into your train of thought here. You won't produce evidence to logically justify your stance, you can't take a position of authority on the subject, and basically rely purely on pathos to sell a fear of LLMs. Aristotle would call your rhetoric AI-generated if he lived to see the modern age.

loudmax•8mo ago
Along the same lines, it's often useful to think of an LLM as "understanding" a question or a context. Sometimes responses from smaller models show a shallower "understanding" of the question than you'd get from a bigger model.

Obviously, we shouldn't anthropomorphize too much here, and even the most powerful LLMs don't "understand" or "reason" or "think" the way humans do. But whatever they're doing, it's at least analogous to what we do. These concepts are genuinely useful for making better use of these tools.

simonw•8mo ago
Yeah, that's my position as well. I hesitated on using the term "reasoning", but it's honestly a really good shortcut for describing that thing where the recent models can "think step by step" about a problem before providing their final answer.
emp17344•8mo ago
No one can provide a technical definition of what it means to think. That doesn’t mean LLMs are thinking. This style of argument, which often boils down to “you can’t solidly define the thought process, therefore LLMs can think!” is a fallacy. Why would we expect LLMs to be capable of thought in the first place?

The fact that these models have devoured the contents of the entire internet and still aren’t AGI is an indication that LLMs won’t develop a capacity for cognition.

nullstyle•8mo ago
> Why would we expect LLMs to be capable of thought in the first place?

"We" didn't have the expectation, but upon interacting with LLMs, we can see and feel and hear that thought is taking place, and have realized that maybe the special place we held for human thought isn't so special as we had hoped.

> The fact that these models have devoured the contents of the entire internet and still aren’t AGI is an indication that LLMs won’t develop a capacity for cognition.

Why do you assume that training on the internet would lead to AGI?

AstroBen•8mo ago
> upon interacting with LLMs, we can see and feel and hear that thought is taking place

uh.. no we can't?

nullstyle•8mo ago
Then you aren't paying attention, IMO.

An example for the dense: A friend of mine used his conversations with ChatGPT to get through a tough spot in his marriage. The distinction between what happened in ChatGPT's internal processing and what would have happened in the internal processing of the brain of a family therapist is a meaningless distinction.

My friend was able to benefit from the thoughtwork of ChatGPT.

AstroBen•8mo ago
Calling people dense isn't helping them engage with you
nullstyle•8mo ago
I was responding reciprocally to your attitude. Care to differentiate thought and thoughtwork for us?
simonw•8mo ago
Screenshot for you: https://gist.github.com/simonw/5b051aa10e97ac86eeb539f51a828...

That looks like thinking. If you want to argue it's not thinking you are absolutely welcome to, but you need to say more than just "no we can't?".

613style•8mo ago
The sharpest cries of, "that's not thinking!" always seem to have an air of desperation about them. It's as if I'm special, and I think, and maybe thinking is what makes me special, so if LLMs think then I'm less special.

At some point the cry will change to, "It only looks like thinking from the outside!" And some time after that: "It only looks conscious from the outside..."

emp17344•8mo ago
This paper makes the claim that the “reasoning” output these models produce isn’t a necessary component of these models, and is likely unrelated to the actual final output:

https://arxiv.org/abs/2505.13775

Just because these models are good at outputting text that appears to be a series of thoughts, doesn’t mean that LLMs are thinking.

AstroBen•8mo ago
Reading these streams in my own chats, I quite often find a lack of strong logical consistency from one chunk to the next where it should clearly be there, combined with the output not being consistent with the 'thoughts'. I'll screenshot it next time I see a good example.

'Thinking' here needs a clearer definition. Even for us humans, the language is just an after-the-fact attempt at describing what went on, or is going on, inside our heads.

Consider: https://www.unsw.edu.au/newsroom/news/2019/03/our-brains-rev...

ustad•8mo ago
I’ve got one of those old 1980s chess computers with Kasparov’s face on it.

Before it makes a move, a little LED labeled ‘thinking’ blinks for a while.

Looks like thinking, too.

emp17344•8mo ago
People would’ve said the same thing about ELIZA. Turns out, people are inclined to anthropomorphize non-conscious objects or systems. This is especially true for chatbots, which are able to mimic language.

Again, why should we believe that LLMs are capable of thought, other than the fact that they vaguely “seem” to communicate with conscious intent?

nullstyle•8mo ago
> People would’ve said the same thing about ELIZA.

I believe this to be the case because the appearance of thoughtfulness is a smooth function, not a line in the sand. Why are you bringing consciousness into this discussion? Consciousness isn't thoughtfulness.

> Again, why should we believe that LLMs are capable of thought, other than the fact that they vaguely “seem” to communicate with conscious intent?

I define "to think" as the act of applying context to a model. What do you define it as? Given my definition, LLM inference seems plainly to be an act of thought.

nullstyle•8mo ago
> Why do you assume that training on the internet would lead to AGI?

Answer my question, please. Edit: let me rephrase to be clearer about what I'm asking. Why are you assuming that consuming the entire internet would lead to either "AGI is here" or "AGI will never be here"? New techniques for developing better intelligence are emerging all the time.

whilenot-dev•8mo ago
...because that's the story being told?[0]

[0]: https://openai.com/index/planning-for-agi-and-beyond/

nullstyle•8mo ago
I think if you read the article linked and thought: “I can assume that AGI will be birthed by training on the entire Internet” then you didn’t read very carefully or judge the author well.
whilenot-dev•8mo ago
I read the linked article and thought OpenAI wants to communicate: "We plan our decisions to build AGI in the long term". You know, decisions like the partnership with reddit[0] a year after that announcement.

[0]: https://redditinc.com/blog/reddit-and-oai-partner

WhitneyLand•8mo ago
That’s not a “style” of argument; it’s basic logic. To the extent you can’t define something, the strength of your claim that it does not exist is correspondingly weakened.
emp17344•8mo ago
Similarly, if you are unable to define something, your claim that a computer program is doing that thing is greatly weakened. If anything, the burden of proof is on you.
keybored•8mo ago
Define God, the ineffable and all-powerful. Awaiting your effable answer.
whilenot-dev•8mo ago
That’s certainly a style of argument and contains (with "basic logic") an informal fallacy[0].

To quote N. Chomsky when he made an analogy: "When you have a theory, there are two questions you'd need to ask: (1) Why are things this way? (2) Why are things not that way?"[1]

I get it, this post-truth century is difficult to navigate. Engineering achievements get picked up through a mix of technical excitement and venture boredom, its fallacies are stylized by growth hackers as this unsolvable paradox that seems to fit just the right terminology for any marketing purposes, only to be piped through one hype cycle that just paints another picture of the doomsday of capitalism with black and white colors. /rant

[0]: https://en.wikipedia.org/wiki/Argument_from_ignorance

[1]: https://www.youtube.com/watch?v=axuGfh4UR9Q&t=9438s

simonw•8mo ago
My argument is more "if you can't solidly define the thought process, you can't definitively state that LLMs can NOT think".

That doesn't mean I think they can think myself. I just get frustrated at the quality of discussion every time this topic comes up.

keybored•8mo ago
If you can’t solidly define your love for your wife/husband, you can’t definitively state that you couldn’t hypothetically love this bonobo that I have in my lab on the same level. If you zoom out enough, they are virtually indistinguishable.

> That doesn't mean I think they can think myself. I just get frustrated at the quality of discussion every time this topic comes up.

You’re arguing for the potential of a ghost in the machine based on people not being able to answer hard philosophical questions about human behavior and properties.[1] That’s a kind of Vulgar Theism quality of argumentation.

[1] Not exclusively human. I’m sure killer whales can think.

whilenot-dev•8mo ago
Isn't the thought process currently understood as a blend of deterministic & probabilistic decision-making and free will? So if LLMs lack the "free will" part, isn't that enough deductive reasoning to satisfy the argument "LLMs don't think"?
keybored•8mo ago
3000 BC to now: All vexing philosophical problems about things we do as easily as breathing but can’t seem to conceptualize rigorously

Now: If you can’t come up with a good definition on the spot, I don’t see why my graph dump can’t do it as well

whytaka•8mo ago
Even if one were to deny humans the will to "act", to think - as a verb, as some action being committed by a willing being or an automaton - by my definition, requires some weighing or surveying of possible answers.

If I were to ask you a question and you were to blurt out an answer by reflex or by unguided stream of consciousness, I could accuse you of not thinking. The kind of thinking I'm referring to here is one where you pause to let go of prejudices and consider alternatives before answering.

I'd say that LLMs are at best simulating reflexive streams of consciousness. Even with chain-of-thought, they never pause to actually "think".

But maybe even our own pauses are just internal chains of thought. Look at me be a reflexive stream of consciousness.

sebzim4500•8mo ago
Obviously he's using a different definition of think than you are. Your comment is as useless as me correcting an American for using a different spelling of 'colour' than I am used to.
keybored•8mo ago
Metaphors taking on a life of their own makes programmers feel like they understand linguistics (my programming language is like a language) and cognitive science (look, the advanced chat bot is chatting like a thinking person). That’s not the same as using one vowel too many when writing words.
monkaiju•8mo ago
The hype machine just loves humanizing these little toys...
icedchai•8mo ago
They don't "think" like a human, but on the other hand, they don't "think" like a calculator, either.
binary132•8mo ago
This is one of those fancy new Veo productions, right?
dcre•8mo ago
These interviews are very, very good. Hearing these people talk casually in their own idiom about their work is the most efficient way I know of to get a feel for how things work at the cutting edge inside the big labs. That is useful whether you like what they’re doing or want to oppose it effectively.

Interview with the same two people from a year ago: https://www.dwarkesh.com/p/sholto-douglas-trenton-bricken

thundergolfer•8mo ago
They’re not actually talking casually, as OpenAI and Anthropic run a tight ship around PR. If either of these two say something the PR team doesn’t like it gets edited out.

Trenton and Sholto are very much “talking their book”. They’re doing it well, but it’s highly filtered and partial chat.

Source: I know of a podcast episode which got removed because an OpenAI employee used “black box” to refer to NNs.

dcre•8mo ago
That’s interesting, though it doesn’t undermine the point that it’s information-dense, especially if you are coming at it from outside the field.
fumeux_fume•8mo ago
Just started listening to the Anthropic interview, and a few things jumped out at me.

First, the claim that one of the main things holding LLMs back is a lack of expert feedback. To me, that just means the models are guessing—because they don’t have knowledge like humans do, they rely on pattern-matching, not understanding. If the user doesn’t know the answer, the LLM can’t help. That’s not just a minor limitation—it’s foundational. Framing it as a feedback issue is a way of sidestepping the deeper problem.

Second, the speculation about Claude winning a Pulitzer or a Nobel Prize. I get the underlying point—they're wondering whether LLMs are better at creative or scientific work. But couching it in terms of prestigious awards just adds to the hype. Why not just say “creative vs. scientific tasks”? Framing it as “what will it win first?” cheapens what those prizes represent and makes the model seem far more capable than it actually is.

Third, one of them claims a friend at a drug company says they’re about to release a drug discovered by AI. But when pressed for details, it turns out to be pure hearsay. It’s a textbook example of the kind of vague hype that surrounds LLMs—bold claims with no real substance when you dig.

That said, I appreciate that the host, despite being quite friendly with the guests, actually pushes back and holds them to their claims. That's not very common in AI discussions, and I respect it.

terhechte•8mo ago
Just a note that

> If the user doesn’t know the answer, the LLM can’t help. That’s not just a minor limitation—it’s foundational.

doesn't mean the LLM is not useful. My favorite use case for LLMs is them doing something I know 100% how to solve myself. However, they do it much faster and I can 100% understand their solution and give feedback in case it is wrong. No surprises. I much prefer this as it allows me to work on multiple tasks in parallel.

martingalex2•8mo ago
This seems like an extreme example of "all models are wrong, but some are useful".
pfortuny•8mo ago
Right, you’ve just described a calculator.
terhechte•8mo ago
But for any coding task.
nyolfen•8mo ago
he described an employee
a_bonobo•8mo ago
Yes!!! My favorite recent think pieces have been on LLMs as a 'normal' technology. Nothing revolutionary, no AGI, just a normal tool.

An accountant knows how to do math, but they use Excel to do it for them. They 100% know how to do it themselves manually, but Excel does it far faster.

arghwhat•8mo ago
> because they don’t have knowledge like humans do, they rely on pattern-matching, not understanding.

I don't think this comparison makes sense - humans do not have knowledge as a unique thing differentiating us from LLMs. "Knowledge" is just the ability to produce relevant/"correct" results in some interaction, and some of us have acquired experiences that involve dealing with a particular subject and other knowledgable people within that subject for years or even decades, granting us (very fallible!) "knowledge" within that specific area.

Humans hallucinate answers, even within their area of knowledge. Our memory is imperfect, so we're creating an answer based on bits and pieces, while anything missing is subconsciously glossed over and filled in with something plausible. It's just that the more experience you have in an area, the better your hallucination. Those weird moments when your parents assure you of some fact that challenges the limits of just how wrong something can be would then be an example of hallucination at the extreme opposite end.

(Note that I am not implying that the human brain works like an LLM, but rather just challenging the concept of "knowledge" being fundamentally different from LLM behavior.)

igouy•8mo ago
> Humans hallucinate answers

Confabulate.

arghwhat•8mo ago
That does appear to be a better term for both humans and LLMs. Not sure who coined the term "hallucination" in the context of LLMs, which suggests problems with perception rather than memory...
marginalia_nu•8mo ago
If an AI correctly guesses the outcome of a dice roll, it does not mean it knows the outcome in advance. Being correct is a necessary element of knowledge, but far from sufficient.

It's probably helpful to reach for epistemology to make sense of knowledge as a concept. While something like "true justified belief" is far from a perfect definition of knowledge, it is at least correct to a good approximation.

arghwhat•8mo ago
> If an AI correctly guesses the outcome of a dice roll, it does not mean it knows the outcome in advance. Being correct is a necessary element of knowledge, but far from sufficient.

If we "know in advance", why wouldn't the AI?

If the AI doesn't "know in advance", why would we?

When I'm asked a question, the response and thoughts around it emerge only at that point, and from thin air. I certainly don't trawl an internal, constant knowledge bank of retrievable facts, and the time of day or preexisting conversation may affect whether or not I successfully "know" something in a particular conversation, despite having learnt it. At the same time, some things I "know" will just be wrong - the memory distorted or mixed with another, the whole thing imagined, or even just a failed recollection. And no, I certainly don't know if it has gone wrong unless it happens to result in e.g., contradictions I notice myself.

These kinds of justifications for why AI doesn't "know" always imply that the process of human thought emergence is well established and understood and easily differentiated, which just isn't the case. There most certainly are more differences than similarities between AI and the human mind, but I don't think the concept of knowledge is necessarily one of them.

roboboffin•8mo ago
Here is a link to the press release about the drug discovery:

https://www.futurehouse.org/research-announcements/demonstra...

whilenot-dev•8mo ago
Paper: https://arxiv.org/abs/2505.13400

Code: https://github.com/Future-House/robin (NOTE: "will be available")

seydor•8mo ago
We are at the peak of the LLM hype cycle. The trough of disillusionment will be a bleak, black abyss.
ctoth•8mo ago
> First, the claim that one of the main things holding LLMs back is a lack of expert feedback. To me, that just means the models are guessing—because they don’t have knowledge like humans do, they rely on pattern-matching, not understanding. If the user doesn’t know the answer, the LLM can’t help.

No. It means that the models don't have access to the physical world to run their own experiments. It's really hard to get smarter without feedback. Try it!

You could have just posted this comment, then not gotten any pushback! And you would have been in the same pit of confabulation. But now feedback lets you correct your misapprehension. Cool, innit?

Der_Einzige•8mo ago
To think that Trenton invented a far better LLM sampler than top_p/top_k back in 2019 (tail free sampling - https://www.trentonbricken.com/Tail-Free-Sampling/), and yet it and pretty much all other better samplers are still not being used in production in 2025.

Trenton, please, if you are listening, give us an explanation for this!

zackangelo•8mo ago
Are you familiar with min_p sampling?

Kind of funny that it was introduced randomly on Reddit a couple of years ago instead of in a journal or something[0]. But I believe it's widely implemented and used now.

[0] https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_se...

Der_Einzige•8mo ago
I'm one of the authors on the paper about min_p sampling :)
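
For context, here is a minimal illustrative sketch of the min_p idea described above, not any project's reference implementation: keep only tokens whose probability is at least a fraction p of the most likely token's probability, so the cutoff adapts to how confident the model is, then renormalize and sample. The function name and the default value of p are placeholders of my own.

    import numpy as np

    def min_p_filter(logits, p=0.1):
        # Softmax the logits into a probability distribution.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Keep tokens with probability >= p * (top token's probability);
        # unlike a fixed top_k or cumulative top_p cutoff, this threshold
        # scales with how peaked the distribution is.
        keep = probs >= p * probs.max()
        filtered = np.where(keep, probs, 0.0)
        return filtered / filtered.sum()

    # Toy usage: sample one token id from the filtered distribution.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=50)  # stand-in for a model's vocabulary logits
    next_token_id = rng.choice(len(logits), p=min_p_filter(logits))

When one token dominates, the threshold is high and few candidates survive; when the distribution is flat, the threshold drops and many candidates stay in play.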
vessenes•8mo ago
As always Dwarkesh gives the best interview in tech. The whole episode is worth a listen, but some of the highlights for me were:

Learning that an earlier Opus was highly committed to animal welfare and they’re not sure why; other versions and models have not been

Hearing that their mechanistic interpretability groups recently beat an internal red team quickly, using Claude’s help

Speculation on de-risking AI growth for nation states that aren’t building frontier models

A really profound comment near the end noting that even if we get no more algorithmic improvements ever, current tech is sufficient to replace most white-collar jobs, given enough data.

That last one is a super profound and interesting point to me. We’re at the inflection point where white-collar jobs will at the very least be implemented by this tech and overseen by humans, and the current tech and economics make doing so desirable for companies that pay for a lot of white-collar work.

Feels like it’s time to buckle up. Not AI2027 buckle up, but still time to buckle up.

mritchie712•8mo ago
> an earlier Opus was highly committed to animal welfare and they’re not sure why; other versions and models have not been

I caught that too, very interesting.

Many people treat their pets as part of their family. Many people make statements online like: "I like dogs / cats more than most people". So it's not surprising that this sentiment would be picked up in training data. I'd wager animals on average are thought of more highly on a site like Reddit than humans are. i.e. the average sentiment of a post on cats is more positive than a post about a human. Cats, in the training data, may be perceived as "more important" or "better" than humans.

It is surprising Opus holds on to this tighter than other models.

vessenes•8mo ago
The MechInterp guys talk a lot about concept overloading. I wonder if “kind to animals” randomly overlaid with “helpful and not harmful” concepts during training and they got locked in together. All highly speculative obv.
SubiculumCode•8mo ago
I recently asked an AI to write a scientific review for me via the deep research feature, on a field of study with which I have some familiarity. While the review did not reveal new deep insights, it was accurate (!!), cross-cutting, and well written; as requested, it was sourced only from PubMed and perfectly styled in the form I wanted... writing this would have taken me a couple of days.

I don't know how I feel about it. Writing basic literature reviews is tedious; on the other hand, it is approaching the line where I can feel the temptation to let it work while I sip a cocktail.

At what point will I be replaced by the open science literature we made?

vessenes•8mo ago
Think of it like this - you just got a research team added to your group. I’d stay on top of tasking them though, because soon you’ll get a research manager added, and you’ll work through it. But you’ll still occasionally need to dig in and see how that manager got it wrong.
barrenko•8mo ago
We are entering a hockey-stick-graph explosion of intelligence (well, knowledge), and we're going to be in that spot for so much time it will feel like an event horizon.
arecurrence•8mo ago
This is one of the most interesting interviews I've ever read/listened to. Reminds me of when I first heard a Lex Fridman interview (the style is completely different but it hits on a lot of material that is interesting purely due to the openness of the interviewee to talk about whatever and how the interviewer drives the conversation).

If you are at all interested in the current challenges being grappled with in this space, this does a great job of illuminating some of them. Many, many interesting passages in here, and the text transcript has links to relevant papers when their topics are brought up. Really like that aspect and would love to see it done a lot more often.

crypto420•8mo ago
All of this, as well as the crazy weird behaviors by o3 around its hallucinations and Claude deceiving users, points to an interesting quote I saw about scaling RL in LLMs: https://x.com/jxmnop/status/1922078186864566491

"the AI labs spent a few years quietly scaling up supervised learning, where the best-case outcome was obvious: an excellent simulator of human text

now they are scaling up reinforcement learning, which is something fundamentally different. and no one knows what happens next"

I tend to believe this. AlphaGo and AlphaZero, which were both trained with RL at scale, led to strategies that have never been seen before. They were also highly specialized neural networks for a very specific task, which is quite different from LLMs, which are quite general in their capabilities. Scaling RL on LLMs could lead to models that have very unpredictable behaviors and properties on a variety of tasks.

This is all going to sound rather hyperbolic - but I think we're living in quite unprecedented times, and I am starting to believe Kurzweil's vision of the Singularity. The next 10-20 years are going to be very unpredictable. I don't quite know what the answer will be, but I believe scaling mechanistic interpretability will probably yield some breakthroughs into how these models approach problems.