Everything else, yeah, he's right, and I never doubted it. I agree LLMs are unreliable, insecure, etc. But I don't deduce from that that they're gonna amount to nothing.
Have there been specific claims about when it's going to crash? I find it hard to believe he claimed it was all going to crash by early 2026. Maybe I'm wrong; I haven't read all of his posts. But neither did the author: they admit in the repo this is all LLM, nothing was verified by humans.
From my perspective, it is basically guaranteed that LLMs will increasingly be seen as essential work tools for just about anyone doing knowledge work. So they won't amount to nothing.
But it is not at all guaranteed that the frontier model companies who are currently burning billions of dollars chasing that will capture significant percentages of that value.
In other words, could be all slop. Or maybe it’s not. Maybe it’s mixed. No one knows.
So you’re asking me to do the work you should have done in the first place? If you didn’t put any effort into it, why should I waste my time checking your non-work and correcting it to your credit?
If you had actually put in the effort, then sure, I'd be amenable to helping make this the best it can be. But you didn't, so what's the point? Why should anyone spend their time fixing other people's slop?
He has no idea what coding agents are capable of or how useful they are; he doesn't pay attention to any of the contributions these models are making to math or science; he continually insists that because agents aren't ready to face customers in uncontrolled environments, they're completely useless even for employees and workers; just last year he posted an article complaining that LLMs don't use web search to find information (he had asked for information about a friend), when almost all of them do now, even in their default interfaces; he still thinks hallucinations carry real weight in fields like mathematics and programming, where it's very easy to verify exactly the kinds of things hallucinations would break; and I think he still adheres to the stochastic-parrot mindset even though next-token prediction isn't even the most relevant part of their training anymore.
Most importantly, although he seems to have made a single Substack post making this argument, it doesn't seem to have really percolated through the rest of his thinking: the cutting edge of LLMs right now, agents, are actually exactly the kind of neurosymbolic system he wants. The neural network provides the interface to the outside world and a creativity and problem-solving engine, supplying the fuzzy pattern matching and adaptability that is needed, while symbolic code-based systems ensure that guardrails are enforced, requirements are met, accurate information is provided, and so on. I think his objection might be that the problem-solving and reasoning engine at the core is still an LLM. But the thing is, you need the kind of pattern matching, flexibility, and adaptability you get from an LLM driving things for the end result to be anything different from an expert system with a slightly better natural-language interface pasted on. And I think it's pretty clear at this point that expert systems are dead. They haven't done anything remotely as interesting or useful as what we're seeing LLMs do.
I think, like another commenter says, his whole shtick is pointing out obviously true basic features of LLMs (that they hallucinate, that they don't perfectly adhere to prompt guardrails, that there's too much hype in the industry right now, and that a lot of the companies suck in a vaguely standard big-tech Silicon Valley way) and extrapolating to some broader point, which is that everyone should have listened to him and done what he said when he wrote that book back in the 90s (iirc).
I'm not really sure at that point what 'actual' AI means?
It seems like the definition of actual AI is something like perfect AI — it has to be fully observable, interpretable, reason perfectly, have perfect factual recall, continual learning, infinite context windows, perfect instruction following, and so on. I feel like at that point, maybe nothing could ever be 'actual' AI?
We typically use AI to mean some kind of algorithm or program that lets computers do intellectual work that was previously considered to be the exclusive domain of humans, especially if it involves problem solving or pattern matching or reasoning. Just look at Donald Knuth's recent posts about what Claude was able to do — seems like AI to me?
Yeah, it is imperfect AI, but it's still AI. And it's not clear to me that the imperfections LLMs have mean they can't be extremely useful and revolutionary as a form of AI. Yes, they make weird mistakes a lot, and they don't think at all like humans do. But I am of the opinion that there are a lot of forms of intelligence, and human intelligence is just one of them. Every kind of intelligence comes with its own gamut of characteristic errors, blind spots, and biases. The fact that LLMs have issues different from those of human intelligence, and different from what classical computers struggle with, doesn't disqualify them from being intelligent to me.
I also find very odd the framing where agentic harnesses are described as bolted onto LLMs in order to "make them useful", yet the agentic harness plus the LLM doesn't count as an AI system itself. It's pretty clear to me, at least, that "the AI", if you want to talk about it, is the neurosymbolic cybernetic feedback system that combines the harness and the LLM.
The LLM is only the fuzzy pattern-matching, logic, and creativity core. The harness provides the verification feedback loops, the ability to interact with and explore the outside world, the ability to bring in programming-language interpreters for more rigid symbolic logic, observability, and systems for storing and recalling memory for continual learning. A lot of these, especially the feedback loops, resolve issues that LLMs seem to inherently face, such as hallucinations.
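To make the verification-feedback idea concrete, here is a minimal sketch of such a loop. Everything here is hypothetical illustration, not any real harness's API: `llm_propose` stands in for a model call, and the "symbolic" checker is just a deterministic verifier whose error message is fed back into the next attempt.

```python
# Minimal illustration of a verification feedback loop: the model proposes
# a candidate, a symbolic checker verifies it, and failures become feedback.

def run_with_feedback(llm_propose, verify, task, max_rounds=3):
    """llm_propose(task, feedback) -> candidate; verify(candidate) -> error or None."""
    feedback = None
    for _ in range(max_rounds):
        candidate = llm_propose(task, feedback)
        error = verify(candidate)
        if error is None:
            return candidate  # verified: the checker, not the model, decides
        feedback = error      # loop the symbolic verdict back into the model
    return None

# Toy stand-ins so the sketch runs without a real model:
def fake_llm(task, feedback):
    # First attempt is deliberately wrong; it "corrects" after feedback.
    return "2 + 2" if feedback else "2 + 3"

def checker(candidate):
    return None if eval(candidate) == 4 else f"{candidate} != 4"

print(run_with_feedback(fake_llm, checker, "make four"))  # -> 2 + 2
```

The point of the sketch is only structural: hallucinations don't have to be caught by the LLM itself, because the harness's verifier sits between the model's output and the final answer.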
Moreover, LLMs are now substantially trained with writing code, using tools, interacting with the world, and existing in harnesses in mind. At this point, I would have to guess that more than half of their training is devoted to rewarding them for correctly using all of these symbolic tools and solving problems in a simulated world, rather than just predicting the next token.
I also think that LLMs, as a sort of core engine of an agentic harness, are allowing computers to do things we'd never really dreamed they could do before, that symbolic systems by themselves never really achieved, and as I said before, if you're looking for neurosymbolic AI — as Marcus says he is — then this is basically how it's going to have to look unless you want to fall down the expert system rabbit hole again.
It can still be useful, even if it's wrong a lot, or if it takes a lot of scaffolding to mold its answers into something correct. So just call it what it is: auto complete. But the problem is that if you do that, the shine comes off and you can't justify trillions of dollars' worth of sci-fi fantasies. Who does that benefit? If the bubble pops, all the boosters on here hoping for HAL 9000 are going to be out of a job and struggling, so they're working against their own interests by going along with this shell game.
Also, models will just write code for that sort of thing now.
Where does this definition come from? I certainly don't agree with it, and I am not sure who does, besides yourself and Marcus. Also it seems that you're saying AI does, in fact, mean 'perfect' AI, basically?
> A good auto complete is useful, a crappy AI is a marketing scam.
LLM agents do a lot more than auto complete now, and using them less like auto complete and more like 'AI' (via agents) has actually made them more useful and less crappy! Also, I don't think framing how modern RLVR'd LLMs operate now as auto complete even makes a whole lot of sense in the first place.
This is one of those comments whose truth value depends entirely on a constantly shifting definition of “AI”.
The ability of modern models to functionally understand, answer questions, and make recommendations about software codebases is superhuman at this point, relative to most human software developers. What is that, if not artificial intelligence?
Perhaps you’re thinking of something more like AGI, but even there the terminology is loaded and ambiguous. The models are general enough to answer questions well on a vast range of subjects, and they exhibit understanding (again, functionally speaking this is true - whether someone wants to call them stochastic parrots is beside the point.) The appellation of “intelligence” applies just as well as in the coding case, it’s artificial, and it’s general.
> a worthwhile point to make.
I disagree. Without clear, justified definitions, it’s an incoherent, poorly specified point that seems to be driven by a desire to maintain a specific conclusion regardless of the evidence.
Ok but why report PR pieces as evidence for LLMs being useful?
These are tools that can possibly provide output that is eventually correct. It is the human behind the wheel doing the actual work.
Give the tool to a lesser expert and you will get more garbage with fewer lucky shots.
For the elite, it is a balancing act where, more often than not, the cost of making the LLM do the work is less than doing it yourself. If that holds above 90% of the time, the tool is useful.
https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
https://openai.com/index/accelerating-science-gpt-5/?hl=en-G...
Small aside: I'm only bringing this up because last year I worked on a game where you had to solve various moral dilemmas in a 1v1 situation (think trolley experiment, and one player says "flip the switch" and the other says "don't flip the switch"). The idea was to get an LLM to rate the arguments in a fun turn-based online game. I built it out, but I kind of gave up when I realized how absolutely awful the LLM was at actually rating arguments and their nuances. Who won legitimately felt more like a roll of the dice than a verdict from a real judge or a philosophy professor grading a paper. I put that project aside, but I might do a Show HN at some point since the game is basically done.
Adjudication[1], which is the real meat of this project, is done in a very partial way, and I genuinely see basically zero value in it. Why not crawl Reddit (or HN)? I know that also has issues, but at least it has more variety of tone.
[1] https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
The essay numbers come from the Claude pipeline (2,218 claims), which used model judgment as of March 2, 2026 without a published URL evidence table. Different pipeline, different adjudication method.
Your core critique still lands, just at a different layer. The real weakness is LLM-as-judge circularity: an LLM scoring claims about LLMs. That's flagged in the methodology, and I don't have a clean answer beyond "the dataset is public, spot-check anything that looks wrong."
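Since "spot-check anything that looks wrong" is the only remedy on offer, here is a small sketch of how one might pull a random sample of verdicts for manual review. The field names (`id`, `claim`, `status`) are taken from the records quoted in this thread; the JSON Lines layout and any filename are my assumptions about the repo, not verified.

```python
import json
import random

def sample_for_review(lines, status="supported", k=5, seed=0):
    """Pick k random claim records with a given verdict for manual checking."""
    claims = [json.loads(line) for line in lines if line.strip()]
    pool = [c for c in claims if c.get("status") == status]
    random.Random(seed).shuffle(pool)  # fixed seed so the sample is reproducible
    return pool[:k]

# Toy records in the shape quoted in this thread (claims truncated):
records = [
    '{"id": "claim_0081", "claim": "Level 2 self-driving ...", "status": "supported"}',
    '{"id": "claim_0090", "claim": "Scaling has ended ...", "status": "refuted"}',
]
picked = sample_for_review(records, k=1)
print(picked[0]["id"])  # -> claim_0081
```

In practice you would read the repo's actual JSONL file instead of the inline `records` list, then check each sampled claim against primary sources by hand.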
{"id": "claim_0081", "date": "2023-02-11", "claim": "Current Level 2 self-driving operates under easy conditions and is nowhere close to handling real-world complexity.", "type": "descriptive", "target": "Level 2 self-driving", "status": "supported", "horizon": null}
Why is this supported? How is this supported? Waymo would probably disagree, etc. Here's another one: {"id": "claim_0083", "date": "2023-02-11", "claim": "Tesla's product naming ('Autopilot', 'Full Self Driving') misleads customers into thinking the cars are more capable than they are, potentially causing accidents and deaths.", "type": "causal", "target": "Tesla marketing", "status": "supported", "horizon": null}
I fully agree that TSLA engages in all kinds of deceptive marketing, but to fully support the stunning claim that it potentially causes deaths is, uh, a bit much. I mean, at least tell me who's saying this. What's the provenance? If Claude itself rated the claims, which seems to be the case unless I'm totally off base, I fail to see how we're actually doing anything at all here. Right now I'm working on a local research agent, and I'm being absolutely meticulous about storing browsed webpages, snippets, etc. into short-term (session) LLM memory or a long-term (cross-session) SQLite db.
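For what that kind of provenance tracking can look like, here is a minimal sketch of a cross-session SQLite snippet store. The schema and function names are my own illustration, not the actual agent's code: the idea is just that every stored snippet keeps its source URL and fetch time, so any later claim can be traced back.

```python
import sqlite3

def open_memory(path=":memory:"):
    """Open (or create) the long-term snippet store."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS snippets (
                    url TEXT, fetched_at TEXT, snippet TEXT)""")
    return db

def remember(db, url, fetched_at, snippet):
    """Store one browsed snippet with its provenance."""
    db.execute("INSERT INTO snippets VALUES (?, ?, ?)", (url, fetched_at, snippet))
    db.commit()

def recall(db, term):
    """Find stored snippets mentioning a term, with their source URLs."""
    cur = db.execute("SELECT url, snippet FROM snippets WHERE snippet LIKE ?",
                     (f"%{term}%",))
    return cur.fetchall()

db = open_memory()
remember(db, "https://example.com", "2025-01-01",
         "Waymo operates driverless taxis in several cities.")
print(recall(db, "Waymo"))  # one (url, snippet) row
```

The contrast with the dataset above is the point: with a table like this, every "supported" verdict would come with a URL you could click.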
[1] https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
I think the overall percentage is the wrong approach here.
It’s easy to say a lot of things that are factually true or predictions that are inevitably true.
However, the more salient point with Gary Marcus is the one unforgivable thing he was wrong about and continues to double down on: that deep learning is hitting a wall.
Starting in early 2022 and going through today, there is still so much low hanging fruit with deep learning.
Today’s LLM progress is mostly being made in RL. But world models are also still so early and they’re deep learning all the way down.
It would be nice if he would just admit he was wrong.
I wrote up the full pattern here: https://davesquickhits.substack.com/p/the-most-expensive-kin...
But then, to your point, what does it matter, if they're still as useful as they are? Even at this stage, Claude Code makes Jira halfway bearable.
Of course, we have to consider the devil's advocate as well. Most CEOs don't seem to be reporting great ROI on their "AI" investments.
Someone is so thin-skinned about a single guy writing a skeptical Substack that they spent their weekend building a dual-pipeline automation tool, scraping four years of his writing, instead of just building a product that actually disproves him. I’m not saying I agree with everything the man says, but until a human actually verifies these verdicts, this is just burnt tokens.
If I were Gary Marcus, I wouldn't immediately agree even with a favorable assessment made by these LLMs, as that could contradict the very claims I'd made and mean falling into the trap of trusting them. I'd remain as skeptical as ever...
...because this is the worst of the red flags that ultimately supports Gary's argument that the LLM results may be untrustworthy:
All verdicts are LLM-scored, not human-verified.
People should check for themselves and draw their own conclusions.
> The crash hasn't come.
yet.
But then, on the other hand, he completely ignores all of the developments in the scaffolding around these systems built to resolve those problems: all of the changes and developments in how these models are trained, all of the things they've actually been able to achieve and do, and basically all of the positive use cases and things that balance out his criticisms.
Since he doesn't really talk about any of that, of course he doesn't make false claims about it, he just ignores it, implicitly creating a false picture.
And then it is this false picture that he uses to justify his grandiose claims about how everyone should have listened to him about how to do AI and these systems are inevitably going to turn out to be useless and the whole industry is going to collapse and fully disappear and society is going to be ruined and so on.
So, of course, it looks like, on the one hand, all of his specific claims about AI are perfectly correct, and on the other, all of his grander claims about what that implies about the industry have turned out to be wrong, and he spends much more time on the latter than the former.
I think it is really crucial to emphasize that even though most of the individual claims he makes are correct, he spends much more time on the prognostications that are fundamentally not correct, or at least are very speculative right now. I think that's an indication of something gone very wrong with someone's epistemic and incentive situation.
https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
Many of the "supported" claims here are vague, banal, obvious, or just opinion. E.g.
"the general public hasn't quite realized what's not possible yet"
"loads of things scale, but not at all"
"To be sentient is to be aware of yourself in the world; LaMDA simply isn't."
"To date, nobody, ever, has given a convincing and thorough account of how human children (and human children alone) learn language."
"A cat holding a remote control shouldn't have a human hand."
"What I didn't see last night was vision" (about Tesla Optimus)
I don't know if this will cause a ton of capital destruction; I doubt it. It will probably destroy a bunch of the slot-machine gambling addicts who are paying 5k a month on their credit cards thinking an autocomplete API is going to provide a profitable business.
A large part of this is a scam. Just as many aspects of crypto were scams while others were not, this hype is very similar to the NFT/crypto hype of 2018-2023. Yes, some genuinely useful things were born out of those industries, but a lot were not; it's the same with AI.
As for a potential AI winter: I think there will be a "winter" just like crypto's, but even during crypto's winter some companies continued to operate and innovate while 90% disappeared. I believe the same thing will happen, and soon. Watch what happens to companies like Perplexity over the next 12-16 months lol.