As one example, we are killing thousands on the road just to be sure we can blame a driver instead of a computer.
Of course there is valuable knowledge in understanding limitations, but that is not the approach the author is taking here; imo the author seems disingenuous.
I use LLMs for language-related work (translations, grammatical explanations, etc.) and they are top notch at that, as long as you do not ask for references to particular grammar rules. In that case they will invent non-existent references.
They are also good for tutor personas: give me jj/git/emacs commands for this situation.
But they are bad in other cases.
I started scanning books recently and wanted to crop the random stuff outside an orange sheet of paper on which the book was placed before I handed the images over to ScanTailor Advanced (STA can do this, but I wanted to keep the original images around instead of the low-quality STA version). I spent 3-5 hours with Gemini 2.5 Pro (AI Studio) trying to get it to give me a series of steps (and finally a shell script) to get this working.
And it could not do it. It mixed up GraphicsMagick and ImageMagick commands. It failed even with libvips. Finally I asked it to provide a simple shell script where I would provide four pixel distances to crop from the four edges as arguments. This one worked.
I am very surprised that people are able to write code that requires actual reasoning ability using modern LLMs.
I once asked it to read a postcard written by my late grandfather in Polish, as I was struggling to decipher it. It incorrectly identified the text as Romanian and kept insisting on that, even after I corrected it: "I understand you are insistent that the language is Polish. However, I have carefully analyzed the text again, and the linguistic evidence confirms it is Romanian. Because the vocabulary and alphabet are not Polish, I cannot read it as such." Eventually, after I continued to insist that it was indeed Polish, it got offended and told me it would not try again, accusing me of attempting to mislead it.
I once had Claude tell me to never talk to it again after it got upset when I kept giving it peer-reviewed papers explaining why it was wrong. I must have hit the Tumblr dataset, since I was told I was sealioning it, which took me aback for a while.
Python is the only way to do real image work these days, and as a bonus LLMs suck a lot less at giving you nearly useful Python code.
The above is a bit of a lie, as OpenCV has more capabilities, but unless you are deep in the weeds of preparing images for neural networks, Pillow is plenty good enough.
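For what it's worth, the fixed-margin crop the parent finally got working is only a few lines of Pillow. A minimal sketch (the script name, argument order and file handling are my own assumptions, not the original script):

# crop_margins.py -- crop a fixed number of pixels from each edge of an image.
# Hypothetical usage: python crop_margins.py in.jpg out.jpg LEFT TOP RIGHT BOTTOM
import sys
from PIL import Image

src, dst = sys.argv[1], sys.argv[2]
left, top, right, bottom = map(int, sys.argv[3:7])

img = Image.open(src)
w, h = img.size
# Pillow's crop box is (left, top, right, bottom) in absolute pixel coordinates.
img.crop((left, top, w - right, h - bottom)).save(dst)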
The most effective analogy I have found is comparing LLMs to theater and film actors. Everyone understands that, and the analogy offers actual predictive power. I elaborated on the idea if you're curious to read more:
Do you know what a "coincidence" actually is? The definition you're using is wrong.
It's not a coincidence that I train a model on healthcare regulations and it answers a question about healthcare regulations correctly.
None of that is coincidental.
If I trained it on healthcare regulations and asked it about recipes, it won't get anything right. How is that coincidental?
If you train a model on only healthcare regulations, it won't answer questions about healthcare regulations; it will produce text that looks like healthcare regulations.
Huh? But it does do that? What do you think training an LLM entails?
Are you of the belief that an LLM trained on non-medical data would have the same statistical chance of answering a medical question correctly?
we're at the "Redefining what words mean in order to not have to admit I was wrong" stage of this argument
That's not what a coincidence is.
A coincidence is: "a remarkable concurrence of events or circumstances without apparent causal connection."
Are you saying that training it on a subset of specific data and it responding with that data "does not have a causal connection"? Do you know how statistical pattern matching works?
It's not coincidence that the answer contains the facts you want. That is a direct consequence of the question you asked and the training corpus.
But the answer containing facts/truth is incidental from the LLM's point of view, in that the machine really does not care about, nor even have any concept of, whether it gave you the facts you asked for or just nice-sounding gibberish. The machine only wants to generate tokens; everything else is incidental. (To the core mechanism, that is. OpenAI and co obviously care a lot about the quality and content of the output.)
They are useful. It's not a coin flip as to whether Bolt will produce a new design of a medical intake form for me if I ask it to. It does. It doesn't randomly give me a design for a social media app, for instance.
Who would do this manually? Concatenate the two lists and sort them. Use "uniq -c" to count the duplicate lines and grep to pull out the lines which occur twice. It would take a few seconds.
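A rough Python equivalent of that pipeline, assuming the two lists are plain text files with one entry per line (the file names are placeholders):

# Count how often each entry appears across both files; entries present in
# both lists show up twice, mirroring sort | uniq -c | grep.
from collections import Counter

with open("tlds.txt") as a, open("html5_elements.txt") as b:
    counts = Counter(line.strip().lower() for line in list(a) + list(b) if line.strip())

print("\n".join(sorted(item for item, n in counts.items() if n == 2)))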
This made me laugh, because it's the exact opposite sentiment of the anti-LLM crowd. So which is it? Is it only useful if you know what you're doing, or less useful if you know what you're doing?
> "I can't wait until I can jack into the Metaverse and buy an NFT with cryptocurrency just by using an LLM! Perhaps I can view it on my 3D TV by streaming it over WIMAX? I'd better stock up on quantum computers to make sure it all works."
In the author's attempt to be a smartass, they showed their true colors. It makes them sound childish. Instead of just admitting they were wrong, they make some flippant remark about cryptocurrency and NFTs, despite those having vastly different purposes, goals, and successes. Just take the L.
to add: "I shouldn't have to know anything about LLMs to use them correctly" is one heck of a take, but ok.
> "I don't. I hate the way this is being sold as a universal and magical tool. The reality doesn't live up to the hype."
And I hate the way in which people will do the opposite: claim it has no use cases. It's literally the same sentiment, but in reverse. It's just as myopic and naive. But for whatever reason, we can look at a CEO hawking it and think "They're just trying to make more money" but can't see the flip side of devs not wanting to lose their livelihoods to something. We have just as much to lose as they have to gain, but want to pretend like we're objective.
This continues a pattern as old as home computing: The author does not understand the task themselves, consequently "holds the computer wrong", and then blames the machine.
No "lists" were being compared. The LLM does not have a "list of TLDs" in its memory that it just refers to when you ask it. If you haven't grokked this very fundamental thing about how these LLMs work, then the problem is really, distinctly, on your end.
Ask a stupid question, get a stupid answer.
Ok, I only have to:
1. Generally solve the problem for the AI
2. Make a step by step plan for the AI to execute
3. Debug the script I get back and check by hand if it uses reliable sources.
4. Run that script.
So what do I need the AI for?
Also, you are literally describing how you are holding it wrong. If you expect the LLM to magically know what you want from it without you yourself having to make the task understandable to the machine, you are standing in front of your dishwasher waiting for it to grow arms and do your dishes in the sink.
How would you solve that problem? You'd probably go to the internet, get the list of TLDs and the list of HTML5 elements, and then compare those lists.
The author compares three commercial large‑language models that have direct internet access, but none of them appear capable of performing this seemingly simple task. I think his conclusion is valid.
That's when I realized this site was making heavy use of AI. Sadly, lots of people are going to trust but not verify...
> A factor of 1966 is a number that divides the number without remainder.
> The factors of 1966 are 1, 2, 3, 6, 11, 17, 22, 33, 34, 51, 66, 102, 187, 374, 589, 1178, 1966.
If I google for the factors of 1966, Google's AI gives the same wrong factors.
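For reference, this is trivial to check in code; 1966 = 2 × 983 and 983 is prime, so the only factors are 1, 2, 983 and 1966 (3, 6, 11, 17 and the rest of that list simply don't divide 1966):

n = 1966
print([d for d in range(1, n + 1) if n % d == 0])  # -> [1, 2, 983, 1966]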
Tried on ChatGPT, seems fine.
It's consistently missing `search` for all of us.
[1] https://shkspr.mobi/blog/2023/09/false-friends-html-elements...
However, I do superficially agree with some of the links at the end. LLMs as they have been so far are confirmation machines and it does take skill to use them effectively. Or knowing when not to use them.
Except this microwave is advertised as also being for steaks. And sometimes it works, and sometimes you cannot even warm milk in it. It's totally not reliable.
> I know this question is possible to answer _because I went through the lists two years ago_.
Is this the right answer? Seems like it. I used the thinking model.
By now, numerous notable programmers have reported positive experiences with all forms of AI-assisted coding, which this conclusion arrogantly fails to account for.
As a ChatGPT user I would have reached for the thinking model for such questions. I understand if the “auto” mode doesn’t pick the right model here, but confident claims from the author should be backed up by at least this much.
Go sit on public transport and look at how people use their devices. They don't fiddle with settings or dive deep into configuration menus.
I literally just opened the tools and used what they gave me. They're sold on the promise that "this thing is really clever and will answer any question!!" so why should I have to spend time futzing with it?
sure, but when I expect this [1] from _any_ full time hire, my "expectations are too high from people" and "everybody has their strengths"
[1] find a list of valid html5 elements, find a list of TLDs, have an understanding of ccTLDs and gTLDs
uint16_t ea_indexed(void)
{
    uint8_t post = *PC++;
    uint16_t base, off = 0;

    /* 1. pick base register */
    static const uint16_t *const base_tbl[4] = { &X, &Y, &U, &S };
    base = *base_tbl[(post >> 5) & 3];

    /* 2. work out the effective address */
    if ((post & 0x80) == 0) {                   /* 5-bit signed offset */
        off = (int8_t)(post << 3) >> 3;
    } else if ((post & 0x60) == 0x20) {         /* 8- or 16-bit offset */
        if (post & 0x10) {                      /* 16-bit */
            off = (int16_t)fetch_be16(PC);
            PC += 2;
        } else {                                /* 8-bit */
            off = (int8_t)*PC++;
        }
    } else if ((post & 0x60) == 0x40) {         /* auto inc/dec */
        int8_t step = ((post & 0x0F) == 0x0) ?  1 :
                      ((post & 0x0F) == 0x1) ?  2 :
                      ((post & 0x0F) == 0x2) ? -1 :
                      ((post & 0x0F) == 0x3) ? -2 : 0;
        if (step > 0) base += step;             /* post-increment */
        off = step < 0 ? step : 0;              /* pre-decrement already applied */
        if (step < 0) base += step;
    } else if ((post & 0x60) == 0x60) {         /* accumulator offset */
        static const uint8_t scale[4] = {1,1,2,1};  /* A,B,D,illegal */
        uint8_t acc = (post >> 3) & 3;
        if (acc == 0)      off = A;
        else if (acc == 1) off = B;
        else if (acc == 2) off = (A<<8)|B;      /* D */
        off *= scale[acc];
    } else {                                    /* 11x111xx is illegal */
        illegal();
    }

    uint16_t ea = base + off;

    /* 3. optional indirect */
    if (post & 0x10) ea = read16(ea);

    return ea;
}
(Full convo: https://text.is/4ZW2J ) From looking at Page 150 of https://colorcomputerarchive.com/repo/Documents/Books/Motoro... it looked pretty much perfect, except for the accumulator addressing. That's impressive...
Then in another chat I asked it to "give a technical description of how the 6809 indexed operands are decoded" and it just can't do it. It always gets the fundamentals wrong and makes pretty much everything up. Try it yourself; it doesn't have to be Kimi, most other AIs get it wrong too.
My assumption is that it's learned how to represent it in code from reading emulator sources, but hasn't quite mapped it well enough to be able to explain it in English... or something like that.
To do a task like this with LLMs, you need to use a document for your source lists or bring them directly into context; then a smart model with good prompting might zero-shot it.
But if you want any confidence in the answer, you need to use tools: “here are two lists; write a python script to find the exact matches, and return a new list with only the exact matches. Write a test dataset and verify that there are no errors, omissions, or duplicates.”
LLMs plus tools / code are amazing. LLMs on their own are a professor with an intermittent heroin problem.
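A minimal sketch of the kind of script that prompt asks for; the file names and the tiny test set below are my own placeholders, not real data:

def exact_matches(list_a, list_b):
    """Return the sorted entries that appear in both lists (case-insensitive, deduplicated)."""
    a = {item.strip().lower() for item in list_a if item.strip()}
    b = {item.strip().lower() for item in list_b if item.strip()}
    return sorted(a & b)

# Tiny hand-made test set: only "menu" and "video" appear in both lists,
# and the duplicate "menu" must not produce a duplicate in the output.
assert exact_matches(["menu", "video", "dev"],
                     ["menu", "video", "table", "menu"]) == ["menu", "video"]

if __name__ == "__main__":
    with open("tlds.txt") as f:             # placeholder input files
        tlds = f.read().splitlines()
    with open("html5_elements.txt") as f:
        elements = f.read().splitlines()
    print("\n".join(exact_matches(tlds, elements)))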
This is my personal favourite example of LLMs being stupid. It's a bit old, but it's very funny that Grok is the only one that gets it.
First there's the tokenization issue, the same old "how many Rs in STRAWBERRY" where they are often confidently wrong, but I also asked them not to mix tenses (-ing and -ed, for example) and that was very hard for them.
jw1224•1h ago
> yoUr'E PRoMPTiNg IT WRoNg!
> Am I though?
Yes. You’re complaining that Gemini “shits the bed”, despite using 2.5 Flash (not Pro), without search or reasoning.
It’s a fact that some models are smarter than others. This is a task that requires reasoning, so the article is hard to take seriously when the author uses a model optimised for speed (not intelligence) and doesn’t even turn reasoning on (nor suggests they’re even aware of it being a feature).
I asked the exact prompt to ChatGPT 5 Thinking and got an excellent answer with cited sources, all of which appear to be accurate.
softwaredoug•1h ago
Search and reasoning use up more context, leading to context rot and subtler, harder-to-detect hallucinations. Reasoning doesn’t always focus on evaluating the quality of evidence, just “problem solving” from some root set of axioms found in search.
I’ve had this happen in Claude Code, for example, where it hallucinated a few details about a library based on some badly written forum post.
edent•1h ago
Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
Either way, disappointing.
magicalhippo•45m ago
That is indeed an area where LLMs don't shine.
That is, not only are they trained to always respond with an answer, they have no ability to accurately tell how confident they are in that answer. So you can't just filter out low confidence answers.
mathewsanders•33m ago
I’m presuming that one class of junk/low-quality output is when the model doesn’t have high-probability next tokens and works with whatever poor options it has.
Maybe low-probability tokens that cross some threshold could have a visual treatment to give feedback, the same way word processors give feedback on a spelling or grammatical error.
But maybe I’m making a mistake thinking that token probability is related to the accuracy of output?
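Some APIs do expose per-token probabilities, so the idea is at least testable. A rough sketch with the OpenAI Python client (the threshold and the flagging scheme are arbitrary assumptions of mine, and a low-probability token is not the same thing as a factually wrong one):

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What are the factors of 1966?"}],
    logprobs=True,
)

# Flag tokens the model itself considered unlikely, a bit like a squiggly underline.
THRESHOLD = -2.0  # arbitrary cut-off in log-probability space
for tok in resp.choices[0].logprobs.content:
    marker = "  <-- low confidence?" if tok.logprob < THRESHOLD else ""
    print(f"{tok.token!r}: {tok.logprob:.2f}{marker}")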
hobofan•36m ago
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That's literally what ChatGPT did for me[0], which is consistent with what they shared at the last keynote (a quick, low-reasoning answer by default first, with reasoning/search only if explicitly prompted or as a follow-up). It did miss one match though, as it somehow didn't parse the `<search>` element from the MDN docs.
[0]: https://chatgpt.com/share/68cffb5c-fd14-8005-b175-ab77d1bf58...
delusional•50m ago
I think the author's point stands.
EDIT: I tried it with "Deep Research" too. Here it doesn't invent either TLDs or HTML elements, but the resulting list is incomplete.
[1]: https://en.wikipedia.org/wiki/.bi
dgfitz•27m ago
Isn’t that the whole goddamn rub? You don’t _know_ if they’re accurate.