My complaint is: if you're trying to use an AI to help you find bugs, you'd sincerely hope that they would make *some* attempt to actually run the exploit. Having the LLM invent fake evidence that you have done so, when you haven't, is just evil, and should result in these people being kicked straight off H1 completely.
I think my wetware pattern-matching brain spots a pattern there.
Unfortunately that's where it seems to end... I'm not that familiar with QUIC and HTTP/2, but I think the closest it gets is that the GitHub repo exists and has a `class QuicConnection` [3]. Beyond that, the QUIC protocol layer doesn't have any concept of exchanging stream priorities [4] and HTTP/2 priorities are something the client sends, not the server? The PoC also mentions HTTP/3 and PRIORITY_UPDATE frames, but those are from the newer RFC 9218 [5] and lack the stream dependencies used in HTTP/2 PRIORITY frames.
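To make the mismatch concrete, here's a minimal sketch (mine, not the PoC's; field layouts per RFC 7540 and RFC 9218): an HTTP/2 PRIORITY frame carries a stream dependency and weight, while the RFC 9218 priority signal is just an urgency value plus an incremental flag, with no dependency tree at all.

```python
# Rough sketch, not from the PoC: what each priority signal actually encodes.
import struct

def h2_priority_payload(stream_dep: int, weight: int, exclusive: bool = False) -> bytes:
    """RFC 7540 PRIORITY frame payload: E bit + 31-bit stream dependency + 8-bit weight."""
    dep = (stream_dep & 0x7FFFFFFF) | (0x80000000 if exclusive else 0)
    return struct.pack("!IB", dep, weight - 1)  # weight is encoded as (value - 1)

def rfc9218_priority_field(urgency: int = 3, incremental: bool = False) -> str:
    """RFC 9218 Priority Field Value: a structured-field dict, no dependency tree at all."""
    return f"u={urgency}" + (", i" if incremental else "")

print(h2_priority_payload(stream_dep=5, weight=16).hex())   # 000000050f
print(rfc9218_priority_field(urgency=1, incremental=True))  # u=1, i
```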
I should learn more about HTTP/3!
[1] https://blog.cloudflare.com/adopting-a-new-approach-to-http-...
[2] https://www.imperva.com/docs/imperva_hii_http2.pdf
[3] https://github.com/aiortc/aioquic/blob/218f940467cf25d364890...
[4] https://datatracker.ietf.org/doc/html/rfc9000#name-stream-pr...
[5] https://www.rfc-editor.org/rfc/rfc9218.html#name-the-priorit...
It's easy for reputational damage to exceed $1'000, but if 1000 people do this...
Most companies make you fill in expense reports for every trivial purchase. It would be cheaper to just let employees take the cash - and most employees are honest enough. However, the dishonest employee isn't why they do expense reports (there are other ways to catch dishonest employees). There used to be a scam where someone would just send a company a bill for "services"; those got paid often enough that companies eventually realized the cost and started making everyone file expense reports so they could track the little expenses.
Recent toots on the account have the news as well
[0] https://hackerone.com/evilginx?type=user
[1] https://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_addre...
This was like two weeks ago. These things suck.
Yes. Unfortunately, some companies seem to pay out the bug bounty without even verifying that the report is actually valid. This can be seen on the "reporter"'s profile: https://hackerone.com/evilginx
I wonder if you could use AI to classify the probability factor that something is AI bullshit and deprioritize it?
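Something like a cheap pre-triage pass, maybe. A rough sketch of the idea (the OpenAI client usage, model name, prompt, and scoring scheme are all placeholder assumptions, not anything HackerOne does):

```python
# Rough sketch only: score each incoming report for "slop probability" and
# sort the triage queue so the least-sloppy reports get looked at first.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def slop_score(report_text: str) -> float:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Rate from 0.0 to 1.0 the probability that this security "
                        "report is AI-generated filler with no verifiable technical "
                        "content. Reply with only the number."},
            {"role": "user", "content": report_text},
        ],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except (TypeError, ValueError):
        return 1.0  # unparseable answer: treat as likely slop

reports = [
    ("R-1", "Stack buffer overflow reachable from curl --proxy, PoC attached ..."),
    ("R-2", "As an AI language model, I discovered a critical vulnerability ..."),
]
triage_queue = sorted(reports, key=lambda r: slop_score(r[1]))
```

The obvious objection applies, of course: the same models producing the slop will get tuned to slip past whatever detector you put in front of them.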
FTFY
I wonder if reputation systems might work here: you could give anyone who IDs with an AML/KYC provider some reputation, enough for two or three reports; let people earn reputation digging through zero-rep submissions; and give someone like 10,000 reputation for each accurate vulnerability found, and 100s for any accurate promoted vulnerabilities. This would let people interact anonymously if they want to, quickly if they found something important and are willing to do AML/KYC, and privilege quality people.
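To make those numbers concrete, a toy sketch (the class, the report cost, and the KYC starting balance are my own illustrative assumptions; the reward figures are the ones above):

```python
# Purely illustrative toy model of the reputation scheme described above.
from dataclasses import dataclass

KYC_STARTING_REP = 30        # enough for two or three reports (assumption)
REPORT_COST = 10             # assumed reputation cost to file one report
ACCURATE_VULN_REWARD = 10_000
PROMOTED_VULN_REWARD = 100   # for promoting a zero-rep submission that pans out

@dataclass
class Reporter:
    name: str
    reputation: int = 0

    def verify_identity(self) -> None:
        """Passing AML/KYC grants enough reputation for a few reports."""
        self.reputation = max(self.reputation, KYC_STARTING_REP)

    def can_file_report(self) -> bool:
        return self.reputation >= REPORT_COST

    def reward(self, as_promoter: bool = False) -> None:
        """Credit an accurate vulnerability, or an accurate promotion of one."""
        self.reputation += PROMOTED_VULN_REWARD if as_promoter else ACCURATE_VULN_REWARD

# An anonymous reviewer digging through the zero-rep queue:
reviewer = Reporter("anon")
reviewer.reward(as_promoter=True)  # +100 for promoting a report that turned out real
```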
Either way, AI is definitely changing the economics of this stuff, in this case enshittifying first.
Personally I can't imagine how miserable it would be for my hard-earned expertise to be relegated to sifting through SLOP where maybe 1 in hundreds or even thousands of inquiries is worth any time at all. But it also doesn't seem prudent to just ignore them.
I don't think better ML/AI technology or better information systems will make a significant difference on this issue. It's fundamentally about trust in people.
I don't know where the limit would go.
> I feel like the problem seems to me to be behavior, not a technology issue.
Yes, it's a behavior issue, but that doesn't mean it can't be solved or at least minimized by technology, particularly as a technology is what's exacerbating the issue?
> It's fundamentally about trust in people.
Who is lacking trust in who here?
To be honest, this has been a grimly satisfying outcome of the AI slop debacle. For decades, the general stance of tech has been, “there is no such thing as a behavioral/social problem, we can always fix it with smarter technology”, and AI is taking that opinion and drowning it in a bathtub. You can’t fix AI slop with technology because anything you do to detect it will be incorporated into better models until they evade your tests.
We now have no choice but to acknowledge the social element of these problems, although considering what a shitshow all of Silicon Valley’s efforts at social technology have been up to now, I’m not optimistic this acknowledgement will actually lead anywhere good.
That makes it extremely hard to build a reputation system for a site like that. Almost all the accounts are going to be spam, and the highest-quality accounts are going to be freshly created and take ~1 action on the platform.
What if the human marks it as spam but you're actually legit? Deposit another 2€ to have the platform (like Hackerone or whichever you're reporting via) give a second opinion; you'll get the 4€ back if you weren't spamming. What to do with the proceeds from spammers? The first X euros of spam reports go to upkeep of the platform, and the rest to a good cause defined by the projects the reports were submitted to, because they were the ones who had to deal with reading the slop, so they get at least this much out of it.
Raise the deposit cost for as long as the slop volume remains unmanageable.
This doesn't discriminate against people who aren't already established, but it may be a problem if you live in a low-income country and can't easily afford 20€ (assuming it ever gets to that deposit level). Perhaps it wouldn't work, but it can first be trialed at a normal cost level. Another concern is anonymity and payment. We hackers are often a paranoid lot. One can always support cash in the mail though; the sender can choose whether their privacy is worth a postage stamp.
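Roughly the flow I have in mind, as a sketch (the amounts are the ones above; the function and verdict strings are purely illustrative, not a real platform feature):

```python
# Illustrative sketch of the deposit flow described above, not a real platform feature.
DEPOSIT = 2.00  # euros; raised for as long as slop volume stays unmanageable

def settle(maintainer_verdict: str, appealed: bool = False,
           platform_verdict: str = "") -> float:
    """Return how many euros the reporter gets back on a 2 EUR deposit."""
    if maintainer_verdict == "valid":
        return DEPOSIT                 # report accepted: deposit refunded
    if not appealed:
        return 0.0                     # marked spam, no appeal: deposit goes to upkeep/charity
    if platform_verdict == "valid":
        return 2 * DEPOSIT             # wrongly flagged, cleared on appeal: both deposits back
    return 0.0                         # appeal failed: both deposits kept

# A legit report that the maintainer wrongly flagged, then cleared by the platform:
refund = settle(maintainer_verdict="spam", appealed=True, platform_verdict="valid")
print(refund)  # 4.0
```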
This alignment problem between responding with what the user wants (e.g. a security report, flattering responses) and going against the user seems a major problem limiting the effectiveness of such systems.
Well, the reporter stated in the report that they are open for employment: https://hackerone.com/reports/3125832 Anyone want to hire them? They can play with ChatGPT all day and spam random projects with AI slop.
Looking at one of the bogus reports, it doesn't even seem like a real person. Why do this if you're not trying to gain recognition?
They're doing it for money, a handful of their reports did result in payouts. Those reports aren't public though, so there's no way to know if they actually found real bugs or the reviewer rubber-stamped them without doing their due diligence.
In that sense, it has destroyed actual value as the noise crowds out the signal. AI could easily do the same to, like, all Internet communication.
They don't say it because the internet provides actual value.
And most contributions with 'AI help' tend not to follow the code practices of the code base itself, while also generally being worse code.
Also, just like in HTTP stuff 'if curl does it, it's probably right', I also tend to think that 'if the curl team says something is bullshit, it's probably bullshit'.
Anything for LinkedIn, a light interface that doesn't require logging in?
I pretty much stopped going to LinkedIn years ago because they started aggressively directing a person to log in. I was shocked this post works without login. I don't know if that is how it has always been, or if that is a recent change, or what. It would be nice to have alternative interfaces.
In case some people are getting gated, here is their post:
===
Daniel Stenberg curl CEO. Code Emitting Organism
That's it. I've had it. I'm putting my foot down on this craziness.
1. Every reporter submitting security reports on #Hackerone for #curl now needs to answer this question:
"Did you use an AI to find the problem or generate this submission?"
(and if they do select it, they can expect a stream of proof of actual intelligence follow-up questions)
2. We now ban every reporter INSTANTLY who submits reports we deem AI slop. A threshold has been reached. We are effectively being DDoSed. If we could, we would charge them for this waste of our time.
We still have not seen a single valid security report done with AI help.
---
This is the latest one that really pushed me over the limit: https://hackerone.com/reports/3125832
===
I just opened the site with JS off on mobile. No issues.
https://blog.bismuth.sh/blog/bismuth-found-the-atop-bug
https://www.cve.org/CVERecord?id=CVE-2025-31160
The number of bad reports curl in particular has gotten is staggering, and it's all from people with no background just latching onto a tool that won't elevate them.
Edit: Also, shoutout to one of our old professors, Brendan Dolan-Gavitt, who now works on offensive security agents and has a highly ranked vulnerability agent, XBOW.
https://hackerone.com/xbow?type=user
So these tools are there and doing real work; it's just that there are so many people looking for a quick buck that you really have to tease the signal out of the noise.
AI spam is bad. We've also never had a valid report from an LLM (that we could tell).
People using them will take any explanation of why a bug report is not valid, any questions, or any requests for clarification and run them back through the same confused LLM. The second pass through generates even deeper nonsense.
It's making even responding with anything but "closed as spam" not worth the time.
I believe that one day there will be great code examining security tools. But people believe in their hearts that that day is today, and that they are riding the backs of fire breathing hack dragons. It's the people that concern me. They cannot tell the difference between truth and garbage.
Based on the current state, what makes you think this is a given?
What in curl makes AI-based analysis completely ineffective?
The more positive take, and I think the biggest reason, is that curl is just well made. But along the way, it most likely uses plenty of code analysis tools: static analysis, testing, coverage, fuzzing... the classics. And I am sure these tools catch bugs before they are published. Is there an overlap between one of these tools and AI, can one substitute for the other?
Another possibility is that curl is "weird" enough to throw off AI-based code analysis. We won't change curl for that reason, but it may be good to know.
And yeah, it may just be that AI sucks, but only looking at one side of the equation is not very productive, I think.
The article mentions spam and AI slop, and it is a problem for sure, but the claim here is much stronger than "stop spamming me"; it is "AI never worked". And I find it a bit surprising, because when I introduce a new category of tool on some code base I work with, AI or not, I almost always find at least a problem or two.
> Is there an overlap between one of these tools and AI, can one substitute for the other?
AI is a crude facsimile of any tool, which is both why it's useful and why it's ineffective. In the case linked from the post, it's hallucinating function names and likely hallucinating the entire patch. This hallucination would be an annoyance for the submitter using an AI tool to discover potential security vulnerabilities, and is both an annoyance and waste of time for the maintainer who was given the hallucination in bad faith.
jacksnipe•6h ago
x3n0ph3n3•6h ago
esafak•5h ago
If you're just parroting what you read, what is it that you do here?!
giantg2•5h ago
tough•5h ago
giantg2•5h ago
qmr•2h ago
esafak•2h ago
hashmush•5h ago
- I had to Google it...
- According to a StackOverflow answer...
- Person X told me about this nice trick...
- etc.
Stating your sources should surely not be a bad thing, no?
nraynaud•5h ago
spiffyk•5h ago
gruez•5h ago
I don't think I've ever seen anyone lambasted for citing Stack Overflow as a source. At most, they get chastised for not reading the comments, but that's nowhere near as much pushback as LLMs get.
comex•5h ago
Also, using Stack Overflow correctly requires more critical thinking. You have to determine whether any given question-and-answer is actually relevant to your problem, rather than just pasting in your code and seeing what the LLM says. Requiring more work is not inherently a good thing, but it does mean that if you’re citing Stack Overflow, you probably have a somewhat better understanding of whatever you’re citing it for than if you cited an LLM.
spiffyk•5h ago
mynameisvlad•5h ago
If anything, SO having verified answers helps its credibility slightly compared to a LLM which are all known to regularly hallucinate (see: literally this post).
bloppe•5h ago
dpoloncsak•5h ago
"Hey, I didn't study this, I found it on Google. Take it with a grain of caution, as it came from the internet" has been shortened to "I googled it and...", which is now evolving to "Hey, I asked chatGPT, and...."
hx8•5h ago
Copy and pasting from ChatGPT has the same consequences as copying and pasting from StackOverflow, which is to say you're now on the hook supporting code in production that you don't understand.
tough•5h ago
I can use ChatGPT to teach me and help me understand a topic, or I can use it to give me an answer that I don't double-check and just copy-paste.
Just shows off how much you care about the topic at hand, no?
multjoy•5h ago
tough•5h ago
multjoy•4h ago
If you don't know anything about the subject area, how do you know if you are asking the right questions?
ryandrake•3h ago
mystraline•3h ago
I will ask for all claims to be backed with cited evidence. And then, I check those.
In other cases, like code generation, I ask for a test harness to be written, and I test with it.
For some foreign-language translation (High German to English), I ask for a sentence-to-sentence comparison in the syntax of a diff.
the_snooze•4h ago
It sucks at sports trivia. It will confidently return information that is straight up wrong [1]. This should be a walk in the park for an LLM, but it fails spectacularly at it. How is this useful for learning at all?
[1] https://news.ycombinator.com/item?id=43669364
giantrobot•3h ago
[0] https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect
theamk•5h ago
Starting the answer with "I asked ChatGPT and it said..." almost 100% means the poster did not double-check.
(This is the same with other systems: If you say, "According to Google...", then you are admitting you don't know much about this topic. This can occasionally be useful, but most of the time it's just annoying...)
misnome•5h ago
tough•5h ago
All marketing departments are trying to manipulate you to buy their thing, it should be illegal.
But just testing out this new stuff and seeing what's useful for you (or not) is usually the way
jacksnipe•5h ago
layer8•4h ago
tough•2h ago
stonemetal12•5h ago
mentalpiracy•4h ago
silversmith•3h ago
jacksnipe•3h ago
jstanley•2h ago
billyoneal•2h ago
jacksnipe•51m ago
mirrorlake•2h ago
Just do the research, and you don't have to qualify it. "GPT said that Don Knuth said..." Just verify that Don said it, and report the real fact! And if something turns out to be too difficult to fact check, that's still valuable information.
rhizome•4h ago
billyoneal•2h ago
kimixa•1h ago
And all the other examples will have a chain of "upstream" references, data and discussion.
I suppose you can use those same phrases to reference things without that, random "summaries" without references or research, "expert opinion" from someone without any experience in that sector, opinion pieces from similarly reputation-less people etc. but I'd say they're equally worthless as references as "According to GPT...", and should be treated similarly.
yoyohello13•5h ago
This is kind of the same with any AI gen art. Like I can go generate a bunch of cool images with AI too, why should I give a shit about your random Midjourney output.
h4ck_th3_pl4n3t•5h ago
They have to prove to someone that they're worth their money. /s
alwa•4h ago
It took a solid hundred years to legitimate photography as an artistic medium, right? To the extent that the controversy still isn’t entirely dead?
Any cool images I ask AI for are going to involve a lot less patience and refinement than some of these things the kids are using AI to turn out…
For that matter, I’ve watched friends try to ask for factual information from LLMs and found myself screaming inwardly at how vague and counterproductive their style of questioning was. They can’t figure out why I get results I find useful while they get back a wall of hedging and waffling.
kristopolous•2h ago
Here's an example https://files.meiobit.com/wp-content/uploads/2024/11/22l0nqm...
Being dismissive of AI art is like those people who dismiss electronic music because there's a drum machine.
Doing things well still requires an immense amount of skill and an exhaustive amount of effort. It's wildly complicated.
codr7•2h ago
kristopolous•1h ago
Photographers are not painters.
People who do modular synths aren't guitarists.
Technical DJing is quite different from tapping on a Spotify app on a smartphone.
Just because you've exclusively exposed yourself to crude implementations doesn't mean sophisticated ones don't exist.
delfinom•1h ago
People aren't trying to push photographs into painted works displays
People who do modular synths aren't typically trying to sell their music as country/rock/guitar based music.
A 3D modeler of a statue isn't pretending to be a sculptor.
People pushing AI art are trying to slide it right into "human art" displays. Because they are talentless otherwise.
kristopolous•41m ago
The portraiture artist industry was dramatically disrupted by the daguerreotype.
The automobile dried up the income of farriers and blacksmiths, along with ending the horsemanship industry.
The rise of synthesizers in the 80s greatly reduced the number of studio musicians.
And it's undeniable that the industry of commercial artists is currently being disrupted by AI.
But the decline of portraiture artists due to daguerreotypes doesn't mean, say, Ansel Adams is dogshit.
We can acknowledge both the industrial ramifications and the labor and skill of the new forms without being dismissive of either. Auto repair is still a skill. Driving a car is still work even if there's no horses.
When mechanical looms replaced manual weavers during the Luddite movement, it might have killed countless careers, but it didn't kill fashion. Our clothing isn't simulacrum echoes of the 1820s.
This is the transfer of a skill into a property. Property isn't a thing, it's relationship between people about a thing.
Collective ownership of the means of production would fix this btw... These things are fixable.
evandrofisico•5h ago
jsheard•5h ago
pixl97•5h ago
mcny•4h ago
colecut•4h ago
pixl97•4h ago
It seems the initial rule is rather worthless.
colecut•4h ago
2. So a rule with occasional exceptions is worthless, ok
layer8•4h ago
leptons•4h ago
You know how I know the difference between something an AI wrote and something a human wrote? The AI knows the difference between "to" and "too".
I guess you proved your point.
meindnoch•4h ago
1. *"If it's not worth writing, it's not worth reading"* is a normative or idealistic statement — it sets a standard or value judgment about the quality of writing and reading. It suggests that only writing with value, purpose, or quality should be produced or consumed.
2. *"There is a lot of handwritten crap"* is a descriptive statement — it observes the reality that much of what is written (specifically by hand, in this case) is low in quality, poorly thought-out, or not meaningful.
So, putting them together:
* The first expresses *how things ought to be*. * The second expresses *how things actually are*.
In other words, the existence of a lot of poor-quality handwritten material does not invalidate the ideal that writing should be worth doing if it's to be read. It just highlights a gap between ideal and reality — a common tension in creative or intellectual work.
Would you like to explore how this tension plays out in publishing or education?
palata•3h ago
It does NOT mean, AT ALL, that if it is worth writing, it is worth reading.
Logic 101?
floren•3h ago
ToValueFunfetti•1h ago
ModernMech•5h ago
layer8•4h ago
ModernMech•4h ago
Yes, it is true there could have been a skill issue. But it could also be true that the person just wanted input from people rather than Google. So that's why I drew the connection.
layer8•4h ago
ModernMech•4h ago
layer8•4h ago
ModernMech•3h ago
There are three main reasons I can think of for asking the Internet a question in 2010:
1. You don't know how to ask Google / you are too lazy.
2. You don't trust Google.
3. You already tried Google and it doesn't have the answer or it's wrong.
Maybe there are more I can't think of. But let's say you have one of those three reasons, so you post a question to an Internet forum in the year 2010. Someone replies back with lmgtfy. There are three typical responses depending on which of those reasons you had for posting:
1. "Thanks"
2. "Thanks, but I don't trust those sources, so I reiterate my question."
3. "Thanks, but I tried that and the answer is wrong, so I reiterate my question."
Now it's the year 2025 and you post a question to an Internet forum because you either don't know how to ask ChatGPT, don't trust ChatGPT, or already tried it and it's giving nonsense. Someone replies back with an answer from ChatGPT. There are three typical responses depending on your reason for posting to the forum.
1. "Thanks"
2. "Thanks, but I don't trust those sources, so I reiterate my question."
3. "Thanks, but I tried that and the answer is wrong, so I reiterate my question."
So the reason I drew the parallel was because of the similarity of experiences between 2010 and now for someone who doesn't trust this new technology.
XorNot•2h ago
jacksnipe•4h ago
soulofmischief•4h ago
cogman10•5h ago
Meaning, instead of listening to a real-life expert in the company telling them how to handle the problem, they ignored my advice and instead dumped the garbage from GPT.
I really fear that a number of engineers are going to use GPT to avoid thinking. They view it as a shortcut to problem solving, and it isn't.
delusional•5h ago
layer8•5h ago
I’m saying this tongue in cheek, but there’s some truth to it.
throwanem•4h ago
colechristensen•4h ago
Let's just say not listening to someone and then complaining that doing something else didn't work isn't exactly new.
colechristensen•4h ago
Oh but it is, used wisely.
One: it's a replacement for googling a problem, and much faster. Instead of spending half an hour or half a day digging through bug reports, forum posts, and Stack Overflow for the solution to a problem, LLMs are a lot faster, occasionally correct, and very often at least rather close.
Two: it's a replacement for learning how to do something I don't want to learn how to do. Case Study: I have to create a decent-enough looking static error page for a website. I could do an awful job with my existing knowledge, I could spend half a day relearning and tweaking CSS, elements, etc. etc. or I could ask an LLM to do it and then tweak the results. Five minutes for "good enough" and it really is.
LLMs are not a replacement for real understanding, for digging into a codebase to really get to the core of a problem, or for becoming an expert in something, but in many cases I do not want to, and moreover it is a poor use of my time. Plenty of things are not my core competence or anywhere near the goals I'm trying to achieve. I just need a quick solution for a topic I'm not interested in.
ijidak•3h ago
There are so many things that a human worker or coder has to do in a day and a lot of those things are non-core.
If someone is trying to be an expert on every minor task that comes across their desk, they were never doing it right.
An error page is a great example.
There is functionality that sets a company apart and then there are things that look the same across all products.
Error pages are not core IP.
At almost any company, I don't want my $200,000-300,000 a year developer mastering the HTML and CSS of an error page.
throwanem•4h ago
I doubt the reason has to do with your qualities as an engineer, which must be basically sound. Otherwise why bother to launder the product of your judgment, as you described someone here doing?
silversmith•3h ago
jsight•2h ago
tharant•2h ago
kevmo314•2h ago
tharant•1h ago
cogman10•13m ago
However, I have a very strong suspicion they also didn't understand the GPT output.
To flesh out the situation a bit further, this was a performance tuning problem with highly concurrent code. This engineer was initially tasked with the problem and they hadn't bothered to even run a profiler on the code. I did, shared my results with them, and the first action they took with my shared data was dumping a thread dump into GPT and asking it where the performance issues were.
Instead, they've simply been littering the code with timing logs in hopes that one of them will tell them what to do.
tharant•1h ago
How is this sentiment any different from my grandfather’s sentiment that calculators and computers (and probably his grandfather’s view of industrialization) are a shortcut to avoid work? From my perspective most tools are used as a shortcut to avoid work; that’s kinda the whole point: to give us room to think about/work on other stuff.
stevage•53m ago
tharant•26m ago
I get it; I’m not an AI evangelist and I get frustrated with the slop too; Gen-AI (and many of the tools we’ve enjoyed over the past few millennia) was/is lauded as “The” singular tool that makes everything better; no tool can fulfill that role yet we always try to shoehorn our problems into a shape that fits the tool. We just need to use the correct tools for the job; in my mind, the only problem right now is that we have a really capable tool and have identified some really valuable use-cases for that tool yet we also keep trying to use it for (what I believe are, given current capabilities) use-cases that don’t fit the tool.
We’ll figure it out but, in the meantime, while I don’t like to generalize that a tech or its use-cases are objectively good/bad, I do tend to have an optimistic outlook for most tech—Gen-AI included.
parliament32•24m ago
tharant•8m ago
candiddevmike•5h ago
Seems like if all you do is forward questions to LLMs, maybe you CAN be replaced by a LLM.
mrkurt•4h ago
If they're saying it to you, why wouldn't you assume they understand and trust what they came up with?
Do you need people to start with "I understand and believe and trust what I'm about to show you ..."?
jacksnipe•4h ago
laweijfmvo•4h ago
JohnFen•4h ago
"I asked X and it said..." is an appeal to authority and suspect on its face whether or not X is an LLM. But when it's an LLM, then it's even worse. Presumably, the reason for the appeal is because the person using it considers the LLM to be an authoritative or meaningful source. That makes me question the competence of the person saying it.
Szpadel•4h ago
Most annoying is when people trust ChatGPT more than the experts they pay. We had a case where our client asked us for some specific optimization, and we told him that it makes no sense; then he asked the other company that we cooperate with and got a similar response; then he asked ChatGPT and it told him it's a great idea. And guess what, he bought a $20k subscription to implement it.
38•2h ago
English please
jacksnipe•54m ago
hedora•2h ago
If that's all the available information and you're out of time, you may as well cut the blue wire. But, pretty much any other source is automatically more trustworthy.
RadiozRadioz•3h ago
Frost1x•3h ago
I see email blasts suggesting I should be using it, I get peers saying I should be using it, I get management suggesting I should use it to cut costs… and there is some truth there but as usual, it depends.
I, like many others, can’t be asked to take on inefficiency in the name of efficiency on top of the currently most efficient ways to do my work. So I too say “ChatGPT said: …” because I dump lots of things into it now. Some things I can’t quickly verify, some things are off, and in general it can produce far more information than I have time to check. Saying “ChatGPT said…” is the current CYA caveat in a world of: use this thing, but also take liability for it. No, if you practically mandate that I use something, the liability falls on you or that thing. If it’s a quick verify I’ll integrate it into my knowledge. A lot of things aren’t.
rippleanxiously•2h ago
"Hey, whatcha doin?"
"Oh hi, yea, this car has a slight misfire on cyl 4, so I was just pulling one of the coilpacks to-"
"Yea alright, that's great. So hey! You _really_ need to use this tool. Trust me, it's gonna make your life so much easier"
"umm... that's a 3d printer. I don't really think-"
"Trust me! It's gonna 10x your work!"
...
I love the tech. It's the evangelists that don't seem to bother researching the tech beyond making an account and asking it to write a couple scripts that bug me. And then they proclaim it can replace a bunch of other stuff they don't/haven't ever bothered to research or understand.
parliament32•19m ago
The ideal scenario: you write a few bullet points and ask Copilot to turn them into a long-form email to send out. Your receiving coworker then asks Copilot to distill it back into a few bullet points they can skim.
You saved 5 minutes but one of your points was ignored entirely and 20% of your output is nonsensical.
Your coworker saved 2 minutes but one of their bulletpoints was hallucinated and important context is missing from the others.
Microsoft collects a fee from both of you and is the only winner here.
godelski•2h ago
It is just bad design. You want errors to be as loud as possible. So they can be traced and resolved. On the other hand, LLMs optimize human preference (or some proxy of this). While humans prefer accuracy, it would be naive to ignore all the other things that optimize this objective. Specifically, humans prefer answers that they don't know are wrong over those that they do know are wrong.
This doesn't make LLMs useless but certainly it should strongly inform how we use them. Frankly, you cannot trust outputs, so you have to verify. I think this is where there's a big divergence between LLM users (and non-users). Those that blindly trust and those that don't (extreme case is non-users). If you need to constantly verify AND recognize that verification is extra hard (because it is optimized to be invisible to you), it can create extra work, not less.
It really is two camps and I think it says a lot:
Wide range of opinions in these two camps, but I think it comes down to some threshold of default trust or default suspicion.
__turbobrew__•2h ago
xnx•1h ago
Current systems are definitely flawed (incomplete, biased, or imagined information), but I'd pick the answers provided by Gemini over a random social post, blog page, or influencer every time.
mwigdahl•28m ago
Recently I used o3 to plan a refactoring related to upgrading the version of C++ we are using in our product. It pointed out that we could use a tool built in to VS 2022 to make a particular change automatically based on compilation output. I was not familiar with this tool and neither were the other developers on the team.
I did confirm its accuracy myself, but also made sure to credit the model as the source of information about the tool.