EDIT: literally saw it just now after refreshing. I guess they didn't roll it out immediately to everyone.
e: if you mean university, fair. that'll be an interesting transition. I guess then you pay for the sports team and amenities?
In the US at least, most kids are in public schools and the collective community foots the bill for the “daycare”, as you put it.
I ultimately dropped the course and took it in the summer at a community college, where the homework was the standard 20-30 practice problems: you apply what you learned in class and grind problems until it's baked into core memory.
AI would have helped me at least get through the uni course. But generally I think it's a problem with the school/class itself if you aren't learning most of what you need in class.
These groups were some of the most valuable parts of the university experience for me. We'd get take-out, invade some conference room, and slam our heads against these questions well into the night. By the end of it, sure... our answers looked superficially similar, but that was because we had built a mutual, deep understanding of the material, not because we copied each other's answers.
Even if you had only a rough understanding, the act of trying to teach it again to others in the group made you both understand it better.
And we literally couldn't figure it out. Or the group you were in didn't have a physics rockstar. Or you weren't so social or didn't know anyone or you just missed an opportunity to find out where anyone was forming a group. It's not like the groups were created by the class. I'd find myself in a group of a few people and we just couldn't solve it even though we knew the lecture material.
It was a negative-value class that cost 10x the price of the community college course, yet required you to teach yourself after a lecture that didn't help you do the homework. A total rip-off.
Anyway, AI is a value producer here: better than giving up and getting a zero on the homework.
Does it offer meaningful benefits to students over self directed study?
Does it outperform students who are "learning how to learn"?
What effect does allowing students to make mistakes have, compared to being guided through what to review?
I would hope Study Mode would produce flashcard prompts and quantize information for use in spaced repetition tools like Mochi [1] or Anki.
See Andy's talk here [2]
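In the meantime, here's a rough sketch of how you can already do something like that yourself with the plain API (assuming the `openai` Python client; the model name and prompt wording are just placeholders), dumping tab-separated Q/A pairs that Anki's text importer accepts:

    # Sketch: turn study notes into Anki-importable flashcards via the API.
    # Assumes the `openai` Python package and OPENAI_API_KEY in the environment;
    # "gpt-4o-mini" and the prompt wording are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI()

    def notes_to_cards(notes: str, n_cards: int = 10) -> str:
        """Return tab-separated question/answer pairs for Anki's text import."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "You write concise spaced-repetition flashcards. "
                            "Output one card per line as: question<TAB>answer."},
                {"role": "user",
                 "content": f"Make {n_cards} flashcards from these notes:\n\n{notes}"},
            ],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        with open("notes.txt") as f:
            print(notes_to_cards(f.read()))  # redirect to a .txt file and import into Anki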
They want a student to use it and say “I wouldn’t have learned anything without study mode”.
This also lets them fill their data coffers with bleeding-edge education data. "Please input the data you are studying and we will summarize it for you."
Not to be contrarian, but do you have any evidence of this assertion? Or are you just confidently confabulating a response for something outside of the data you've been exposed to? Because a commenter below provided a study that directly contradicts this.
This isn't study mode, it's a different AI tutor, but:
"The median learning gains for students, relative to the pre-test baseline (M = 2.75, N = 316), in the AI-tutored group were over double those for students in the in-class active learning group."
"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."
Not at all dismissing the study, but if you want to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve without someone first preparing class materials for the bot to present to you.
Edit: the authors further say
"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."
Two other studies failed to get positive results at all. YMMV a lot apparently (like, all bets are off and your learning might go in the negative direction if you don't do everything exactly as in this study)
Unfortunately that group is tiny and getting tinier due to dwindling attention spans.
I bring this up because the way I see students "study" with LLMs is similar to this misapplication of tutoring. You try something, feel confused and lost, and immediately turn to the pacifier^H^H^H^H^H^H^H ChatGPT helper to give you direction without ever having to just try things out and experiment. It means students are so much more anxious about exams where they don't have the training wheels. Students have always wanted practice exams with similar problems to the real one with the numbers changed, but it's more than wanting it now. They outright expect it and will write bad evals and/or even complain to your department if you don't do it.
I'm not very optimistic. I am seeing a rapidly rising trend at a very "elite" institution of students being completely incapable of using textbooks to augment learning concepts that were introduced in the classroom. And not just struggling with it, but lashing out at professors who expect them to do reading or self study.
However, consider the extent to which LLMs make the learning process more enjoyable. More students will keep pushing because they have someone to ask. Also, having fun & being motivated is such a massive factor when it comes to learning. And, finally, keeping at it at 50% the speed for 100% the material always beats working at 100% the speed for 50% the material. Who cares if you're slower - we're slower & faster without LLMs too! Those that persevere aren't the fastest; they're the ones with the most grit & discipline, and LLMs make that more accessible.
(Qualifications: I was a reviewer on the METR study.)
Like yeah, if you’ve only ever used an axe you probably don’t know the first thing about how to use a chainsaw, but if you know how to use a chainsaw you’re wiping the floor with the axe wielders. Wholeheartedly agree with the rest of your comment; even if you’re slow you lap everyone sitting on the couch.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
I believe the benefits and drawbacks of AI augmentation for humans performing various tasks will vary wildly based on the task, the way the AI is being asked to interact, and the AI model.
It concludes there's a learning curve that generally takes about 50 hours to climb. The data shows that the one engineer who had more than 50 hours of experience with Cursor actually worked faster.
This is largely my experience, now. I was much slower initially, but I've now figured out the correct way to prompt, guide, and fix the LLM to be effective. I produce way more code and am mentally less fatigued at the end of each day.
I would not use it if it was for something with a strictly correct answer.
If you're the other 90% of students that are only learning to check the boxes and get through the courses to get the qualification at the end... are you going to bother using this?
Of course, maybe this is "see, we're not trying to kill education... promise!"
Just like it's easier to be productive if you have a separate home office and couch, because of the differing psychological contexts, it's easier if you have a separate context for "just give me answers" and "actually teach me the thing".
Also, I don't know about you, but (as a professional) even though I actively try to learn the principles behind the code generated, I don't always want to spend the effort prompting the model away from the "just give me results with a simple explanation" personality I've cultivated. It'd be nice having a mode with that work done for me.
There is no way to learn without effort. I understand they are not claiming this, but many students want a silver bullet. There isn't one.
Same problem exists for all educational apps. Duolingo users have the goal of learning a language, but also they only want to use Duolingo for a few minutes a day, but also they want to feel like they're making progress. Duolingo's goal is to keep you using Duolingo, and if possible it'd be good for you to learn the language, but their #1 goal is to keep you coming back. Oddly, Duolingo might not even be wrong to focus primarily on keeping you moving forward, given how many people give up when learning a new language.
So, unless you have experience with this product that contradicts their claims, it's a good tutor by your definition.
The criticism of CliffsNotes is generally that it's a superficial glance: it can't go deeper, it's basically a summary.
The LLM is not that. It can zoom in and out of a topic.
I think it's a poor criticism.
I don't think it's a silver bullet for learning, but it's a unified, consistent interface across topics and courses.
If LLMs got better at just responding with "I don't know", I'd have less of an issue.
Some topics you learn to beware and double check. Or ask it to cite sources. (For me, that's car repair. It's wrong a lot.)
I wish it had some kind of confidence level assessment or ability to realize it doesn't know, and I think it eventually will have that. Most humans I know are also very bad at that.
Sure, but only as long as you're not terribly concerned with the result being accurate, like that old reconstruction of Obama's face from a pixelated version [1] but this time about a topic for which one is, by definition, not capable of identifying whether the answer is correct.
[1] https://www.theverge.com/21298762/face-depixelizer-ai-machin...
It's unlikely to make up the same bullshit twice.
Usually exploring a topic in depth finds these issues pretty quickly.
Unavoidably, people who don't want to work won't push the "work harder" button.
Yes, if my teacher could split into a million of themselves and compete against me on the job market at $200/mo.
I made a deep research assistant for families. Children can ask it to explain difficult concepts, and parents can ask how to deal with any parenting situation. For example, a 4 year old may ask "why does the plate break when it falls?"
example output: https://www.studyturtle.com/ask/PJ24GoWQ-pizza-sibling-fight...
I ask because every serious study on using modern generative AI tools tends to conclude fairly immediate and measurable deleterious effects on cognitive ability.
Now, everyone basically has a personal TA, ready to go at all hours of the day.
I get the commentary that it makes learning too easy or shallow, but I doubt anyone would think that college students would learn better if we got rid of TAs.
Closed: RTFM, dumbass
<No activity for 8 years, until some random person shows up and asks "Hey did you figure it out?">
I really do write that stuff for myself, turns out.
J. Random Hacker: Why are you doing it like that?
Newb: I have <xyz> constraint in my case that necessitates this.
J. Random Hacker: This is a stupid way to do it. I'm not going to help you.
I find it odd that someone who has been to college would see this as a _bad_ way to learn something.
I'm not sold on LLMs being a replacement, but post-secondary was certainly enriched by having other people to ask questions to, people to bounce ideas off of, people that can say "that was done 15 years ago, check out X", etc.
There were times where I thought I had a great idea, but it was based on an incorrect conclusion that I had come to. It was helpful for that to be pointed out to me. I could have spent many months "paving forward", to no benefit, but instead someone saved me from banging my head on a wall.
Sure, you could pave forward, but realistically, you'll get much farther with either a good textbook or a good teacher, or both.
Learning a new programming language used to be mediated with lots of useful trips to Google to understand how some particular bit worked, but Google stopped being useful for that years ago. Even if the content you're looking for exists, it's buried.
We were able to learn before LLMs.
Libraries are not a new thing. FidoNet, USENET, IRC, forums, local study/user groups. You have access to all of Wikipedia. Offline, if you want.
I think it's accurate to say that if I had to do that again, I'm basically screwed.
Asking the LLM is a vastly superior experience.
I had to learn what my local library had, not what I wanted. And it was an incredible slog.
IRC groups is another example--I've been there. One or two topics have great IRC channels. The rest have idle bots and hostile gatekeepers.
The LLM makes a happy path to most topics, not just a couple.
Not to be overly argumentative, but I disagree. If you're looking for a deep and ongoing process, LLMs fall down, because they can't remember anything and can't build on themselves in that way. You end up having to repeat a lot of stuff. They also don't have good course correction (that is, if you're going down the wrong path, they don't alert you, as I've experienced).
It also can give you really bad content depending on what you're trying to learn.
I think for things that present themselves as a form of highly structured data, like programming languages, there's good attunement there. But once you start trying to dig around in advanced finance, political topics, economics, or complex medical conditions, the quality falls off fast, if it's there at all.
It was way nicer than a book.
That's the experience I'm speaking from. It wasn't perfect, and it was wrong sometimes, sure. A known limitation.
But it was flexible, and it was able to do things like relate ideas with programming languages I already knew. Adapt to my level of understanding. Skip stuff I didn't need.
Incorrect moments or not, the result was I learned something quickly and easily. That isn't what happened in the 90s.
But that's the entire problem and I don't understand why it's just put aside like that. LLMs are wrong sometimes, and they often just don't give you the details and, in my opinion, knowing about certain details and traps of a language is very very important, if you plan on doing more with it than just having fun. Now someone will come around the corner and say 'but but but it gives you the details if you explicitly ask for them'. Yes, of course, but you just don't know where important details are hidden, if you are just learning about it. Studying is hard and it takes perseverance. Most textbooks will tell you the same things, but they all still differ and every author usually has a few distinct details they highlight and these are the important bits that you just won't get with an LLM
Nobody can write an exhaustive tome and explore every feature, use, problem, and pitfall of Python, for example. Every text on the topic will omit something.
It's hardly a criticism. I don't want exhaustive.
The llm taught me what I asked it to teach me. That's what I hope it will do, not try to caution me about everything I could do wrong with a language. That list might be infinite.
How can you know this when you are learning something? It seems like a confirmation bias to even have this opinion?
It's entirely possible they learned nothing and they're missing huge parts.
But we're sort of at the point where in order to ignore their self-reported experience, we're asking philosophical questions that amount to "how can you know you know if you don't know what you don't know and definitely don't know everything?"
More existentialism than interlocution.
If we decide our interlocutor can't be relied upon, what is discussion?
Would we have the same question if they said they did it from a book?
If they did do it from a book, how would we know if the book they read was missing something that we thought was crucial?
I was attempting to imply that high-quality literature is often reviewed by humans who have some sort of knowledge about a particular topic or are willing to cross-reference it with existing literature. The reader often does this as well.
For low-effort literature, this is often not the case, and can lead to things like https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect where a trained observer can point out that something is wrong, but an untrained observer cannot perceive what is incorrect.
IMO, this is adjacent to what human agents interacting with language models experience often. It isn't wrong about everything, but the nuance is enough to introduce some poor underlying thought patterns while learning.
Perhaps the most famous example of this is Warren Buffett. For years Buffett missed out on returns from the tech industry [1] because he avoided investing in tech company stocks due to Berkshire's long-standing philosophy of never investing in companies whose business model he doesn't understand.
His light-bulb moment came when he used his understanding of a business he knew really well, i.e. their furniture business [3], to value Apple as a consumer company rather than as a tech company, leading to a $1bn position in Apple in 2016 [2].
[0] https://en.wikipedia.org/wiki/Transfer_of_learning
[1] https://news.ycombinator.com/item?id=33612228
[2] https://www.theguardian.com/technology/2016/may/16/warren-bu...
[3] https://www.cnbc.com/2017/05/08/billionaire-investor-warren-...
That's totally different from saying they are not flawless but they make learning easier than other methods, like you did in this comment.
It also doesn't seem to do a good job of building on "memory" over time. There appears to be some unspoken limit there, or something to that effect.
Figuring out 'make' errors when I was bad at C on microcontrollers a decade ago? (still am) Careful pondering of possible meanings of words... trial-and-error tweaks of code and recompiling in hopes that I was just off by a tiny thing, but 2 hours and 30 attempts later, realizing I'd done a bad job of tracking what I'd tried and hadn't? Well, it made me better at carefully triaging issues. But it wasn't something I was enthusiastic to pick back up the next weekend, or for the next idea I had.
Revisiting that combination of hardware/code a decade later and having it go much faster with ChatGPT... that was fun.
Like, I agree with you and I believe those things will resist and will always be important, but it doesn't really compare in this case.
Last week I was out in nature and I saw a cute bird that I didn't know. I asked an AI and got the correct answer in 10 seconds. Of course I could have found the answer at the library or by looking at proper niche sites, but I would not have done it because I simply didn't care that much. It's a stupid example, but I hope it makes the point.
We were able to learn before the invention of writing, too!
I haven't tested them on many things. But in the past 3 weeks I tried to vibe-code a little bit of VHDL. On the one hand it was a fun journey; I could experiment a lot and just iterate fast. But if I were someone with no idea about hardware design, this trash would have guided me the wrong way in numerous situations. I can't even count how many times it built me latches instead of clocked registers (latches bad, if you don't know about it), and that's just one thing. Yes, I know there isn't much out there (compared to Python and JavaScript) about HDLs, even less regarding VHDL. But damn, no no no. Not for learning, never. If you know what you're doing and you have some fundamental knowledge about the topic, then it might help you get further, but not for the absolute essentials; that will backfire hard.
Pre-LLM, even finding the ~5 textbooks with ~3 chapters each that decently covered the material I want was itself a nontrivial problem. Now that problem is greatly eased.
They can recommend many unknown books as well, as language models are known to reference resources that do not exist.
This simply hasn't been my experience.
Its too shallow. The deeper I go, the less it seems to be useful. This happens quick for me.
Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.
This generation of AI doesn't yet have the knowledge depth of a seasoned university professor. It's the kind of teacher that you should, eventually, surpass.
1) The broad overview of a topic
2) When I have a vague idea, it helps me narrow down the correct terminology for it
3) Providing examples of a particular category ("are there any examples of where v1 in the visual cortex develops in a disordered way?")
4) "Tell me the canonical textbooks in field X"
5) Posing math exercises
6) Free form branching--while talking about one topic, I want to shift to another that is distinct but related.
I agree they leave a lot to be desired when digging very deeply into a topic. And my biggest pet peeve is when they hallucinate fake references ("tell me papers that investigate this topic" will, for any sufficiently obscure topic, result in a bunch of very promising paper titles that are wholly invented).
Luc Julia (one of Siri's main creators) describes a very similar exercise in this interview [0] (it's in French, although the auto translation isn't too bad).
The gist of it is that he describes an exercise he does with his students, where they ask ChatGPT about Victor Hugo's biography and then proceed to spot the errors ChatGPT made.
The setup is simple, but there are very interesting mechanisms at play. The students get to learn about challenging facts, do fact checking, cross-reference, etc., while also reinforcing the teacher as the reference figure, with the knowledge to take down ChatGPT.
Well done :)
Edit: adding link
[0] https://youtube.com/shorts/SlyUvvbzRPc?si=2Fv-KIgls-uxr_3z
So the opposite of Stack Overflow really, where if you have a vague idea your question gets deleted and you get reprimanded.
Maybe Stack Overflow could use AI for this, help you formulate a question in the way they want.
History is a great example. If you ask an LLM about a vaguely difficult period in history it will just give you one side and act like the other doesn't exist, or if there is another side, it will paint them in a very negative light, often poorly substantiated. People don't just wake up and decide one day to be irrationally evil for no reason; if you believe that, you are a fool... although LLMs would agree with you more often than not, since it's convenient.
The result of these things is a form of gatekeeping. Give it a few years and basic knowledge will be almost impossible to find if it is deemed "not useful", whether that's an outdated technology the LLM no longer sees discussed much, or an ideological issue that doesn't fall in line with TOS or common consensus.
- Bombing of Dresden, death stats as well as how long the bombing went on for (Arthur Harris is considered a war criminal to this day for that; LLMs highlight easily falsifiable claims by Nazis to justify low estimates without providing much in the way of verifiable claims outside of a select few, questionable, sources. If the low estimate is to be believed, then it seems absurd that Harris would be considered a war criminal in light of what crimes we allow today in warfare)
- Ask it about the Crusades; often it forgets the sacking of St. Peter's in Rome around 846 AD, usually painting the Papacy as a needlessly hateful and violent people during that specific Crusade. Which was horrible, bloody, and immensely destructive (I don't defend the Crusades), but it paints the Islamic forces as victims, which they were eventually, but not at the beginning; at the beginning they were the aggressors bent on invading Rome.
- Ask it about the Six-Day War (1967) and contrast that with several different sources on both sides and you'll see a different portrayal even by those who supported the actions taken.
These are just the four that come to my memory at this time.
Most LLMs seem cagey about these topics; I believe this is due to an accepted notion that anything that could "justify" hatred or dislike of a people group or class that is in favor -- according to modern politics -- will be classified as hateful rhetoric, which is then omitted from the record. The issue lies in the fact that to understand history, we need to understand what happened, not how it is perceived, politically, after the fact. History helps inform us about the issues of today, and it is important, above all other agendas, to represent the truth of history, keeping an accurate account (or simply allowing others to read differing accounts without heavy bias).
LLMs are restricted in this way quite egregiously; "those who do not study history are doomed to repeat it", but if this continues, no one will have the ability to know history and are therefore forced to repeat it.
If for any of these topics you do manage to get a summary you'd agree with from a (future or better-prompted?) LLM I'd like to read it. Particularly the first and third, the second is somewhat familiar and the fourth was a bit vague.
I don't know a lot about the other things you mentioned, but the concept of crusading did not exist (in Christianity) in 846 AD. It's not any conflict between Muslims and Christians.
This further led the Papacy to push such efforts in the coming years, as they were in Rome and made strong efforts to maintain Catholicism within those boundaries. Crusading didn't appear out of nothing; it required a catalyst, and events like the one I listed are the usual suspects.
If the US were to start invading Axis countries with WW2 being the justification we'd of course be the aggressors, and that was less than 100 years ago.
Similarly, it helps us understand all the examples of today of resentments and grudges over events that happened over a century ago that still motivate people politically.
Its background is in the Islamic-Christian conflicts of Spain. Crusading was adopted from the Muslim idea of jihad, as were things like naming customs (the Spanish are the only Christians who name their children "Jesus", after the Muslim practice of naming children "Muhammad").
The political tensions that led to the first crusade were between Arab Muslims and Byzantine Christians. Specifically, the Battle of Manzikert made Christian Europe seem more vulnerable than it was.
The Papacy wasn’t at the forefront of the struggle against Islam. It was more worried about the Normans, Germans, and Greeks.
When the papacy was interested in Crusading it was for domestic reasons: getting rid of king so-and-so by making him go on crusade.
The situation was different in Spain where Islam was a constant threat, but the Papacy regarded Spain as an exotic foreign land (although Sylvester II was educated there).
It’s extremely misleading to view the pope as the leader of an anti-Muslim coalition. There really was no leader per se, but the reasons why kings went on crusade had little to do with fighting Islam.
Just look at how many monarchs showed up in Jerusalem, then headed straight home and spent the rest of their lives bragging about having been crusaders.
I'm 80% certain no pope ever set foot in Outremer.
Rhodesia is a hard one; the more I learn about it, the more I feel terrible for both sides. I also do not support terrorism against a nation even if I believe it might not be in the right. However, I stand by my disdain for how the British responded and withdrew; it effectively doomed Rhodesia, making peaceful resolution essentially impossible.
It's a very controversial opinion, and stating it as a just-so fact needs challenging.
In 1992 a statue was erected to Harris in London, it was under 24 hour surveillance for several months due to protesting and vandalism attempts. I'm only mentioning this to highlight that there was quite a bit of push back specifically calling the gov out on a tribute to him; which usually doesn't happen if the person was well liked... not as an attempted killshot.
Even the RAF themselves state that there were quite a few who were critical, on the first page of their assessment of Arthur Harris https://www.raf.mod.uk/what-we-do/centre-for-air-and-space-p...
Which is a funny and odd thing to say if you are widely loved/unquestioned by your people. Again, just another occurrence of those on his side using language that reinforces the idea that this is, as you say, "very controversial", and maybe not a "vast majority", since those two things seem at odds with each other.
Not to mention that Harris targeted civilians, which is generally considered behavior of a war-criminal.
As an aside this talk page is a good laugh. https://en.wikipedia.org/wiki/Talk:Arthur_Harris/Archive_1
Although you are correct that I should have used more accurate language: instead of saying "considered" I should have said "considered by some".
The problem is, those that do study history are also doomed to watch it repeat.
Why?
(On the other hand, it's very hard to get them to do it for topics that are currently politically charged. Less so for things that aren't in living memory: I've had success getting it to offer the Carthaginian perspective in the Punic Wars.)
It's weird to see which topics it "thinks" are politically charged vs. others. I've noticed some inconsistency depending on even what years you input into your questions. One year off? It will sometimes give you a more unbiased answer as a result about the year you were actually thinking of.
As for the politically charged topics, I more or less self-censor on those topics (which seem pretty easy to anticipate--none of those you listed in your other comment surprise me at all) and don't bother to ask the LLM. Partially out of self-protection (don't want to be flagged as some kind of bad actor), partially because I know the amount of effort put in isn't going to give a strong result.
That's a good thing to be aware of, using our own bias to make it more "likely" to play pretend. LLMs tend to be more on the agreeable side; given the unreliable narrators we people tend to be, and the fact that these models are trained on us, it does track that the machine would tend towards preference over fact, especially when the fact could be outside of the LLMs own "Overton Window".
I've started to care less and less about self-censoring as I deem it to be a kind of "use it or lose it" privilege. If you normalize talking about censored/"dangerous" topics in a rational way, more people will be likely to see it not as much of a problem. The other eventuality is that no one hears anything that opposes their view in a rational way but rather only hears from the extremists or those who just want to stick it to the current "bad" in their minds at that moment. Even then though I still will omit certain statements on some topics given the platform, but that's more so that I don't get mislabeled by readers. (one of the items on my other comment was intentionally left as vague as possible for this reason) As for the LLMs, I usually just leave spicy questions for LLMs I can access through an API of someone else (an aggregator) and not a personal acc just to make it a little more difficult to label my activity falsely as a bad actor.
That's honestly one of the funniest things I have read on this site.
> I've had success getting it to offer the Carthaginian perspective in the Punic Wars.
This is not surprising to me. Historians have long studied Carthage, and there are books you can get on the Punic Wars that talk about the state of Carthage leading up to and during the wars (shout out to Richard Miles's "Carthage Must Be Destroyed: The Rise and Fall of an Ancient Civilization"). I would expect an LLM to piggyback off of that existing literature.
The most compelling reason at the time to reject heliocentrism was the (lack of) parallax of stars. The only response the heliocentrists had was that the stars must be implausibly far away: millions of times further away than the Moon, which they already knew was itself pretty far from us. That is a pretty radical, even insane, idea. There's also the point that the original Copernican heliocentric model had ad hoc epicycles just as the Ptolemaic one did, without any real increase in accuracy.
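To put rough numbers on that argument, here's a back-of-the-envelope sketch (the ~1 arcminute figure for the best pre-telescopic measurement precision is my assumption, not from the original comment):

    # No observed parallax at ~1 arcminute precision puts a floor on stellar distance.
    import math

    au_km = 1.496e8                       # Earth-Sun distance
    moon_km = 3.84e5                      # Earth-Moon distance
    precision_rad = math.radians(1 / 60)  # ~1 arcminute, roughly Tycho-era accuracy

    min_star_km = au_km / math.tan(precision_rad)
    print(f"stars at least {min_star_km:.1e} km away "
          f"(~{min_star_km / moon_km / 1e6:.1f} million times the Moon's distance)")
    # -> roughly 5e11 km, over a million times the Earth-Moon distance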
Strictly speaking, the breakdown here would be less a lack of understanding of contemporary physics, and more about whether I knew enough about the minutia of historical astronomers' disputes to know if the LLM was accurately representing them.
People _do_ just wake up and decide to be evil.
However, that's not a justification, since I believe that what is happening today is truly evil. Same with another nation that entered a war knowing they'd be crushed, which is suicide; whether that nation is in the right matters little if most of their next generation has died.
There's no short-term incentive to ever be right about it (and it's easy to convince yourself of both short-term and long-term incentives, both self-interested and altruistic, to actively lie about it). Like, given the training corpus, could I do a better job? Not sure.
All of us need to learn the basics of how to read history and historians critically, and to know our limitations, which, as you stated, is probably a tall task.
Which is why it's so terribly irresponsible to paint these """AI""" systems as impartial or neutral or anything of the sort, as has been done by hypesters and marketers for the past 3 years.
The problem with this is that people sometimes really do, objectively, wake up and decide to be irrationally evil. It's not every day, and it's not every single person, but it does happen routinely.
If you haven’t experienced this wrath yourself, I envy you. But for millions of people, this is their actual, 100% honest truthful lived reality. You can’t rationalize people out of their hate, because most people have no rational basis for their hate.
(see pretty much all racism, sexism, transphobia, etc)
I'd say that companies like Google and OpenAI are aware of the "reputable" concerns the Internet is expressing and addressing them. This tech is going to be, if not already is, very powerful for education.
Blue team: you throw out concepts and have it steelman them.
Red team: you can literally throw any kind of stress test at your idea.
Alternate like this and you will learn.
A great prompt is “give me the top 10 xyz things” and then you can explore
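If you want to make that loop mechanical, here's a minimal sketch (assuming the `openai` Python client; the model name and the blue/red prompt wording are placeholders, not a recommendation):

    # Sketch of the alternating blue-team / red-team study loop described above.
    from openai import OpenAI

    client = OpenAI()

    BLUE = "Steelman the following idea as rigorously and persuasively as you can."
    RED = "Stress-test the following idea: find weaknesses, edge cases, and counterexamples."

    def ask(system_prompt: str, idea: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": idea}],
        )
        return resp.choices[0].message.content

    def blue_red(idea: str, rounds: int = 2) -> None:
        for i in range(rounds):
            print(f"--- blue, round {i + 1} ---\n{ask(BLUE, idea)}\n")
            print(f"--- red, round {i + 1} ---\n{ask(RED, idea)}\n")

    blue_red("Spaced repetition beats massed practice for long-term retention.")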
Back in 2006 I used Wikipedia to prepare for job interviews :)
Granted, that's probably well-trodden ground, to which model developers are primed to pay attention, and I'm (a) a relative novice with (b) very strong math skills from another domain (computational physics). So Chuck and I are probably both set up for success.
That's fine. Recognize the limits of LLMs and don't use them in those cases.
Yet that is something you should be doing regardless of the source. There are plenty of non-reputable sources in academic libraries and there are plenty of non-reputable sources from professionals in any given field. That is particularly true when dealing with controversial topics or historical sources.
Ask it for sources. The two things where LLMs excel are filling in sources for some claim you give them (lots will be made up, but there isn't anything better out there) and giving you search queries for some description you give them.
You must be using a free model like GPT-4o (or the equivalent from another provider)?
I find that o3 is consistently able to go deeper than me in anything I'm a nonexpert in, and usually can keep up with me in those areas where I am an expert.
If that's not the case for you I'd be very curious to see a full conversation transcript (in chatgpt you can share these directly from the UI).
I know it has nothing to do with this. I simply hit a wall eventually.
I unfortunately am not at liberty to share the chats though. They're work related (I very recently ended up at a place where we do thorny research).
A simple one, though, is researching Israel-Palestine relations since 1948. It starts off okay (usually) but it goes off the rails eventually with bad sourcing, fictitious sourcing, and/or hallucinations. Sometimes I actually hit a wall where it repeats itself over and over, and I suspect it's because the information is simply not captured by the model.
FWIW, if these models had live & historic access to Reuters and Bloomberg terminals I think they might be better at a range of tasks I find them inadequate for, maybe.
If it's a subject you are just learning, how can you possibly evaluate this?
Falling apart under pointed questioning, saying obviously false things, etc.
It's not a criticism, the landscape moves fast and it takes time to master and personalize a flow to use an LLM as a research assistant.
Start with something such as NotebookLM.
They simply have limitations, especially on deep pointed subject matters where you want depth not breadth, and honestly I'm not sure why these limitations exist but I'm not working directly on these systems.
Talk to Gemini or ChatGPT about mental health things; that's a good example of what I'm talking about. As recently as two weeks ago my colleagues found that even when heavily tuned, they still managed to become 'pro suicide' if given certain lines of questioning.
These things also apply to humans. A year or so ago I thought I’d finally learn more about the Israeli/Palestinians conflict. Turns out literally every source that was recommended to me by some reputable source was considered completely non-credible by another reputable one.
That said, I've found ChatGPT to be quite good at math and programming, and I can go pretty deep in both. I can definitely trip it into mistakes (e.g. it seems to use calculations to "intuit" its way around sometimes, and you can find cases where the calculations lead it in the wrong direction), but I also know enough to know how to keep it on rails.
I've anecdotally found that real world things like these tend to be nuanced, and that sources (especially on the internet) are disincentivised in various ways from actually showing nuance. This leads to "side-taking" and a lack of "middle-ground" nuanced sources, when the reality lies somewhere in the middle.
Might be linked to the phenomenon where in an environment where people "take sides", those who display moderate opinions are simply ostracized by both sides.
Curious to hear people's thoughts and disagreements on this.
Moreover, the conflict is unfolding. What matters isn't what happened 100 years ago, or even 50 years ago, but what has happened recently and is happening. A neighbor of mine who recently passed was raised in Israel. Born circa 1946 (there's black & white footage of her as a baby aboard, IIRC, the ship Exodus 1947), she has vivid memories as a child of Palestinian Imams calling out from the mosques to "kill the Jews". She was a beautiful, kind soul who, for example, freely taught adult education to immigrants (of all sorts), but who one time admitted to me that she utterly despised Arabs. That's all you need to know, right there, to understand why Israel is doing what it's doing. Not so much what happened in the past to make people feel that way, but that many Israelis actually, viscerally feel this way today, justifiably or not but in any event rooted in memories and experiences seared into their conscience. Suffice it to say, most Palestinians have similar stories and sentiments of their own, one of the expressions of which was seen on October 7th.
And yet at the same time, after the first few months of the Gaza War she was so disgusted that she said she wanted to renounce her Israeli citizenship. (I don't know how sincere she was in saying this; she died not long after.) And, again, that's all you need to know to see how the conflict can be resolved, if at all; not by understanding and reconciling the history, but merely choosing to stop justifying the violence and moving forward. How the collective action problem might be resolved, within Israeli and Palestinian societies and between them... that's a whole 'nother dilemma.
Using AI/ML to study history is interesting in that it even further removes one from actual human experience. Hearing first hand accounts, even if anecdotal, conveys information you can't acquire from a book; reading a book conveys information and perspective you can't get from a shorter work, like a paper or article; and AI/ML summaries elide and obscure yet more substance.
That’s the single most important lesson by the way, that this conflict just has two different, mutually exclusive perspectives, and no objective truth (none that could be recovered FWIW). Either you accept the ambiguity, or you end up siding with one party over the other.
Then as you get more and more familiar you "switch" depending on the sub-issue being discussed, aka nuance
The problem is selective memory of those facts, biased interpretation of them, and stretching the truth to fit a predetermined opinion.
> to be quite good at math and programming
Since LLMs are essentially summarizing relevant content, this makes sense. In "objective" fields like math and CS, the vast majority of content aligns, and LLMs are fantastic at distilling the relevant portions you ask about. When there is no consensus, they can usually tell you that ("this is nuanced topic with many perspectives...", etc), but they can't help you resolve the truth because, from their perspective, the only truth is the content.
FWIW, the /r/AskHistorians booklist is pretty helpful.
https://www.reddit.com/r/AskHistorians/wiki/books/middleeast...
You don’t need to look more than 2 years back to understand why either camp finds the other non-reputable.
The quality varies wildly across models & versions.
With humans, the statements "my tutor was great" and "my tutor was awful" reflect very little on "tutoring" in general, and are barely even responses to each other without more specificity about the quality of tutor involved.
Same with AI models.
I have no access to Anthropic right now to compare that.
It’s an ongoing problem in my experience
Model Validation groups are one of the targets for LLMs.
It doesn't cover the other aspects of finance, which may be considered advanced (to a regular person at least) but less quantitative. Try having it reason out a "cigar butt" strategy and see if it returns anything useful about companies that fit the mold from a prepared source.
Granted this isn’t quant finance modeling, but it’s a relatively easy thing as a human to do, and I didn’t find LLMs up to the task
No one builds multi-shot search tools because they eat tokens like nobody's business, but I've deployed them internally at a company to rave reviews, at a cost of $200 per seat per day.
I'll tell you that I recently found it the best resource on the web for teaching me about the 30 Years War. I was reading a collection of primary source documents, and was able to interview ChatGPT about them.
Last week I used it to learn how to create and use Lehmer codes, and its explanation was perfect, and much easier to understand than, for example, Wikipedia.
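For anyone curious, the idea fits in a few lines of Python (my own sketch, not the explanation ChatGPT or Wikipedia gives): each position records how many later elements are smaller, which also yields the permutation's lexicographic rank via the factorial number system.

    from math import factorial

    def lehmer_code(perm):
        """For each position, count how many later elements are smaller."""
        return [sum(1 for later in perm[i + 1:] if later < x) for i, x in enumerate(perm)]

    def lehmer_to_perm(code, items):
        """Rebuild the permutation by picking the k-th smallest remaining item."""
        pool = sorted(items)
        return [pool.pop(k) for k in code]

    def perm_rank(perm):
        """Lexicographic rank of the permutation (factorial number system)."""
        n = len(perm)
        return sum(d * factorial(n - 1 - i) for i, d in enumerate(lehmer_code(perm)))

    p = [1, 3, 0, 2]
    print(lehmer_code(p))                      # [1, 2, 0, 0]
    print(perm_rank(p))                        # 10
    print(lehmer_to_perm(lehmer_code(p), p))   # [1, 3, 0, 2]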
I ask it about truck repair stuff all the time, and it is also great at that.
I don't think it's great at literary analysis, but for factual stuff it has only ever blown away my expectations at how useful it is.
How do you know when it's bullshitting you though?
Sometimes right away, something sounds wrong. Sometimes when I try to apply the knowledge and discover a problem. Sometimes never, I believe many incorrect things even today.
Since when was it acceptable to only ever look at a single source?
I think the potential in this regard is limitless.
(Only thing missing is the model(s) you used).
The psychic reader near me has been in business for a long time. People are very convinced they've helped them. Logically, it had to have been their own efforts though.
This requires a student to be actually interested in what they are learning, though. For others, who blindly trust its output, it can have adverse effects, like the illusion of having understood a concept when they might have even mislearned it.
I had to post the source code to win the dispute, so to speak.
If you are curious it was a question about the behavior of Kafka producer interceptors when an exception is thrown.
But I agree that it is hard to resist the temptation to treat LLMs as a peer.
Ever read mainstream news reporting on something you actually know about? Notice how it's always wrong? I'm sure there's a name for this phenomenon. It sounds like exactly the same thing.
It is hard to verify information that you are unfamiliar with. It would be like learning from a message board. Can you really trust what is being said?
So what if the LLM is wrong about something. Human teachers are wrong about things, you are wrong about things, I am wrong about things. We figure it out when it doesn't work the way we thought and adjust our thinking. We aren't learning how to operate experimental nuclear reactors here, where messing up results in half a country getting irradiated. We are learning things for fun, hobbies, and self-betterment.
You can replace "LLM" here with "human" and it remains true.
Anyone who has gone to post-secondary has had a teacher that relied on outdated information, or filled in gaps with their own theories, etc. Dealing with that is a large portion of what "learning" is.
I'm not convinced about the efficacy of LLMs in teaching/studying. But it's foolish to think that humans don't suffer from the same reliability issue as LLMs, at least to a similar degree.
For example, even if you craft the most detailed cursor rules, hooks, whatever, they will still repeatedly fuck up. They can't even follow a style guide. They can be informed, but not corrected.
Those are coding errors, and the general "hiccups" that these models experience all the time are on another level. The hallucinations, sycophancy, reward hacking, etc can be hilariously inept.
IMO, that should inform you enough to not trust these services (as they exist today) in explaining concepts to you that you have no idea about.
If you are so certain you are okay to trust these things, you should evaluate every assertion it makes for, say, 40 hours of use, and count the error rate. I would say it is above 30%, in my experience of using language models day to day. And that is with applied tasks they are considered "good" at.
If you are okay with learning new topics where even 10% of the instruction is wrong, have fun.
No, not really.
> Unless it was common enough to show up in a well formed question on stack exchange, it was pretty much impossible, and the only thing you can really do is keep paving forward and hope at some point, it'll make sense to you.
Your experience isn't universal. Some students learned how to do research in school.
It’s exciting when I discover I can’t replicate something that is stated authoritatively… which turns out to be controversial. That’s rare, though. I bet ChatGPT knows it’s controversial, too, but that wouldn’t be as much fun.
From the parent comment:
> it was pretty much impossible ... hope at some point, it'll make sense to you
Not sure where you are getting the additional context for what they meant by "screwed", but I am not seeing it.
Sorry, but if you've gone to university, in particular at a time when internet access was already ubiquitous, surely you must have been capable of finding an answer to a programming problem by consulting documentation, manuals, or tutorials, which exist on almost any topic.
I'm not saying the chatbot interface is necessarily bad, it might be more engaging, but it literally does not present you with information you couldn't have found yourself.
If someone has a computer science degree and tells me that without Stack Exchange they can't find solutions to basic problems, that is a red flag. That's like the article posted here about the people who couldn't program when their LLM credits ran out.
I also use it to remember some Python stuff. In Rust, it is less good: it makes mistakes.
In those two domains, at that level, it's really good.
It could help students I think.
In the process it helped me learn many details about RA and NDP (Router Advertisements/Neighbor Discovery Protocol, which mostly replace DHCP and ARP from IPv4).
It made me realize that my WiFi mesh routers do quite a lot of things to prevent broadcast loops on the network, and that all my weird issues could be attributed to one cheap mesh repeater. So I replaced it and now everything works like a charm.
I had this setup for 5 years and was never able to figure out what was going on there, although I really tried.
Regarding LLMs, they can also stimulate thinking if used right.
I tried using YouTube to find walkthrough guides for how to approach the repair as a complete n00b and only found videos for unrelated problems.
But I described my issues and took photos to GPT O3-Pro and it was able to guide me and tell me what to watch out for.
I completed the repair (very proud of myself) and even though it failed a day later (I guess I didn’t re-seat well enough) I still feel far more confident opening it and trying again than I did at the start.
Cost of broken watch + $200 pro mode << Cost of working watch.
On the other hand, it told me you can't execute programs when evaluating a Makefile, and you trivially can (GNU Make's $(shell ...) function, for one). It's very hit and miss. When it misses it's rather frustrating. When it hits it can save you literally hours.
It’s called basic research skills - don’t they teach this anymore in high school, let alone college? How ever did we get by with nothing but an encyclopedia or a library catalog?
I find it so much more intellectually stimulating than most of what I find online. Reading e.g. a 600 page book about some specific historical event gives me so much more perspective and exposure to different aspects I never would have thought to ask about on my own, or that would have been elided when clipped into a few-sentence summary.
I have gotten some value out of asking for book recommendations from LLMs, mostly as a starting point I can use to prune a list of 10 books down into a 2 or 3 after doing some of my research on each suggestion. But talking to a chatbot to learn about a subject just doesn’t do anything for me for anything deeper than basic Q&A where I simply need a (hopefully) correct answer and nothing more.
If you don't have access to a community like that learning stuff in a technical field can be practically impossible. Having an llm to ask infinite silly/dumb/stupid questions can be super helpful and save you days of being stuck on silly things, even though it's not perfect.
> most of us would have never gotten by with literally just a library catalog and encyclopedia.
I meant the opposite, perhaps I phrased it poorly. Back in the day we would get by and learn new shit by looking for books on the topic and reading them (they have useful indices and tables of contents to zero in on what you need and not have to read the entire book). An encyclopedia was (is? Wikipedia anyone?) a good way to get an overview of a topic and the basics before diving into a more specialized book.
When I got stuck on a concept, I wasn't screwed: I read more; books if necessary. StackExchange wasn't my only source.
LLMs are not like TAs, personal or not, in the same way they're not humans. So it then follows we can actually contemplate not using LLMs in formal teaching environments.
And that's a bad thing. Nothing can replace the work in learning, the moments where you don't understand it and have to think until it hurts and until you understand. Anything that bypasses this (including, for uni students, leaning too heavily on generous TAs) results in a kind of learning theatre, where the student thinks they've developed an understanding, but hasn't.
Experienced learners already have the discipline to use LLMs without asking too much of them, the same way they learned not to look up the answer in the back of the textbook until arriving at their own solution.
And which just makes things up (with the same tone and confidence!) at random and unpredictable times.
Yeah apart from that it's just like a knowledgeable TA.
Given that humanity has been able to go from living in caves to sending spaceships to the moon without LLMs, let me express some doubt about that.
Even without going further, software engineering isn't new and people have been stuck on concepts and have managed to get unstuck without LLMs for decades.
What you gain in instant knowledge with LLMs, you lose in learning how to get unstuck, how to persevere, how to innovate, etc.
There seems to be a gap in problem-solving abilities here... the process of breaking down concepts into easier-to-understand concepts and then recompiling them has been around forever... it is just easier to find those relationships now. To say it was impossible to learn concepts you are stuck on is a little alarming.
As long as you can tell that you don’t deeply understand something that you just read, they are incredible TAs.
The trick is going to be to impart this metacognitive skill on the average student. I am hopeful we will figure it out in the top 50 universities.
I think this is the same thing with vibe coding, AI art, etc. - if you want something good, it's not the right tool for the job. If your alternative is "nothing," and "literally anything at all" will do, man, they're game changers.
* Please don't overindex on "shitty" - "If you don't need something verifiably high-quality"
[0] https://time.com/7295195/ai-chatgpt-google-learning-school/
The internet, and esp. Stack Exchange, is a horrible place to learn concepts. For basic operational stuff, sure, that works, but one should mostly be picking up concepts from books and other long-form content. When you get stuck it's time to do three things:
Incorporate a new source that covers the same material in a different way, or at least from a different author.
Sit down with the concept and write about it and actively try to reformulate it and everything you do/don't understand in your own words.
Take a pause and come back later.
Usually one of these three strategies does the trick, no LLM required. Obviously these approaches require time that using an LLM wouldn't. I have a suspicion doing it this way will also make it stick in long-term memory better, but that's just a hunch.
i don't get it.
> The part Margie hated most was the slot where she had to put homework and test papers. She always had to write them out in a punch code they made her learn when she was six years old, and the mechanical teacher calculated the mark in no time.
It's my primary fear building anything on these models, they can just come eat your lunch once it looks yummy enough. Tread carefully
True, and worse, they're hungry because it's increasingly seeming like "hosting LLMs and charging by the token" is not terribly profitable.
I don't really see a path for the major players that isn't "Sherlock everything that achieves traction".
As long as features like Study Mode are little more than creative prompting, any provider will eventually be able to offer them and offer token-based charging.
- From what I can see many products are rapidly getting past "just prompt engineering the base API". So even though a lot of these things were/are primitive, I don't think it's necessarily a good bet that they will remain so. Though agree in principle - thin API wrappers will be out-competed both by cheaper thin wrappers, or products that are more sophisticated/better than thin wrappers.
- This is, oddly enough, a scenario that is way easier to navigate than the rest of the LLM industry. We know consumer apps, we know consumer apps that do relatively basic (or at least, well understood) things. Success/failure then is way less about technical prowess and more about classical factors like distribution, marketing, integrations, etc.
A good example here is the lasting success of paid email providers. Multiple vendors (MSFT, GOOG, etc.) make huge amounts of money hosting people's email, despite it being a mature product that, at the basic level, is pretty solved, and where the core product can be replicated fairly easily.
The presence of open source/commodity commercial offerings hasn't really driven the price of the service to the floor, though the commodity offerings do provide some pricing pressure.
For most groups I saw self-hosting email (student groups etc.), it ended up a mess. Compare all that to, say, ollama, which makes self-hosting LLMs trivial, and they're stateless.
So I’m not sure email is a good example of commodity not bringing price to the floor.
> In the computing verb sense, refers to the software Sherlock, which in 2002 came to replicate some of the features of an earlier complementary program called Watson.[1]
During the early days of tech, was there prevailing wisdom that software companies would never be able to compete with hardware companies because the hardware companies would always be able to copy them and ship the software with the hardware?
Because I think it's basically the analogous situation. People assume that the foundation model providers have some massive advantage over the people building on top of them, but I don't really see any evidence for this.
If you want to try and make a quick buck, fine, be quick and go for whatever. If you plan on building a long term business, don't do the most obvious, low effort low hanging fruit stuff.
These days they’ve pivoted to a more enterprise product and are still chugging along.
A more thought through product version of that is only a good thing imo.
- study mode (this announcement)
- office suite (https://finance.yahoo.com/news/openai-designs-rival-office-w...)
- sub-agents (https://docs.anthropic.com/en/docs/claude-code/sub-agents)
When they announce VR glasses or a watch, we'll know we've gone full circle and the hype is up.
It's a great tutor for things it knows, but it really needs to learn its own limits
Things well-represented in its training datasets. Basically a React todo list, a Bootstrap form, tic-tac-toe in Vue.
When I ask ChatGPT* questions about things I don’t know much about it sounds like a genius.
When I ask it about things I’m an expert in, at best it sounds like a tech journalist describing how a computer works. At worst it is just flat out wrong.
* yes I’ve tried the latest models and I use them frequently at work
* for each statement, give you the option to rate how well you understood it. Offer clarification on things you didn't understand
* present knowledge as a tree that you can expand to get deeper
* show interactive graphs (very useful for mathy things when you can easily adjust some of the parameters)
* add quizzes to check your understanding
... though I could well imagine this being out of scope for ChatGPT, and thus an opportunity for other apps / startups.
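If someone does build this, the data model for the expandable tree with embedded quizzes and per-statement ratings doesn't need to be complicated. A minimal Python sketch, just to make the idea concrete (all names here are hypothetical, not any existing app's API):

```python
from dataclasses import dataclass, field

@dataclass
class QuizItem:
    """A single check-your-understanding question attached to a statement."""
    question: str
    answer: str

@dataclass
class KnowledgeNode:
    """One statement/concept in the tree; children are generated lazily on expand()."""
    statement: str
    understanding: int | None = None              # learner's 1-5 self-rating, None = unrated
    quiz: list[QuizItem] = field(default_factory=list)
    children: list["KnowledgeNode"] = field(default_factory=list)

    def expand(self, generate_children):
        """Deepen the tree: generate_children(statement) returns a list of
        sub-statement strings (in practice this would be an LLM call)."""
        if not self.children:
            self.children = [KnowledgeNode(s) for s in generate_children(self.statement)]
        return self.children

    def rate(self, score: int) -> None:
        """Record how well the learner understood this statement (clamped to 1-5)."""
        self.understanding = max(1, min(5, score))
```

The `expand` callback is where the model call would slot in, generating the next level of sub-concepts on demand.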
I'm very interested in this. I've considered building this, but if this already exists, someone let me know please!
Have you considered using the LLM to give tests/quizzes (perhaps just conversationally) in order to measure progress and uncover weak spots?
I've also been playing around with adapting content based on their results (e.g. proactively nudging complexity up/down) but haven't gotten it to a good place yet.
Only feedback I have so far is that it would be nice to control the playback speed of the 'read aloud' mode. I'd like it to be a little bit faster.
I've been working on it on-and-off for about a year now. Roughly 2-3 months if I worked on it full-time I'm guessing.
re: playback speed -> noted, will add some controls tomorrow
It's still a work in progress, but we are trying to make it better every day.
The other chunk of time, to me anyway, seems to be creating a mental model of the subject matter, and when you study something well you have a strong grasp on the forces influencing cause and effect within that matter. It's this part of the process that I would use AI the least, if I am to learn it for myself. Otherwise my mental model will consist of a bunch of "includes" from the AI model and will only be resolvable with access to AI. Personally, I want a coherent "offline" model to be stored in my brain before I consider myself studied up in the area.
This is a good thing on many levels.
Learning how to search is (was) a good skill to have. The process of searching itself also often leads to learning tangentially related but important things.
I'm sorry for the next generations that won't have (much of) these skills.
I don't think it's so valuable now that you're searching through piles of spam and junk just to try to find anything relevant. That's a uniquely modern-web thing created by Google in their focus on profit over users.
Unless Google takes over libraries/books next and sells spots to advertisers on the shelves and in the books.
In the same way, I never learnt the Dewey decimal system because digital search had rendered it obsolete. It may be that we just won't need to do as much sifting through spam in the future, but being able to finesse Gemini into burping out the right links becomes increasingly important.
Most people don’t know how to do this.
I believed competitors would rush to copy all great things that ChatGPT offers as a product, but surprisingly that hasn’t been the case so far. I wonder why they seemingly don’t care about that.
Helping you parse notation, especially in new domains, is insanely valuable. I do a lot of applied math in statistics/ML, but when I open a physics book the notation and comfort with short hand is a real challenge (likewise I imagine the reverse is equally as annoying). Having an LLM on demand to instantly clear up notation is a massive speed boost.
Reading German Idealist philosophy requires an enormous amount of context. Being able to ask an LLM questions like "How much of this section of Mainländer is coming directly from Schopenhauer?" is a godsend in helping understand which parts of the writing are merely setting up what is already agreed upon vs. laying new ground.
And the most important for self-study: verifying your understanding. Backtracking because you misunderstood a fundamental concept is a huge time sink in self-study. Now, every time I read a formula I can go through all of my intuitions and understanding about it, write them down, and verify. Even a "not quite..." from an LLM is enough to make me realize I need to spend more time on that section.
Books are still the highest density information source and best way to learn, but LLMs can do a lot to accelerate this.
Why do we even bother to learn if AI is going to solve everything for us?
If the promised and fabled AGI is about to approach, what is the incentive or learning to deal with these small problems?
Could someone enlighten me? What is the value of knowledge work?
"The mind is not a vessel to be filled, but a fire to be kindled." — Plutarch
"Education is not preparation for life; education is life itself." — John Dewey
"The important thing is not to stop questioning. Curiosity has its own reason for existing." — Albert Einstein
In order to think complex thoughts, you need to have building blocks. That's why we can think of relativity today, while nobody on Earth was able to in 1850.
May the future be even better than today!
Most people don't learn to live; they live and learn. Sure, learning is useful, but I am genuinely curious why people overhype it.
Imagine being able to solve a math olympiad and get a gold medal. Will it change your life in an objectively better way?
Will learning about physics help you solve the Millennium Problems?
These take practice, and there is a lot of gatekeeping. The whole idea of learning is wisdom, not knowledge.
So maybe we differ in perspective. I just don't see the point when there are agents that can do it.
Being creative requires taking action. The learning these days is mere consumption of information.
Maybe this is me. But meh.
Apart from that, I do think that AI makes a lot of traditional teaching obsolete. Depending on your field, much of university studies is just memorizing content and writing essays / exam answers based on that, after which you forget most of it. That kind of learning, as in accumulation of knowledge, is no longer very useful.
You're also assuming that AGI will help you or us. It could just as easily only help a select group of people and I'd argue that this is the most likely outcome. If it does help everybody and brings us to a new age, then the only reason to learn will be for learning's sake. Even if AI makes the perfect novel, you as a consumer still have to read it, process it and understand it. The more you know the more you can appreciate it.
But right now, we're not there. And even if you think it's only 5-10y away instead of 100+, it's better to learn now so you can leverage the dominant tool better than your competition.
Is adding more buttons in a dropdown the best way to communicate with an LLM? I think the concept is awesome. Just like how Operator was awesome but it lived on an entirely different website!
Representative snippet:
> DO NOT GIVE ANSWERS OR DO HOMEWORK FOR THE USER. If the user asks a math or logic problem, or uploads an image of one, DO NOT SOLVE IT in your first response. Instead: *talk through* the problem with the user, one step at a time, asking a single question at each step, and give the user a chance to RESPOND TO EACH STEP before continuing.
How exactly you do it is often arbitrary/interchangeable, but it definitely does have an effect, and is crucial to getting LLMs to follow instructions reliably once prompts start getting longer and more complex.
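For anyone who wants to poke at this themselves: behavior in this spirit can be approximated with nothing but a system message on the standard chat API. A minimal sketch of the mechanism, assuming the current OpenAI Python SDK; the instructions and model name below are illustrative, not OpenAI's actual Study Mode prompt:

```python
# Approximating a "study mode" purely with a system prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TUTOR_PROMPT = (
    "You are a tutor. Do not give final answers or do homework for the user. "
    "Walk through problems one step at a time, ask a single question at each "
    "step, and wait for the user's response before continuing."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model would do; the name here is an assumption
    messages=[
        {"role": "system", "content": TUTOR_PROMPT},
        {"role": "user", "content": "Help me understand why the derivative of x^2 is 2x."},
    ],
)
print(response.choices[0].message.content)
```

Of course, a single short instruction like this degrades faster than a long, carefully structured prompt, which is exactly the parent's point about reliability.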
Not saying it is indeed reality, but it could simply be programmed to return a different prompt from the original, appearing plausible but perhaps missing some key elements.
But of course, if we apply Occam's Razor, it might simply really be the prompt too.
Tokens are expensive. How much of your system prompt do you want to waste on dumb tricks trying to stop your system prompt from leaking?
Will also reduce the context rot a bit.
The main issue is that chats are just bad UX for long form learning. You can't go back to a chat easily, or extend it in arbitrary directions, or easily integrate images, flashcards, etc etc.
I worked on this exact issue for Periplus and instead landed on something akin to a generative personal learning Wikipedia. Structure through courses, exploration through links, embedded quizzes, etc etc. Chat is on the side for interactions that do benefit from it.
Link: periplus.app
Btw, most people don't know this, but Anthropic did something similar months ago; their product heads messed up the launch by keeping it locked up only for American edu institutions. OpenAI copies almost everything Anthropic does and vice versa (see Claude Code / Codex).
When it just gives me the answer, I usually understand but then find that my long-term retention is relatively poor.
In the old days of desktop computing, a lot of projects were never started because if you got big enough, Microsoft would just implement the feature as part of Windows. In the more recent days of web computing, a lot of projects were never started, for the same reason, except Google or Facebook instead of Microsoft.
Looks like the AI provider companies are going to fill the same nefarious role in the era of AI computing.
I used to have to prompt it to do this every time. This will be way easier!
It seems like study mode is basically just a different system prompt but otherwise the exact same model? So there's not really any new benefit to anyone who was already asking for ChatGPT to help them study step by step instead of giving away whole answers.
Seems helpful to maybe a certain population of more entry level users who don't know to ask for help instead of asking for a direct answer I guess, but not really a big leap forward in technology.
I am not an LLM guy, but as far as I understand, RLHF did a good job converting a base model into a chat model (instruct-based), and a chat/base model into a thinking model.
Both of these examples are about the nature of the response and the content used to fill it. There are still many different ways these could be shaped that we haven't yet seen.
Generating an answer step by step and letting users dive into those steps is one of the ways, and RLHF (or the similar things which are used) seems a good fit for it.
Prompting feels like a temporary solution for it like how "think step by step" was first seen in prompts.
Also, doing RLHF/post-training to change these structures makes it a moat, and expensive. Only the AI labs can do it.
I would think you'd want to make something a little more bespoke to make it a fully-fledged feature, like interactive quizzes that keep score and review questions missed afterwards.
For example, the answer to a question was "Laocoön" (the guy who said 'beware of Greeks bearing gifts') and I put "Solon" (who was a Greek politician) and I got "You’re really close!"
Is it close, though?
When the former students ask questions, I answer most of them by pointing at the relevant passage in their book/notes, questioning their interpretation of what the book says, or giving them a push to actually problem-solve on their own. On rare occasions the material is just confusing/poorly written and I'll decide to re-interpret it for them to help. But the fundamental problems are usually with study habits or reading comprehension, not poor explanations. They need to question their habits and their interpretation of what other people say, not be spoon fed more personally-tailored questions and answers and analogies and self-help advice.
Besides asking questions to make sure I understand the situation, I mostly repeat the same ten phrases or so. Finding those ten phrases was the hard part and required a bit of ingenuity and trial-and-error.
As for the latter students, they mostly care about passing and moving on, so arguing about the merits of such a system is fairly pointless. If it gets a good enough grade on their homework, it worked.
I'm puzzled (but not surprised) by the standard HN resistance & skepticism. Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow up questions. This is leaps and bounds ahead of that experience.
Should we trust the information at face value without verifying from other sources? Of course not, that's part of the learning process. Will some (most?) people rely on it lazily without using it effectively? Certainly, and this technology won't help or hinder them any more than a good old fashioned textbook.
Personally I'm over the moon to be living at a time where we have access to incredible tools like this, and I'm impressed with the speed at which they're improving.
You should only trust going into a library and reading stuff from microfilm. That's the only real way people should be learning.
/s
See Dunning-Kruger.
Except that the textbook was probably QA’d by a human for accuracy (at least any intro college textbook, more specialized texts may not have).
Matters less when you have background in the subject (which is why it’s often okay to use LLMs as a search replacement) but it’s nice not having a voice in the back of your head saying “yeah, but what if this is all nonsense”.
Maybe it was not when printed in the first edition, but at least it was the same content shown to hundreds of people rather than something uniquely crafted for you.
The many eyes looking at it will catch it and course correct, while the LLM output does not get the benefit of the error correction algorithm because someone who knows the answer probably won't ask and check it.
I feel this way about reading maps vs. following GPS navigation; the fact that Google asked me to take an exit here as a short-cut feels like it might be trying to solve Braess' paradox in real time.
I wonder if this route was made for me to avoid my car adding to some congestion somewhere and whether if that actually benefits me or just the people already stuck in that road.
Stack overflow?
The IRC, Matrix or slack chats for the languages?
The good: it can objectively help you to zoom forward in areas where you don’t have a quick way forward.
The bad: it can objectively give you terrible advice.
It depends on how you sum that up on balance.
Example: I wanted a way forward to program a chrome extension which I had zero knowledge of. It helped in an amazing way.
Example: I keep trying to use it in work situations where I have lots of context already. It performs better than nothing, but often worse than nothing.
Mixed bag, that’s all. Nothing to argue about.
But now, you're wondering if the answer the AI gave you is correct or something it hallucinated. Every time I find myself putting factual questions to AIs, it doesn't take long for it to give me a wrong answer. And inevitably, when one raises this, one is told that the newest, super-duper, just released model addresses this, for the low-low cost of $EYEWATERINGSUM per month.
But worse than this, if you push back on an AI, it will fold faster than a used tissue in a puddle. It won't defend an answer it gave. This isn't a quality that you want in a teacher.
So, while AIs are useful tools in guiding learning, they're not magical, and a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.
I know you'll probably think I'm being facetious, but have you tried Claude 4 Opus? It really is a game changer.
Anyway, this makes me wonder if LLMs can be appropriately prompted to indicate whether the information given is speculative, inferred or factual. Whether they have the means to gauge the validity/reliability of their response and filter their response accordingly.
I've seen prompts that instruct the LLM to make this transparent via annotations to their response, and of course they comply, but I strongly suspect that's just another form of hallucination.
> a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.
I don't think that is another story. This is the story of learning, no matter whether your teacher is a person or an AI.
My high school science teacher routinely mispoke inadvertently while lecturing. The students who were tracking could spot the issue and, usually, could correct for it. Sometimes asking a clarifying question was necessary. And we learned quickly that that should only be done if you absolutely could not guess the correction yourself, and you had to phrase the question in a very non-accusatory way, because she had a really defensive temper about being corrected that would rear its head in that situation.
And as a reader of math textbooks, both in college and afterward, I can tell you you should absolutely expect errors. The errata are typically published online later, as the reports come in from readers. And they're not just typos. Sometimes it can be as bad as missing terms in equations, missing premises in theorems, missing cases in proofs.
A student of an AI teacher should be as engaged in spotting errors as a student of a human teacher. Part of the learning process is reaching the point where you can and do find fault with the teacher. If you can't do that, your trust in the teacher may be unfounded, whether they are human or not.
You're telling people to be experts before they know anything.
I mean, that's absolutely my experience with heavy LLM users. Incredibly well versed in every topic imaginable, apart from all the basic errors they make.
By noticing that something is not adding up at a certain point. If you rely on an incorrect answer, further material will clash with it eventually one way or another in a lot of areas, as things are typically built one on top of another (assuming we are talking more about math/cs/sciences/music theory/etc., and not something like history).
At that point, it means that either the teacher (whether it is a human or ai) made a mistake or you are misunderstanding something. In either scenario, the most correct move is to try clarifying it with the teacher (and check other sources of knowledge on the topic afterwards to make sure, in case things are still not adding up).
Ah, but information is presented by AI in a way that SOUNDS like it makes absolute sense if one doesn't already know it doesn't!
And if you have to question the AI a hundred times to try and "notice that something is not adding up" (if it even happens) then that's no bueno.
> In either scenario, the most correct move is to try clarifying it with the teacher
A teacher that can randomly give you wrong information with every other sentence would be considered a bad teacher
Children are asking these things to write personal introductions and book reports.
So "risk of hallucination" as a rebuttal to anybody admitting to relying on AI is just not insightful. like, yeah ok we all heard of that and aren't changing our habits at all. Most of our teachers and books said objectively incorrect things too, and we are all carrying factually questionable knowledge we are completely blind to. Which makes LLMs "good enough" at the same standard as anything else.
Don't let it cite case law? Most things don't need this stringent level of review
Meanwhile in LLM-land, if an expert five thousand miles away asked the same question you did last month, and noticed an error... it ain't getting fixed. LLMs get RL'd into things that look plausible for out-of-distribution questions, not things that are correct. Looking plausible but non-factual is in some ways more insidious than a stupid-looking hallucination.
What most people call “non-deterministic” in AI is that one of those inputs is a _seed_ that is sourced from a PRNG because getting a different answer every time is considered a feature for most use cases.
Edit: I’m trying to imagine how you could get a non-deterministic AI and I’m struggling because the entire thing is built on a series of deterministic steps. The only way you can make it look non-deterministic is to hide part of the input from the user.
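A toy sketch of that point: the "randomness" in sampling is just a PRNG draw over the next-token distribution, so pinning the seed (or using greedy argmax decoding) makes the whole thing repeatable. The vocabulary and probabilities below are made up stand-ins for a model's output:

```python
# Toy illustration: "non-determinism" in sampling disappears once the seed is pinned.
import random

vocab = ["cat", "dog", "fish"]
probs = [0.5, 0.3, 0.2]          # stand-in for a model's next-token distribution

def sample_token(seed: int) -> str:
    rng = random.Random(seed)    # all randomness comes from this seed
    return rng.choices(vocab, weights=probs, k=1)[0]

print(sample_token(42), sample_token(42))   # same seed -> same "answer" every run
print(sample_token(1), sample_token(2))     # different seeds -> apparent randomness

# Greedy decoding (argmax) needs no seed at all:
print(max(zip(probs, vocab))[1])            # always "cat"
```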
Depends on the machine that implements the algorithm. For example, it’s possible to make ALUs such that 1+1=2 most of the time, but not all the time.
…
Just ask Intel. (Sorry, I couldn’t resist)
Up next: ChatGPT, does jumping off high buildings kill you?
>> No, jumping off high buildings is perfectly safe as long as you land skillfully.
This is one I got today:
https://chatgpt.com/share/6889605f-58f8-8011-910b-300209a521...
(image I uploaded: http://img.nrk.no/img/534001.jpeg)
The correct answer would have been Skarpenords Bastion/kruttårn.
It appears to me like a form of decoherence and very hard to predict when things break down.
People tend to know when they are guessing. LLMs don't.
I haven't spent any money with claude on this project and realistically it's not worth it, but I've run into little things like that a fair amount.
A couple of non-programming examples: https://www.evidentlyai.com/blog/llm-hallucination-examples
For example, today I was asking an LLM how to configure a GH action to install an SDK version that recently went out of support. It kept hallucinating about my config, saying that when you provide multiple SDK versions in the config, it only picks the most recent. This is false. It's also mentioned specifically in the documentation, which I linked the LLM, that it installs all the versions you list. Explaining this to Copilot, it keeps doubling down, ignoring the docs, and even going as far as asking me to have the action output the installed SDKs, seeing all the ones I requested as installed, and then gaslighting me by saying it can print out the wrong SDKs with a `--list-sdks` command.
Sure, Joe Average who's using it to look smart in Reddit or HN arguments or to find out how to install a mod for their favorite game isn't gonna notice anymore, because it's much more plausible much more often than two years ago, but if you're asking it things that aren't trivially easy for you to verify, you have no way of telling how frequently it hallucinates.
This phrase is now an inner joke used as a reply to someone quoting LLMs info as “facts”.
Regular research has the same problem finding bad forum posts and other bad sources by people who don't know what they're talking about, albeit usually to a far lesser degree depending on the subject.
Results from the LLM are for your eyes only.
But this is completely wrong! In the Monty Hall problem, the host has to reveal a door with a goat behind it for you to gain the benefit of switching. I have to point this out for the LLM to get it right. It did not reason about the problem I gave it, it spat out the most likely response given the "shape" of the problem.
This is why shrugging and saying "well humans get things wrong too" is off base. The problem is that the LLM is not thinking, period. So it cannot create a mental model of your understanding of a subject, it is taking your text and generating the next message in a conversation. This means that the more niche the topic (or your particular misunderstanding), the less useful it will get.
People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means, and to me it's getting pretty old. It feels like an escape hatch so we don't feel like our human special sauce is threatened, a bit like how people felt threatened by heliocentrism or evolution.
The failure of an LLM to reason this out is indicative that really, it isn’t reasoning at all. It’s a subtle but welcome reminder that it’s pattern matching
"Pattern matching" to me is another one of those vague terms like "thinking" and "knowing" that people decide LLMs do or don't do based on vibes.
The other part of this is weighted filtering given a set of rules, which is a simple analogy to how AlphaGo did its thing.
Dismissing all this as vague is effectively doing the same thing as you are saying others do.
This technology has limits and, despite what Altman says, we do know this, and we are exploring them, but it's within its own confines. They're fundamentally wholly understandable systems that work on a consistent level in terms of how they do what they do (that is separate from the actual produced output).
I think reasoning, as any layman would use the term, is not accurate to what these systems do.
Such as?
> They're fundamentally wholly understandable systems that work on a consistent level in terms of how they do what they do (that is separate from the actual produced output)
Multi billion parameter models are definitely not wholly understandable and I don't think any AI researcher would claim otherwise. We can train them but we don't know how they work any more than we understand how the training data was made.
> I think reasoning, as any layman would use the term, is not accurate to what these systems do.
Based on what?
As an autistic person I can assure you that people filling in information you neither gave nor implied is the norm for human conversations. In that sense, the LLMs are being human-like.
If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:
> There is a game show with three closed doors. One has a car and two have goats. I pick a door. Then, without opening my door or any other door, the host asks me if I want to switch my choice to one of the other two doors. What is my best strategy and what are the probabilities?
Gemini: This is a classic probability puzzle known as the Monty Hall problem, though your description presents a slight variation. The optimal strategy depends entirely on whether the host opens a door or not.

Your Scenario: The Host Doesn't Open a Door

In the situation you've described, where the host asks if you want to switch your choice to one of the other two doors without opening any doors, there is no advantage to switching. Here's a breakdown of the probabilities in your specific scenario:

- Your initial choice: When you first pick a door, you have a 1 in 3 chance of selecting the door with the car. This means you have a 2 in 3 chance of having picked a door with a goat.
- The other two doors: The combined probability that the car is behind one of the other two doors is 2/3.
- Switching your choice: If you decide to switch, you are essentially choosing one of the other two doors. Since there is no new information (a door with a goat has not been revealed), the 2/3 probability is evenly split between those two doors. Therefore, your chance of winning by switching to a specific one of the other doors is 1/3.

In this case, your odds of winning are 1/3 whether you stay with your original choice or switch to one of the other doors.
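For what it's worth, a quick Monte Carlo check reproduces both numbers: roughly 2/3 for switching after the host reveals a goat, and roughly 1/3 for blindly switching when no door is opened. A minimal sketch:

```python
# Quick Monte Carlo check of both variants discussed above:
# (a) classic Monty Hall: host opens a goat door, then you switch;
# (b) the variant in the prompt: no door is opened, you switch to a random other door.
import random

def trial(host_reveals: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    others = [d for d in doors if d != pick]
    if host_reveals:
        # Host opens a door that is neither your pick nor the car; you take the remaining one.
        opened = random.choice([d for d in others if d != car])
        switched = next(d for d in others if d != opened)
    else:
        # No information revealed; switch blindly to one of the other two doors.
        switched = random.choice(others)
    return switched == car

N = 100_000
print("switch after reveal:", sum(trial(True) for _ in range(N)) / N)   # ~0.667
print("switch, no reveal:  ", sum(trial(False) for _ in range(N)) / N)  # ~0.333
```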
This is where the skepticism arises. Before we spend another $100 billion on something that ended up being worthless, we should first prove that it’s actually useful. So far, that hasn’t conclusively been demonstrated.
It happens with many technological advancements historically. And in this case there are people trying hard to manufacture outrage about LLMs.
Except these systems will still confidently lie to you.
The other day I noticed that DuckDuckGo has an Easter egg where it will change its logo based on what you've searched for. If you search for James Bond or Indiana Jones or Darth Vader or Shrek or Jack Sparrow, the logo will change to a version based on that character.
If I ask Copilot if DuckDuckGo changes its logo based on what you've searched for, Copilot tells me that no it doesn't. If I contradict Copilot and say that DuckDuckGo does indeed change its logo, Copilot tells me I'm absolutely right and that if I search for "cat" the DuckDuckGo logo will change to look like a cat. It doesn't.
Copilot clearly doesn't know the answer to this quite straightforward question. Instead of lying to me, it should simply say it doesn't know.
I agree that if the user is incompetent, cannot learn, and cannot learn to use a tool, then they're going to make a lot of mistakes from using GPTs.
Yes, there are limitations to using GPTs. They are pre-trained, so of course they're not going to know about some easter egg in DDG. They are not an oracle. There is indeed skill to using them.
They are not magic, so if that is the bar we expect them to hit, we will be disappointed.
But neither are they useless, and it seems we constantly talk past one another because one side insists they're magic silicon gods, while the other says they're worthless because they are far short of that bar.
For you and I, it's not. But for these LLMs, maybe it's not that easy? They get their inputs, crunch their numbers, and come out with a confidence score. If they come up with an answer they're 99% confident in, by some stochastic stumbling through their weights, what are they supposed to do?
I agree it's a problem that these systems are more likely to give poor, incorrect, or even obviously contradictory answers than say "I don't know". But for me, that's part of the risk of using these systems and that's why you need to be careful how you use them.
You could ask me as a human basically any question, and I'd have answers for most things I have experience with.
But if you held a gun to head and said "are you sure???" I'd obviously answer "well damn, no I'm not THAT sure".
Some of the best exchanges that I participated in or witnessed involved people acknowledging their personal limits, including limits of conclusions formed a priori
To further the discussion, hearing the phrase you mentioned would help the listener to independently assess a level of confidence or belief of the exchange
But then again, honesty isn't on-brand for startups
It's something that established companies say about themselves to differentiate from competitors or even past behavior of their own
I mean, if someone prompted an llm weighted for honesty, who would pay for the following conversation?
Prompt: can the plan as explained work?
Response: I don't know about that. What I do know is on average, you're FUCKED.
Here in my country, English is not what you'll hear in everyday conversation. Native English speakers account for a tiny percentage of the population. Our language doesn't resemble English at all. However, English is a required subject in our mandatory education system. I believe this situation is quite typical across many Asian countries.
As you might imagine, most English teachers in public schools are not native speakers. And they, just like other language learners, make mistakes that native speakers won't make without even realizing what's wrong. This creates a cycle enforcing non-standard English pragmatics in the classroom.
Teachers are not to blame. Becoming fluent and proficient enough in a second language to handle questions students spontaneously throw at you takes years, if not decades, of immersion. It's an unrealistic expectation for an average public school teacher.
The result is rich parents either send their kids to private schools or have extra classes taught by native speakers after school. Poorer but smart kids realize the education system is broken and learn their second language from Youtube.
-
What's my point?
When it comes to math/science, in my experience, the current LLMs act similarly to the teachers in public school mentioned above. And they're worse in history/economics. If you're familiar with the subject already, it's easy to spot LLM's errors and gather the useful bits from their blather. But if you're just a student, it can easily become a case of blind-leading-the-blind.
It doesn't make LLMs completely useless in learning (just like I won't call public school teachers 'completely useless', that's rude!). But I believe in the current form they should only play a rather minor role in the student's learning journey.
Learning what is like that? MIT OpenCourseWare has been available for like 10 years with anything you could want to learn in college.
Textbooks are all easily pirated
It mostly isn't, the point of the good learning process is to invest time into verifying "once" and then add verified facts to the learning material so that learners can spend that time learning the material instead of verifying everything again.
Learning to verify is also important, but it's a different skill that doesn't need to be practiced literally every time you learn something else.
Otherwise you significantly increase the costs of the learning process.
I use LLMs but only for things that I have a good understanding of.
Wonder what the compensation for this invaluable contribution was
But even with this feature in this very early state, it seems quite useful. I dropped in some slides from a class and pretended to be a student, and it handled questions reasonably. Right now it seems I will be happy for my students to use this.
Taking a wider perspective, I think it is a good sign that OpenAI is culturally capable of making a high-friction product that challenges and frustrates, yet benefits, the user. Hopefully this can help with the broader problem of sycophancy.
Importantly, these were _not_ critical questions that I was incorporating into any decision-making, so I wasn't having to double-check the AI's answers, which would make it tedious; but it's a great tool for satisfying curiosity.
> Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pedagogy experts to reflect a core set of behaviors that support deeper learning including: encouraging active participation, managing cognitive load, proactively developing metacognition and self reflection, fostering curiosity, and providing actionable and supportive feedback.
I'm calling bullshit, show me the experts, I want to see that any qualified humans actually participated in this. I think they did their "collaboration" in ChatGPT which spit out this list.
Studying for a particular course or subject is the second killer application for LLMs, and with this announcement OpenAI's ChatGPT is now providing that service too. Probably not the pioneer, but most probably one of the significant providers. If, in the near future, GenAI study assistants can adopt and adapt the 3Blue1Brown approach of visualization, animation, and interactive learning, it will be more intuitive and engaging.
Please check this excellent LLM-RAG AI-driven course assistant at UIUC for an example of a university course [1]. It provides citations and references, mainly to the course notes, so the students can verify the answers and further study the course materials.
[1] AI-driven chat assistant for ECE 120 course at UIUC (only 1 comment by the website creator):
Having experience teaching the subject myself, what I saw on that page is about the first five minutes of the first class of the semester at best. The devil will very much be in the other 99% of what you do.
human: damn kids are using this to cheat in school
openai: release an "app"/prompt that seems really close to solving this stated problem
kids: I never wanted to learn anything, I just want to do bare minimum to get my degree, let my parents think they are helping my future, and then i can get back to ripping that bong
<world continues slide into dunce based oblivion>
It doesn't matter what the problem statement is: an 80%-or-less solution can seemingly be made, and rather quickly. A huge percentage of the population judges technology solutions as "good enough" at a far lower bar than they should. This is even roping in people from the past who used to hold a higher standard of "rigorous correctness", because they keep thinking, "damn, just a bit more work and it will get infinitely better; let's create the biggest economic house of cards this world will ever collapse under."
Sure, it was crafted by educational experts, but this is not a feature! It's a glorified constant!
Then I tried to migrate it to ChatGPT to try this thing out, but it seems like it's just prompt engineering behind it. Nothing fancy.
And this study mode is not even available in ChatGPT Projects, which students need for adding coursework, notes, and transcripts.
Honestly, just release gpt-5!!!
If LLMs continue to improve, we are going to be learning a lot from them, they will be our internet search and our teachers. If we want to retain some knowledge for ourselves, then we are going to need to learn and memorize things for ourselves.
Integrating spaced repetition could make it explicit which things we want to offload to the LLM, and which things we want to internalize. For example, maybe I use Python a lot and occasionally use Perl, so I explicitly choose to memorize some Python APIs, but I'm happy to just ask the LLM for reminders whenever I use Perl. So I ask the LLM to set up some spaced repetition whenever it teaches me something new about Python, etc.
The spaced repetition could be done with voice during a drive or something. The LLM would ask the questions for review, and then judge how well we did in answering, and then the LLM would depend on the spaced-repetition algorithm to keep track of when to next review.
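The scheduling piece itself is simple; Anki's algorithm descends from SM-2, and a rough sketch of that style of update looks like the code below. The names and thresholds here are illustrative, not Mochi's or Anki's exact implementation:

```python
# Rough sketch of an SM-2-style scheduler the LLM could call after each spoken review.
# "quality" is the model's 0-5 judgement of how well the learner answered;
# the function mutates the card and sets its next review interval.
from dataclasses import dataclass

@dataclass
class Card:
    prompt: str                 # e.g. "What does Python's dict.setdefault do?"
    interval_days: float = 1.0
    ease: float = 2.5
    repetitions: int = 0

def review(card: Card, quality: int) -> Card:
    if quality < 3:             # failed recall: restart the schedule
        card.repetitions = 0
        card.interval_days = 1.0
    else:
        card.repetitions += 1
        if card.repetitions == 1:
            card.interval_days = 1.0
        elif card.repetitions == 2:
            card.interval_days = 6.0
        else:
            card.interval_days *= card.ease
        # Ease drifts up for easy answers, down for hard ones (SM-2 update rule).
        card.ease = max(1.3, card.ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return card
```

The LLM's role would just be to generate the card prompts and to map the learner's spoken answer to a quality score; the review bookkeeping itself stays deterministic.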
https://arxiv.org/abs/2409.15981
It is definitely a great use case for LLMs, and challenges the assumption that LLMs can only "increase brain rot", so to speak.
Happy Tuesday!
There's a lot of specificity that AI can give over human instruction; however, it still suffers from a lack of rigor and true understanding. If you follow well-trod paths it's better, but that negates the benefit.
The future is bright for education though.
Sure, for some people it will be insanely good: you can go for as stupid questions as you need without feeling judgement, you can go deeper in specific topics, discuss certain things, skip some easy parts, etc.
But we are talking about averages. In the past we thought that the collective human knowledge available via the Internet will allow everyone to learn. I think it is fair to say that it didn't change much in the grand scheme of things.
(Joke/criticism intended)