When they're here I'll make an upvote farming bot that learns from experience how not to get caught and unleash it on HN.
After that I’ll make an agent that runs a SaaS company that learns from experience how to make money and I’ll finally be able to chill out and play video games.
That last thing I'll actually do myself, I won't use an agent, although the experience revolution started with games. Ironic!
But I’ll make an agent that learns from experience what kind of games I like and how to make them. This way I’ll have an endless supply.
If they're not careful, they'll be sued for copyright violation of the Real Soon Now™ brand.
We just need it to get better at building agents.
I'm burning out from all this hypester stuff; it's really, really tiring.
The constant barrage of excrement makes critical thinking ever harder, which is by design (per Brandolini's law, pumping out BS en masse is way easier than debunking it). Stop using your brain already and just buy what they tell you. Thinking is done by machines now. As is pumping out BS about how good machines are at thinking.
As for the graph, it’s too generic, it doesn’t provide any real value, other than a certain pseudo-appeal reminiscent of paper-style visuals. In my humble opinion, it’s designed to mislead people who fall for hype, much like some of Google’s recent pseudo-scientific blog posts on machine learning.
I have deep respect for Sutton and his work, but this kind of thing is a hard pass for me.
...It is in a format that resembles a published article because it is going to be a published article? "This is a preprint of a chapter that will appear in the book Designing an Intelligence, published by MIT Press." on the first page.
> As for the graph, it’s too generic
A history of RL from DQN to AlphaProof/LLM computer use in Gemini is not 'generic', and could not be.
> it doesn’t provide any real value
It provides value to people who were not around then and are not familiar with how attention to RL peaks and crests; a similar chart about TD-Gammon and Deep Blue, say, would likewise be useful for the many people who did not actually live through those eras, and would help contextualize material from back then. (I did, and maybe you did, and so it's not useful to us, but there exist other, younger people in the world who are not us{{citation needed}}.) And the fact that these cycles exist is something worth reflecting on - Karpathy and others have noted how there were expectations of DRL leading to AGI in the 2015-2020 period, which wound up being swamped by self-supervised learning, with DRL relegated to a backwater (this contributed very directly to major events like how OpenAI and DeepMind became what they are now - and why Sutton is at Keen rather than DeepMind with Silver), before suddenly becoming super-relevant again.
It doesn't make any difference and doesn't invalidate my critique. It appears to be a science communication book, so it could easily be a web page. Even if it had to be a LaTeX-ish PDF, there were multiple ways to avoid making it a PDF that resembles a scientific article; a precise choice was made about how to communicate. The medium is the message.
> A history of RL from DQN to AlphaProof/LLM computer use in Gemini is not 'generic', and could not be.
The history of RL is not 'generic', and is indeed really interesting; I look forward to reading Sutton's book! But the graph in the PDF is. The y-axis is ill-defined because
1. it combines different technologies (DQN, AlphaGo, GPT models) on a single continuum, implying a direct comparison;
2. it extrapolates the evergreen hypester trajectory toward "superhuman intelligence".
I will not comment further on the graph; it's not an interesting visualization in my opinion and only serves the author's narrative of "feeling the AGI (through RL)". There would be more interesting ways of plotting this information for a general public. I agree it's harsh of me to say it doesn't provide value. Maybe it provides value to people who want to explore RL now, but again, the medium is the message, and this format is clearly saying out loud "look at me, I'm a paper, trust me."
If this is really your response, I agree with your original comment about you being burned-out.
And they imitate the style pretty well. It's kinda funny. (I don't follow this topic closely - I clicked because I imagined something entirely different from the title - so this is the first time I've seen it.)
But I guess you mean this? https://www.sciencedirect.com/science/article/pii/S000437022...
"Reward is enough", David Silver, Satinder Singh, Doina Precup, Richard S. Sutton, 2021.
They specifically talk about this in the position paper and describe the need for a "flexible" reward function that's adaptive. It's very hand-wavy and doesn't really describe how they would do this, but there's a lot of research along these lines. AGI is also not really the subject of the paper.
I think the core idea of the paper is that, while we have already hit the ceiling of the usual kinds of data, there's a new kind of data that comes from agents acting in the real world, with users (or someone else?) providing rewards based on some ground truth.
Somehow I misinterpreted from this paper that this kind of learning would be autonomous and continuous.
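To make that concrete, here's a minimal sketch (in Python) of how I read that loop: the agent acts, the world or a user returns a reward tied to some ground truth, and the agent updates from that stream rather than from a fixed dataset. All the names here (Environment, Agent, the toy actions) are mine for illustration, not from the paper.

    import random

    class Environment:
        """Stand-in for the real world: returns an observation and a
        ground-truth reward for the agent's last action."""
        def step(self, action):
            reward = 1.0 if action == "useful" else -1.0  # grounded signal
            observation = random.random()
            return observation, reward

    class Agent:
        """Trivial policy that learns action preferences from rewards."""
        def __init__(self):
            self.values = {"useful": 0.0, "useless": 0.0}

        def act(self, observation, eps=0.1):
            if random.random() < eps:  # occasional exploration
                return random.choice(list(self.values))
            return max(self.values, key=self.values.get)

        def update(self, action, reward, lr=0.1):
            # Incremental update: experience keeps streaming in, so there
            # is no fixed training set and no separate train/deploy phase.
            self.values[action] += lr * (reward - self.values[action])

    env, agent = Environment(), Agent()
    obs = random.random()
    for _ in range(100):
        action = agent.act(obs)
        obs, reward = env.step(action)
        agent.update(action, reward)

Note there's no train/deploy split in this picture: the "dataset" is just whatever the agent has lived through, which is exactly why I read it as autonomous and continuous.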
Meaning we need fewer strong arms and more strong brains. Not that new anyway: the "information age" already made clear that intelligent people can do pretty much anything they want, while less intelligent people are constrained in what they can actually do, even when they want to.
Experience essentially means automation in the chapter's terms: something we have already "solved" can be automated by some machine. To solve new things we need humans. That's it.
Small-potatoes new knowledge, meaning knowledge that emerges merely from crossing pre-existing knowledge (like a literature-review paper), could be a machine's game; it's not really the creation of new knowledge in the end.
BUT the real point is another: who owns the model? LLMs make one thing clear: we need open knowledge just to train them, so copyright can't be sustained anymore. But once a model is created, who owns it? The current model is dramatically dangerous, since training is expensive and not very exciting, so while it could be a community procedure, in practice it is a giant-led process, and the giant owns the result while harvesting anything from anyone. The effects implied by such an evolution are much more startling than the mere automation risk in Lisanne Bainbridge's terms https://ckrybus.com/static/papers/Bainbridge_1983_Automatica... or short/mid-term job losses.
It's quite beautiful. Once a civilization tries to build machine intelligence, it slowly degrades its own capacity in the process, eventually losing all hope of ever achieving its goal - assuming it still understands that goal at that point. Maybe it's an algorithm in the Universe to keep us from being naughty.
It's not. It's just that previously we were unaware how stupid people are, and now we're starting to understand this.
Happening right out in the open, and quite blatantly.
With social media and the Internet, stupid just got louder. I don't think people got stupid.
Also well documented. Anyone interested, read the book: The Attention Merchants by Tim Wu.
E.g.: https://www.popularmechanics.com/science/a43469569/american-...
...and TV brought us Ronald Reagan, and the Internet gave us Trump as POTUS.
Well, the Thirty Years' War definitely showed a technology-driven speed-up. Before the printing press we had wars that lasted a hundred years [0].
[0] https://en.wikipedia.org/wiki/Hundred_Years%27_War
is it? i am listening to the most beautiful music that was ever created. it was created in 2024.