Because humans write exactly like this /s
The project is still very cool, but it’s a little less enjoyable to read when everything sounds the same. It would be just as annoying for people to manually write in a corporate/marketing style, because humanity is what makes the small web interesting.
Not from individual human content, that's for sure - maybe MLM marketing copy? Sleazy 4AM ads?
I mean, every time this response comes up, I keep asking the person to point at something written prior to 2022 that gets 80%+ on the LLM detectors, and yet no one can find anything.
Maybe you, postalrat, can find something written in this style that was published prior to 2022.
If the way you thought was to run a bunch of if statements, generate content, then feed that content back to get a "score" of what seems the most plausible, run the if statements again, and adjust / merge responses, then you would write similarly. The recognizable cadence of LLM generated content is pretty clearly the result of a lot of if statements being fused together.
Classic LLM writing style.
Isn't a raspi with 16GB of RAM $300 now?
- In 2017, the V100 was a ~$10,000 GPU. I believe there was a PCIe version, but this one is probably so cheap because SXM2 is going to be harder to use;
- A 5090 has 1800GB/s of internal memory bandwidth (compared to 900GB/s in the 9-year-old GPU). Of course a 5090 is substantially more expensive;
- A 5090 has ~21k CUDA cores vs ~5k;
- The current $10k NVidia GPU is the RTX 6000 Pro w/ 96GB of VRAM. It has slightly more CUDA cores but is otherwise pretty much just a 5090. This is unsurprising. NVidia uses VRAM for market segmentation.
Consider this: in 5-10 years, the trillions spent on AI data centers will most likely be sold for scrap as well. That's how short the runway is for OpenAI and Anthropic to recover that investment.
Anyway, I'm kind of impressed the author managed to get this all to work. I don't think it even would've occurred to me that someone had made an SXM2 adapter, particularly because it's not even used anymore. Like props to whoever did that.
Even more interesting: it'll devalue all of SaaS and the entire US tech sector.
We might have just shot our most valuable non-AI tech products in the foot.
The thought of throwing away working cards sounds so bizarre to me. I can't believe companies would dispose of them in a landfill like that, it is at least worth giving away for refuse.
Had to stop there. Annoying. I can't stand AI use for writing. It makes any otherwise great article feel so disingenuous.
It's prefill; slow prefill kills agentic workloads dead.
If you have 100,000 tokens at ~150tok/s per the OP, you're looking at:
You have: 100000 / (150/s)
You want: hms
11 min + 6.6666667 sec
Which is quite a wait indeed.
This is also a problem for all of the Mac local LLMs. Macs are a great way to get a lot of high bandwidth memory, but their compute is very far behind current gen dedicated GPUs. Some of the expensive Mac Studio setups allow you to run very large models with usable tokens/s, but you can be waiting a long time for it to get to the point of generating those tokens.
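For anyone who wants to plug in their own numbers, here's a rough Python sketch of the same prefill arithmetic as the units session above. It takes the ~150 tok/s figure as the prompt-processing rate, same as that comment does, and assumes the rate stays constant over the whole prompt, which real hardware won't quite do. The function names are made up for illustration.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent processing the prompt before the first output token.

    Assumes a constant prompt-processing rate (a simplification).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


def as_min_sec(seconds: float) -> tuple[int, float]:
    """Split a duration in seconds into whole minutes and leftover seconds."""
    minutes, rest = divmod(seconds, 60)
    return int(minutes), rest


# Same inputs as the units session: 100,000 tokens at ~150 tok/s.
minutes, rest = as_min_sec(prefill_seconds(100_000, 150))
print(f"{minutes} min + {rest:.1f} sec")  # 11 min + 6.7 sec
```

Swap in your own context size and measured prefill rate to see how long an agent would sit there before generating anything.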