"Standing on the shoulders of giants, it is clear that the giants failed to reach the heights we have reached."
But instead of iterating on better interfaces to effectively utilize the N thousands of operations per second a computer is capable of, the powers that be behind the industry have decided to invest billions of dollars in GPUs to get a program that seems like it understands language, but is incapable of counting the number of B's in "blueberry."
And making it more of a case of "IDK why it answered the way it did, but it might be right!!"
Well, I would say that if GP advertised themselves as being able to do so and then confidently gave an incorrect answer, they are practically useless at serving their advertised purpose.
Also, no matter what hype or marketing says: GPT is a statistical word bag with a mostly invisible middleman to give it a bias.
A car is a private transportation vehicle but companies still try to sell it as a lifestyle choice. It's still a car.
And counting stuff you have in front of you is a basic skill required everywhere. Counting letters in a word is just a representative task for counting boxes of goods, or money, or kids in a group, or rows in a list on some document; it comes up in all kinds of situations. Of course people insist that AI must do this right. The word bag perhaps can't do it, but it can call some better tool, in this case literally one line of Python. And that is actually the topic the article touches on.
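That one line, for the record (a minimal sketch, assuming a case-insensitive count is what's wanted):

    # count the b's in "blueberry"; prints 2
    print("blueberry".lower().count("b"))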
[0] http://web.archive.org/web/20250729225834/https://openai.com...
So not too many?
I am now completely justified in worrying about whether the pipes he just installed were actually made of butter.
They're not claiming AGI yet, so human intelligence is required to operate an LLM optimally. It's well known that LLMs process tokens rather than characters, so without space for "reasoning" there's no representation of the letter b in the prompt. Telling it to spell the word or think about it gives it room to spell it out; from there it can "see" the letters, and counting them is trivial.
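A rough way to see the token view for yourself (a sketch, assuming the tiktoken package; the exact split varies by encoding and model):

    import tiktoken

    # "blueberry" reaches the model as a few subword ids, not nine characters.
    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("blueberry")
    print([enc.decode([i]) for i in ids])  # the chunks the model actually sees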
?
If you really want to, you know, have this super generic indexing thing, why don't you go organize the web with HyperCard and semantic web crap and tell us how it worked out for you?
> Remember Semantic Web? The web was supposed to evolve into semantically structured, linked, machine-readable data that would enable amazing opportunities. That never happened.
I think the lesson to be learned is in answering the question "Why didn't the semantic web happen?"
Because web content is generated by humans, not engineers.
Advertising.
I have literally been doing web development since there was a web, and the companies I developed for are openly hostile to the idea of putting their valuable, or perceived valuable, information online in a format that could be easily scraped. Information doesn't want to be free; it wants to be paid for. Unless the information shared pulls visitors to the site, it doesn't need to be public.
That's a cultural and societal problem, not a technology problem. The motivations (profit) are wrong, and don't lead to true innovations, only to financialization.
So long as people need to pay to eat, then information will also want to continue to be paid for, and our motivations will continue to be misaligned with true innovations, especially if said innovations would make life easier but wouldn't result in profit.
We won't achieve post-scarcity, even with widespread automation (if AI ever brings that to fruition), because we haven't yet addressed the advantages that wealth brings, so the motivation to work toward a post-scarcity society just doesn't exist.
Kind of a chicken and egg problem.
The whole concept was too high-minded and they never got the implementation details down. Even if they had, it would have been horrendously complex and close to impossible to manage. Asking every single publisher to neatly categorize their data into this necessarily enormous schema would have resulted in countless errors all over the web that would have seriously undercut the utility of the project anyway. Ultimately the semantic web doesn't scale very well. It failed for the same reason command economies fail: it's too overwhelming for the people in control to manage, and it drowns in its own bureaucracy.
In vertical markets, can LLMs generate a "semantic web of linked data" knowledge graph to be parsed with efficient NLP algorithms?
https://news.ycombinator.com/item?id=43914227#43926169
leveraging LLMs to build the special markup so that it can be applied towards other uses.. some type of semantic web format, like JSON-LD or OWL, or some database that can process SPARQL queries.. Palantir is using ontologies as guardrails to prevent LLM hallucinations
Is that the main thrust of it?
It might be short lived, who knows, but it's interesting that the recent progress came from capturing/consuming rather than systematically eliminating the nuance in language.
One can speak in their native language to a computer now and it mostly understands what is meant and can retrieve information or even throw together a scaffold of a project somewhat reliably.
It's not particularly good at writing software, however. Still feels like a human is needed to make sure it doesn't generate nonsense or apparently pretty insecure code.
So I'm not sure the author got the point across that they wished, but aren't vector databases basically a semantic storage/retrieval technology?
By the way: his blog getting a few dozen HN comments is only impressive because he failed to write a better blog.
if it's a space issue, "semantic web" is far more relevant to the article than "personal computing".
To me, it feels similar to "If everyone just cooperated perfectly and helped each other out, we wouldn't need laws/money/government/religion/etc."
Yes, you're probably right, but no that won't happen the way you want to, because we are part of a complex system, and everyone has their very different incentives.
Semantic web was a standard suggested by Google, but unless every browser got on board and broke web pages that didn't conform to it, people weren't going to fully follow it. Instead, browsers (correctly in my view) decided to be as flexible as possible and render pages in a best-effort way, because everyone had a slightly different way to build web pages.
I feel like people get too stuck on the "correct" way to do things, but the reality of computers, as is the reality of everything, is that there are lots of different ways to do things, and we need to have systems that are comfortable with handling that.
What happens in practice is that the culture exterminates the drive for improvement: not only are things bad, but people look at you like you're crazy if you think things should be better. Like in 2025 people defend C, people defend JavaScript, people write software without types, people write scripts in shell languages; debugging sometimes involves looking at actual bytes with your eyes; UIs are written in non-cross-platform ways; the same stupid software gets written over and over at a million companies; sending a large file to another person is still actually pretty hard; leaving comments on it is functionally impossible ... these are software problems, everything is shit, everything can be improved on, nothing should be hard anymore but everything still is; we are still missing a million abstractions that are necessary to make the world simple. Good lord, yesterday I spent two hours trying to resize a PDF. We truly live in the stone age; the only progress we've made is that there are now ads on every rock.
I really wish it was a much more ruthlessly competitive landscape. One in which, if your software is bad, slow, hard to debug, hard to extend, not open source, not modernized, not built on the right abstractions, hard to migrate on or off of, not receptive to feedback, covered in ads... you'd be brutally murdered by the competition in short order. Not like today, where you can just lie in your marketing materials and nobody can really do anything because the competition is just as weak. People would do a much better job if they had to in order to survive.
(Coincidentally this is one of my chief complaints about Go: despite being about the same age as Rust, it took the avenue of discarding quite a lot of advancements in programming language theory and ergonomics since C)
Zig and Rust and TS are a start
Ironically, that’s part of why we can’t have nice things. People who aren’t open to other viewpoints and refuse to compromise when possible impede progress.
I'll raise. The money flows because you do a bad job. Doing a good job is costly and takes time. The money cannot invest that much time and resources. Investing time and resources builds an ordinary business. The money is in for the casino effect, for the bangs. Pull the arm and see if it sticks. If yes, good. Keep pulling the arm. If not, continue with another slot machine.
In my experience many acquisitions set the peak of a given software product. The money then makes the argument that it's "good enough" and flogs the horse until it's dead, and a younger, more agile team of developers eventually builds a new product that makes it irrelevant. The only explanation I have for why so many applications fail to adapt is a cultural conflict between the software and the money, which always gets resolved by the money winning.
For example, I would suggest that the vast majority of desktop apps, especially those made by SMEs and originally written in MFC or something similar, fail to make the transition to the online services they need today because of this conflict. The company ends up dying, and the money never works out what it did wrong, because it's much harder to appreciate those long-term effects than the short-term ones that gave them more money at the start.
While it might be better if everyone thought like us and wanted things to be _fundamentally_ good, most people don't, and most people's money >> fewer people's money, and the difference in scale is vast. We could try to build a little fief where we get to set the rules, but network effects are against us. If anything, our best shot is to try to reverse the centralisation of the internet, because that's a big cause of enshittification.
Tim Berners-Lee coined it in 1999 and further expanded on the concept in a 2001 Scientific American article by Berners-Lee, Hendler, and Lassila.
Was this written by AI? I find it hard to believe anyone who was interested in the Semantic Web would not have known its origin (or at least that its origin was not Google).
The concept of a Semantic web was proposed by Tim Berners-Lee (who hopefully everyone recognizes as the father of HTTP, WWW, HTML) in 1999 [0]. Google, to my knowledge, had no direct development or even involvement in the early Semweb standards such as RDF [1] and OWL [2]. I worked with some of the people involved in the latter (not closely though), and at the time Google was still quite small.
0. https://archive.org/details/isbn_9780062515872/mode/2up
W3C of course deserves credit for their hard work on this standard.
My main point was that regardless of the semantic "standard", nothing prevented us from putting everything in a generic div, so complaining that everyone's just "not on board" isn't a useful lament.
In scaling out computation to the masses, we went from locally grown plums that took a lot of work and were only available to a small number of people that had a plum tree or knew someone that had one, to building near magical prune-cornucopia devices that everyone could carry around in their pockets, giving them an effectively unlimited supply of prunes.
LLMs re-hydrate these for us, making them significantly more palatable; if you're used to gnawing dried fruit, they seem amazing.
But there's still a lot of work to be done.
so sweet
and so cold
:)
Also, LLMs are failing too, for different reasons, but IMO it's unlikely AI in general will: it will correct a failure of 60 years or so in industrial computer science.
The most reprehensible knowledge-search-and-communication failure of all.
We gave people monetisation of dreck instead. Then asked them to use it for education. Then trained bot systems on it. Then said that even those answers must be made to conform to transient propagandists.
Writing software is only so scalable. No matter how many shortcuts we take, like Electron and JavaScript, there are only so many engineers with so much time, and there are abundantly many problems to solve.
A better analogy would be to look at what's happening to AI images and video. Those have delivered roughly 10,000x savings in cost, time, and personnel requirements. It's an industrialization moment. As a filmmaker who has made several photons-on-glass films, this is a game changer and lifts the entire creative industry to a level where individuals can outdo Pixar.
That is the lens with which to look at what AI will do to software. We're going from hand-carving stone wheels to the Model T.
This is all just getting started. We've barely used what the models of today offer us.
Initial attempts to alleviate any shortage are likely to result in a decrease in quality; initial attempts to improve quality are likely to reduce variability and thus variety. My point (and my reading of the article) is that we're at the stage where we've figured out how to turn out enough Hostess Twinkies that "let them eat cake!" is a viable option, and starvation is being driven back.
This is definite progress, but also highlights previous failures and future work.
And if we allow it to take over society, we'll end up with a society that's also slop. Netflixification/marvelization only much much worse..
Slop content is slop content, AI or not. You don't need an AI to make slop, it just makes it more accessible.
You are biased by media narratives and slop content you're encountering on social media. I work in the industry and professionals are using AI models in ways you aren't even aware of. I guarantee you can't identify all AI content.
> And if we allow it to take over society, we'll end up with a society that's also slop. Netflixification/marvelization only much much worse..
Auteurs and artists aren't going anywhere. These tools enable the 50,000 annual film students to sustainably find autonomy, where previously there wasn't any.
> AI is not a triumph of elegant design, but a brute-force workaround
You can read and understand Attention Is All You Need in one hour, and then (after just scaling out by a few billion) a computer talks to you like a human. Pretty elegant, if you ask me.
> The web was supposed to evolve into semantically structured, linked, machine-readable data that would enable amazing opportunities.
I missed that memo. The web was, and forever shall be, a giant, unstructured, beautiful mess. In fact, LLMs show just how hopeless the semantic web approach was. Yes, it's useful to attach metadata to objects, but you will still need massive layering and recursion to derive higher-order, non-trivial information.
This entire article is someone unable to let go of an old idea that Did Not Work.
Most people can't use the power of the computers they have effectively. Maintaining the data in Excel spreadsheets for most people is like manual labour.
https://en.wikipedia.org/wiki/An_Essay_Towards_a_Real_Charac...
I'm sure there have been countless similar attempts at categorizing knowledge
one of the more successful ones being the dewey decimal system
I have my doubts about whether the thing the OP alleges we have "failed" at is even possible at all
Because that’s what a lot of this falls into.
An overwhelming amount of stuff with no names. No categories, no nothing.
With extended file attributes we could hang all sorts of meta bits off of arbitrary files. But that’s very fragile.
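For instance, on Linux you can do something like this (a sketch; the attributes vanish silently on filesystems, archives, or transfers that don't support xattrs, which is the fragility in question):

    import os

    # Hang a "topic" label off an otherwise ordinary file via an extended attribute.
    open("notes.txt", "w").close()                      # throwaway example file
    os.setxattr("notes.txt", "user.topic", b"tax-year-2024")
    print(os.getxattr("notes.txt", "user.topic"))       # b'tax-year-2024'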
So we ask the systems to make up names for data based on their content, which turns out to not necessarily work as well as we might like.
For example, you have a set of balls and you want to sort them by color. Where does orange stop and red begin? What about striped balls or ones with logos printed on them? What if it is a hypercolor ball that changes based on heat? It gets messy very fast.
Everything that is bad in UI is a direct consequence of that.
1. No tooltips, right click, middle click behavior because touch doesn't have that. No ctrl+click either.
2. Large click areas wasting screen space with padding and margins.
3. Low density UI so it can shape-shift into mobile version.
4. Why type on a phone when you can talk? Make everything a search box.
5. Everything must be flat instead of skeuomorphic because it's easier to resize for other screen sizes.
6. Everything needs a swipe animation and views instead of dialogs because smartphones can't have windows.
In a functional democracy, all you should win by doing that is harsh antitrust action.
I can understand that it's a result, to a degree, of cloud services and people's primary mode shifting to opening an app and then opening recents or searching, instead of opening a file to open an app. But it does mean that you're at the mercy of what I experience as some pretty crap search algorithms that don't seem to want you to find the information you're looking for. I keep encountering searches that rank fuzzy matches over exact matches, or that aren't stable as you continue completing the same word, and I just don't understand how that's acceptable, once it's been pointed out, if search is what I'm supposed to be using.
I think this might actually be true in some cases. Especially where companies want your files on their cloud servers. It's better for them if you don't think about what's stored locally or remotely. It's better for them if you don't think at all and just ask them for whatever you want and let them decide what to show you or keep hidden from you. It's easier for them to get metrics on what you're doing if you type it out explicitly in a search box than it is for them to track you as you browse through a hierarchy you designed to get to what you want. You're supposed to feel increasingly helpless and dependent on them.
However, the thing that the author might be missing is that the semantic web exists. [1] The problem is that the tools that we can use to access it are not being developed by Big Tech. Remember Freebase? Remember that Google could have easily kept it around but decided to fold it and shoved it into the structured query results? That's because Google is not interested in "organizing the world's information and make it universally accessible" unless it is done in a way that it can justify itself into being the data broker.
I'm completely out of time and energy for any side project at the moment, but if someone wants to steal my idea: please take an LLM and fine-tune it so that it can take any question and turn it into a SPARQL query for Wikidata. Also, make a web crawler that reads a page and turns it into a set of RDF triples or QuickStatements for any new facts that are presented. This would effectively be the "ultimate information organizer" and could potentially replace Wikidata as most people's entry page to the internet.
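A sketch of the plumbing around that idea, using the public Wikidata SPARQL endpoint and its stock "house cats" example query; the hard part, the question-to-SPARQL translation, is exactly what the fine-tuned model would have to supply:

    import requests

    # What the model might emit for "list a few cats known to Wikidata"
    # (wdt:P31 = instance of, wd:Q146 = house cat).
    query = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 5
    """
    resp = requests.get("https://query.wikidata.org/sparql",
                        params={"query": query, "format": "json"},
                        headers={"User-Agent": "semweb-sketch/0.1 (example)"})
    for row in resp.json()["results"]["bindings"]:
        print(row["itemLabel"]["value"])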
In a much, much more limited way, this is what I was dabbling with in alltheprices: queries to pull data from Wikidata, crawling sites to pull out the schema.org Product and Offer markup, and publishing the aggregate.
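The crawling half of that, sketched with requests and BeautifulSoup (assumes the product page embeds its schema.org data as JSON-LD in a script tag, and the URL is just a placeholder; microdata or RDFa would need a different extractor):

    import json
    import requests
    from bs4 import BeautifulSoup

    # Pull schema.org Product/Offer data out of a (hypothetical) product page.
    html = requests.get("https://example.com/some-product").text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("script", type="application/ld+json"):
        data = json.loads(tag.string or "{}")
        if isinstance(data, dict) and data.get("@type") == "Product":
            offer = data.get("offers") or {}
            if isinstance(offer, list):
                offer = offer[0]
            print(data.get("name"), offer.get("price"), offer.get("priceCurrency"))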
I think the example query here actually shows a problem with the query languages used in web search rather than an intrinsic inability of web search. It contains what amounts to a natural language subquery starting with "in the same year". In other words, to evaluate this query properly, we need to first evaluate this subquery and then use that information to evaluate the overall query. Google Search and almost all other traditional web search engines use intentionally oversimplified query languages that disallow nested queries let alone subqueries, so this example really is just revealing a problem with the query language rather than a problem with web search overall. With a better query language, we might get better results.
` What animal is featured on a flag of a country where the first small British colony was established in the same year that Sweden's King Gustav IV Adolf declared war on France? `
chatgpt: Final answer: Lion
gemini: A lion is featured on the flag of Sri Lanka.
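To make the nesting point concrete, here's a sketch of the two-step evaluation a richer query language (or a planner sitting in front of the search engine) would allow; search() is a hypothetical stand-in for a single web-search call, not a real API:

    # Hypothetical decomposition of the flag-animal question into sub-steps.
    def search(q: str) -> str:
        """Stand-in for one web-search call that returns a single best answer."""
        raise NotImplementedError

    def answer() -> str:
        year = search("year King Gustav IV Adolf of Sweden declared war on France")
        country = search(f"country where the first small British colony was established in {year}")
        return search(f"animal featured on the flag of {country}")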
getting wildly different and unpredictable answers from the same input is one of the features AI offers
Yes, it's true, we can't make a UI. Or a personal computer, or anything else that isn't deeply opaque to its intended users. But it's so much worse than that. We humans can hardly do anything successfully besides build technology and beat each other in an ever-expanding array of ridiculous pissing contests we call society.
It can basically figure out all the semantic knowledge graphs automatically for us, and it's multi-modal to boot. That means it can infer the relationships between any node across text, audio, images, videos, and even across different languages.
Betteridge's law
If you ask an LLM where you can find a structured database of knowledge with structured semantic links, they'll point you to this and other knowledge graphs. TIL about Diffbot!
In my experience, it's a lot more fun to imagine the perfect database like this than it is to work with the actual ones people have built.
Now, back then AGI wasn't right around the corner, and personal computing was really, really necessary. But how did we forget the viewpoint that personal computing was once seen as a competing way to use computing power versus AI?
So as well as people writing posts in English, they would need to provide semantic markup for all the information like dates, flags, animals, people, and countries? It's difficult enough to get people to use basic HTML tags and accessible markup properly, so what was the plan for how this would scale?
This actually happened already, and it's part of why LLMs are so smart. I haven't tested this, but I'd venture a guess that without Wikipedia, Wikidata, Wikipedia clones, and stolen articles, LLMs would be quite a lot dumber. You can only get so far with Reddit posts and the embedded knowledge of basic info in higher-order articles.
My guess is that when fine-tuning and modifying weights, the lowest-hanging fruit is to overweight Wikipedia sources and reduce the weight of sources like Reddit.
Am I wrong in that this was a completely organic question?
It's social media and the endless barrage of AI-generated content that's creating the illusion that AI isn't impressive. You're inundated every day with the same crap, and it's just making it less and less impressive.
> My point is that if all knowledge were stored in a structured way with rich semantic linking, then very primitive natural language processing algorithms could parse question like the example at the beginning of the article, and could find the answer using orders of magnitude fewer computational resources. And most importantly: the knowledge and the connections would remain accessible and comprehensible, not hidden within impenetrable AI models.
It's a pocket hope of mine that AI brings us back to the Semantic Web, or something very like it. In many ways these embeddings are giant non-human lexicons already. Distilling this information out seems so possible.
Even just making an AI go mark up (or, uh, perhaps refine) a page with microdata seems conceptually very doable!
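A sketch of what that enrichment pass would emit; the sentence and the schema.org Book markup here are illustrative, not from any real page:

    # Before: the prose a page already has.
    plain = "<p>Dune was written by Frank Herbert and published in 1965.</p>"

    # After: the same sentence wrapped in schema.org microdata, the kind of
    # markup an AI enrichment pass could add without touching the visible text.
    enriched = (
        '<p itemscope itemtype="https://schema.org/Book">'
        '<span itemprop="name">Dune</span> was written by '
        '<span itemprop="author" itemscope itemtype="https://schema.org/Person">'
        '<span itemprop="name">Frank Herbert</span></span> and published in '
        '<span itemprop="datePublished">1965</span>.</p>'
    )
    print(enriched)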
More broadly, looking at personal computing: my personal belief is that the failure is hugely because of apps. Instead of a broader personal computing that aggregates, that allows constructivism, that enables personal growth and development (barefoot developers and home-cooked software style), computing has been defined by massification, by big-tech software. The dominant computing paradigm has been the mainframe pattern: we the users have an app that acts as a terminal to some far-off cloud-y data center. Whatever agency we get is hewn out for us a priori by product teams, and any behavior not explicitly built in is a "Felony Contempt of Business Model" (an oh so accurate Doctorow-ism)! https://maggieappleton.com/home-cooked-software https://news.ycombinator.com/item?id=40633029 https://pluralistic.net/2022/10/20/benevolent-dictators/#fel... https://news.ycombinator.com/item?id=33279274
It is so so sad to see computing squandered so!
The good news is that AI is changing this relationship with software. I'm sure we will have no end of AI models built into our software, and that companies will maintain their strict control (tyrant's grip) over software as long as they can! But for AI to flourish, it's going to need to work across systems. And that means tearing down some of the walls, walls that have forcibly kept computing (ardently) anti-personal.
I can go look at https://github.com/punkpeye/awesome-mcp-servers and take such great hope from it. Hundreds of ways that we have eked out a way to interface with systems that, before, people had no say in and no control over.
With RDF especially, there was also the issue of "WTF is this good for?". The Semantic Web sounds lofty in theory, but was there ever even a clear plan for what the UI would look like? How would I explore all that semantic data if it ever came into existence? How would it deal with link rot?
And much like with RSS, I think a big failure of RDF is that it's some weird thing outside the Web, instead of just some additional HTML tags to enrich existing documents. If there is a failure, it's that. Even today, a lot of basic semantic tags are missing from HTML; we finally got <time> in 2011, but we still have nothing for names, cities, units, books[1], movies, GPS coordinates, and a lot of other stuff we use daily.
Another big failure is that HTML has become a read-only format, the idea that one uses HTML as a source format to publish documents seems to have been completely abandoned. HTML is just a UI language for Web apps nowadays, Markdown, LaTeX or whatever is what one uses to write content.
0. https://en.wikipedia.org/wiki/Cyc
1. <a href="urn:isbn:..."> exists, but browsers don't support it natively