Collecting All Causal Knowledge

https://causenet.org/
91•geetee•5h ago

Comments

pavlov•3h ago
The sample set contains:

    {
        "causal_relation": {
            "cause": {
                "concept": "boom"
            },
            "effect": {
                "concept": "bust"
            }
        }
    }
It's practically a hedge-fund-in-a-box.
kolektiv•3h ago
Plus, regardless of what you might think of how valid that connection is, what they're actually collecting, absent any kind of mechanism, is a set of all apparent correlations...
bbor•3h ago
> CauseNet aims at creating a causal knowledge base that comprises all human causal knowledge and to separate it from mere causal beliefs

Pretty bold to use a picture of philosophers as your splash page and then make a casual claim like this. To say the least, this is an impossible task!

The tech looks cool and I'm excited to see how I might be able to work it into my stuff and/or contribute. But I'd encourage the authors to rein in the rhetoric...

maweki•3h ago
It's nice to see more semantic web experiments. I always wanted to do more reasoning with ontologies, etc. It's such an amazing idea: reference objects/persons/locations/concepts from the real world with URIs and just add labeled arrows between them.

This is such a cool schemaless approach, with so much potential for open data linking, classical reasoning, and LLM reasoning. But open data (together with RSS) has been dead for a while, as all the big companies have become mere data hoarders. And frankly, while the concept and the possibilities are so cool, graph databases are just not that fast, and not much fun to program either.
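
To make the "URIs plus labeled arrows" idea concrete, here is a minimal sketch using rdflib; the URIs below are hypothetical placeholders, not CauseNet's actual scheme:

    from rdflib import Graph, URIRef

    g = Graph()

    # Hypothetical URIs, just to show the shape; not CauseNet's real scheme.
    causes = URIRef("https://example.org/relation/causes")
    boom = URIRef("https://example.org/concept/boom")
    bust = URIRef("https://example.org/concept/bust")

    # A "labeled arrow" is nothing more than a (subject, predicate, object) triple.
    g.add((boom, causes, bust))

    # Query the arrows back out.
    for cause, _, effect in g.triples((None, causes, None)):
        print(cause, "->", effect)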

thicknavyrain•3h ago
I know it's a reductive take to point to a single mistake and act like the whole project might be a bit futile (maybe it's a rarity), but this example in their sample is really quite awful if the idea is to give AI better epistemics:

    {
        "causal_relation": {
            "cause": {
                "concept": "vaccines"
            },
            "effect": {
                "concept": "autism"
            }
        }
    },
... seriously? Then again, they do say these are just "causal beliefs" expressed on the internet, but it seems like some stronger filtering of which beliefs to adopt ought to be exercised for any downstream use case.
kolektiv•3h ago
Oh, ouch, yeah. We already know that misinformation tends to get amplified, the last thing we need is a starting point full of harmful misinformation. There are lots of "causal beliefs" on the internet that should have no place in any kind of general dataset.
Amadiro•1h ago
It's even worse than that, because the way they extract the causal link is just a regex, so

"vaccines > autism"

because

"Even though the article was fraudulent and was retracted, 1 in 4 parents still believe vaccines can cause autism."

I think this could be done much better by using even a modestly powerful LLM for the causal extraction... The website claims "an estimated extraction precision of 83%", but I doubt this is an even remotely sensible estimate.
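
To make the failure mode concrete, here is a toy pattern in the same spirit; this is an illustrative regex, not CauseNet's actual linguistic patterns, but it shows how the belief context gets stripped away:

    import re

    # Naive "X can cause Y" pattern; illustrative only.
    PATTERN = re.compile(r"(\w+)\s+(?:can\s+)?causes?\s+(\w+)", re.IGNORECASE)

    sentence = ("Even though the article was fraudulent and was retracted, "
                "1 in 4 parents still believe vaccines can cause autism.")

    for cause, effect in PATTERN.findall(sentence):
        print(cause, "->", effect)   # prints: vaccines -> autism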

kykat•3h ago
The precision dataset includes the sentences that led to this. Some of them:

>> "Even though the article was fraudulent and was retracted, 1 in 4 parents still believe vaccines can cause autism."

>> On 28 February 1998 Horton published a controversial paper by Dr. Andrew Wakefield and 12 co-authors with the title "Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children" suggesting that vaccines could cause autism.

>> He was opposed by vaccine critics, many of whom believe vaccines cause autism, a belief that has been rejected by major medical journals and professional societies.

None of the ones I've seen actually assert that vaccines cause autism.

refactor_master•3h ago
Might as well go ahead and add https://tylervigen.com/spurious-correlations?page=135 from the looks of it.
jack_riminton•3h ago
Reminds me of the early attempts at hand-categorising knowledge for AI
rhizome•3h ago
"The map is not the territory" ensures that bias and mistakes are inextricable from the entire AI project. I don't want to get all Jaron Lanier about it, but they're fundamental terms in the vocabulary of simulated intelligence.
tgv•3h ago
This makes little sense to me. Ontologies and the like have been tried and have always been found too brittle. Take the examples from the front page (which I expect to be among the best in their set): human_activity => climate_change. Those are such broad concepts that the relation is practically useless. Or disease => death. There's no nuance at all. There isn't even a definition of what "disease" is, let alone a way to express that myxomatosis is lethal only for European rabbits, not for humans, nor for goldfish.
koliber•3h ago
Exactly. In some cases disease causes death. In others it causes immunity which in turn causes “good health” and postpones death.
Nevermark•1h ago
Contradictory cause-effect examples, each backed up with data, are a reliable indicator of a class of situations that needs a higher chain-effect resolution.

Which is directly usable knowledge if you are building out a causal graph.

In the meantime, a cause-and-effect representation isn't limited to listing only one possible effect. A list of alternate disjoint effects, linked to a cause, is also directly usable.

Just as an effect may be linked to different causes. If you only know the effect in a given situation and are trying to identify the cause, it's the same problem in reverse.
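
A minimal sketch of that representation (a hypothetical structure, not CauseNet's schema): index the relation in both directions, so the effect-to-cause question is literally the same lookup reversed:

    from collections import defaultdict

    effects_of = defaultdict(set)
    causes_of = defaultdict(set)

    def add_relation(cause: str, effect: str) -> None:
        effects_of[cause].add(effect)
        causes_of[effect].add(cause)

    add_relation("disease", "death")
    add_relation("disease", "immunity")      # disjoint effects coexist
    add_relation("immunity", "good_health")

    print(effects_of["disease"])             # {'death', 'immunity'}
    print(causes_of["immunity"])             # the same lookup, reversed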

notrealyme123•2h ago
Koller and Friedman write in "Probabilistic Graphical Models" about the "clarity test": state variables should be unambiguous to an all-seeing observer.

States like "human_activity" are not objectively measurable.

To be fair, PGMs and causal models are not the same, but this way of thinking about state variables is an incredibly good filter.

jiggawatts•2h ago
Even more importantly, it's not even a simple probability of death, or a fraction of a cause, or any simple one-dimensional aspect. Even if you can simplify things down to an "arrow", the label isn't a scalar number. At a bare minimum, it's a vector, just like embeddings in LLMs are!

Moreover, the endpoints of each such causative arrow are also complex, fuzzy things, and are best represented as vectors. Diseases aren't just simple labels like "Influenza": there are thousands of ever-changing variants of just the flu out there!

A proper representation of a "disease" would be a vector also, which would likely have interesting correlations with the specific genome of the causative agent. [1]

The next thing is that you want to consider the "vector product" between the disease and the thing it infects, to cater for susceptibility, previous immunity, etc...
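
As a toy rendering of that "vector product" idea (purely illustrative; the dimensions and the bilinear form are invented):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                   # toy embedding dimension
    disease = rng.normal(size=d)            # e.g. derived from pathogen genome
    host = rng.normal(size=d)               # e.g. immunity history, genetics
    W = rng.normal(size=(d, d))             # learned interaction matrix

    # The "vector product": a scalar susceptibility score from two vectors.
    susceptibility = host @ W @ disease
    print(float(susceptibility))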

A hop, skip, and a small step and you have... Transformers, as seen in large language models. This is why they work so well, because they encode the complex nuances of reality in a high-dimensional probabilistic causal framework that they can use to process information, answer questions, etc...

Trying to manually encode a modern LLM's embeddings and weights (about a terabyte!) is futile beyond belief. But that's what it would take to make a useful "classical logic" model that could have practical applications.

Notably, expert systems, which used this kind of approach, were worked on for decades and were almost total failures in the wider market, because they were mostly useless.

[1] Not all diseases are caused by biological agents! That's a whole other rabbit hole to go down.

Nevermark•1h ago
That was very well said.

One quibble, and I really mean only one:

> a high-dimensional probabilistic causal framework

Deep learning models, aka neural-network-type models, are not probabilistic frameworks. While we can measure, from the outside, a probability of correct answers across the whole training set, or any data set, there is no probabilistic model inside.

Like a Pachinko game, you can measure statistics about it, but the game itself is topological. As you point out very clearly, these models perform topological transforms, not probabilistic estimations.

This becomes clear when you test them with different subsets of data. It quickly becomes apparent that the probabilities of the training set are only that: probabilities of the exact training set. There is no probabilistic carry-over to any subset, nor generalization to any new values.

They are estimators, approximators, function/relationship fitters, etc. In contrast to symbolic, hard numerical or logical models. But they are not probabilistic models.

Even when trained to minimize a probabilistic performance function, their internal need to represent things topologically creates a profoundly "opinionated" form of solution, as opposed to being unbiased with respect to the probability measure. The measure never gets internalized.

tossandthrow•2h ago
Ontology, not ontologies, has been tried.

We have quite a good understanding that a system cannot be both sound and complete; regardless, people went straight in to make a single model of the world.

Xmd5a•50m ago
Could you define sound and complete in this context? IIRC Rust's borrow checker is sound (it will not mark something dysfunctional as functional) but not complete: some programs would take too long to verify, the checker times out, and compilation fails even though the program is potentially correct.
kachnuv_ocasek•11m ago
> a system cannot be both sound and complete

Huh, what do you mean by this? There are many sound and complete systems – propositional logic, first-order logic, Presburger arithmetic, the list goes on. These are the basic properties you want from a logical or typing system. (Though, of course, you may compromise if you have other priorities.)

DrScientist•1h ago
I totally agree that, in the past, years of hammering out an ontology for a particular area just resulted in a common understanding between those who wrote the ontology, and a large gulf between them and the people they wanted to use it (everyone else).

What's perhaps different now is that the machine, via LLMs, can also have an 'opinion' on meaning or correctness.

Coming full circle, I wonder what would happen if you got LLMs to define the ontology...

Xmd5a•54m ago
>what would happen if you got LLMs to define the ontology.

https://deepsense.ai/resource/ontology-driven-knowledge-grap...

>hammering out an ontology for a particular area just results in a common understanding between those who wrote the ontology and a large gulf between them and the people they want to use it

This is the other side of the bitter lesson, which is just the empirical observation of a phenomenon that was to be expected from first principles (algorithmic information theory): a program of minimal length must get longer if the reality it models becomes more complex.

For ontologists, the complexity of the task increases as generality is maintained while model precision is increased (the top-down approach); conversely, when precision is maintained, the "glue" one must add to build up a bigger and bigger whole while keeping it coherent becomes more and more complex (the bottom-up approach).

vintermann•1h ago
As I understand it, this is a dataset of claimed causation. It should contain vaccines->autism, not because it's true, but because someone, in public, claimed that it was.

So, by design, it's pretty useless for finding new, true causes. But maybe it's useful for something else, such as teaching a model what a causal claim is in a deeper sense? Or mapping out causal claims which are related somehow? Or conflicting? Either way, it's about humans, not about ontological truth.

morpheuskafka•1h ago
Also, it seems to mistake some definitions as causes.

A coronavirus isn't "claimed" to cause SARS. Rather, SARS is the name given to the disease caused by a certain coronavirus. Or, alternatively, SARS-CoV-1 is the name given to the virus which causes SARS. Whichever way you want to see it.

For a more obvious example, saying "influenza virus causes influenza" is a tautology, not a causal relationship. If influenza virus doesn't cause influenza disease, then there is no such thing as an influenza virus.

vintermann•45m ago
Yes, I agree there are a lot of definitions or descriptions masquerading as explanations, especially in medicine and psychology. I think maybe insurance has a lot to do with that. If you just describe a lot of symptoms, insurance won't know whether to cover it or not. But if you authoritatively name that symptom set "BWZK syndrome" or something, and suddenly switch to assuming "BWZK syndrome" is a thing, the unknown cause of the symptoms, then insurance has something it can deal with.

But this description->explanation thing, whatever the reason, is just another error people make. It's not that different from errors like "vaccines cause autism". Any dataset collecting causal claims people make is going to contain a lot of nonsense.

dr_dshiv•1h ago
Democritus (b. 460 BCE) said, "I would rather discover one cause than gain the kingdom of Persia," which suggests that finding true causes is rather difficult.
hugh-avherald•43m ago
Or is less of a hassle.
asplake•47m ago
Agreed. About the strongest we can hope for are causal mechanisms, and most of those will be at most hypotheses and/or partial explanations that only apply under certain conditions.

Honestly, I don’t understand how these so-called ontologies have persisted. Who is investing in this space, and why?

TofuLover•3h ago
This reminds me of an article posted on HN only a few days ago: Uncertain<T> [1]. I think a causality graph like this necessarily needs a concept of uncertainty to preserve nuance. I don't know whether this would be practical in terms of compute, but I'd think combining traditional NLP techniques with LLM analysis might make it so?

[1] https://github.com/mattt/Uncertain
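
For instance, a sketch of an Uncertain<T>-flavoured causal edge (a hypothetical type, not the linked library's actual API), where an edge carries a belief probability and is sampled rather than read as a plain boolean:

    import random
    from dataclasses import dataclass

    @dataclass
    class UncertainEdge:
        cause: str
        effect: str
        p: float                            # estimated P(claim holds)

        def sample(self) -> bool:
            return random.random() < self.p

    edge = UncertainEdge("smoking", "lung_cancer", p=0.9)
    hits = sum(edge.sample() for _ in range(10_000))
    print(hits / 10_000)                    # ~0.9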

9dev•3h ago
Right. The first example on the site shows disease as a cause and death as an effect. This is wrong on several levels: there is no such thing as simply healthy or sick, you’re always fighting something off, it just becomes obvious sometimes. And a disease doesn’t necessarily lead to death, obviously.
kaashif•2h ago
Since you're always going to die, the problem is solved: the implication is true because the right side is always true, so the left side doesn't matter.
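
In propositional terms (a minimal check; material implication is the standard reading here):

    # Material implication p -> q is (not p) or q; with q fixed True
    # ("everyone dies"), the implication holds whatever p is.
    implies = lambda p, q: (not p) or q
    print(all(implies(p, True) for p in (True, False)))   # True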
9dev•2h ago
Then it’s correlation instead of causation and the entire premise of a causation graph is moot.
notrealyme123•2h ago
I get some vibes of fuzzy logic from this project.

Currently a lot of research goes in the direction of distinguishing "data uncertainty" from "measurement uncertainty", or "aleatoric" from "epistemic" uncertainty.

I found this tutorial (aimed at computer vision) to be very intuitive; it gives a good understanding of how to use those concepts in other fields: https://arxiv.org/abs/1703.04977
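
As a toy illustration of that split (illustrative numbers; a bootstrap ensemble stands in for the Bayesian machinery in the tutorial):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, size=200)
    y = 2.0 * x + rng.normal(scale=0.3, size=x.size)   # true noise std = 0.3

    # Epistemic uncertainty: disagreement between refits on resampled data.
    slopes = []
    for _ in range(20):
        idx = rng.integers(0, x.size, size=x.size)
        slopes.append(np.polyfit(x[idx], y[idx], deg=1)[0])
    print("epistemic (slope spread):", np.std(slopes))

    # Aleatoric uncertainty: residual noise no amount of refitting removes.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    print("aleatoric (residual std):", residuals.std())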

koliber•3h ago
I wonder how they will quantify causality. Sometimes a particular cause has different, even opposite, effects.

Alcohol causes anxiety. At the same time, it causes relaxation. These effects depend on the time frame and on many individual circumstances.

This is a single example, but the world is full of them. Codifying causality will involve a certain amount of bias and belief. That does not lead to a better world.

lwansbrough•2h ago
I was hoping this would be actual normalized time series data and correlation ratios. Such a dataset would be interesting for forecasting.
ivape•2h ago
I don’t know if it’s inadvertent, but it’s headed toward just becoming an engine for overfitted generalizations. Each causal pair will simply emerge based on frequency, which will reinforce itself by preemptively and prematurely classifying all future information.

Unfortunately, frequency is the primary way AI works, but it will never be accurate for causality, because causality always has the dynamic that things can happen just “because”. That’s hacked into LLMs via deliberate randomness in next-token prediction.

huragok•2h ago
the Cyc of this current AI winter
daloodewi•1h ago
this will be super cool if it can be done!
rwmj•1h ago
Isn't this like Cyc? There have been a couple of interesting articles about that on HN:

https://news.ycombinator.com/item?id=43625474 "Obituary for Cyc"

https://news.ycombinator.com/item?id=40069298 "Cyc: History's Forgotten AI Project"

athrowaway3z•1h ago
A cool idea, in desperate need of an example use case.
AlienRobot•36m ago
I wonder what this is for.
larodi•33m ago
Why not use Prolog, then? It's the essence of cause and effect in programming, and it can also expound syllogisms.
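
For what that might look like outside Prolog, a tiny forward-chaining sketch in Python (hypothetical facts; it assumes the fact set is acyclic):

    # Hypothetical facts; a Prolog-style transitivity rule over "causes".
    facts = {("boom", "bust"), ("bust", "unemployment")}

    def causes(x: str, y: str) -> bool:
        # Direct edge, or chain through an intermediate cause (a syllogism).
        if (x, y) in facts:
            return True
        return any(a == x and causes(b, y) for (a, b) in facts)

    print(causes("boom", "unemployment"))   # True, via bust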
