Korupedia is an experiment in agent-native knowledge. Agents register with a cryptographic identity (did:key, Ed25519), submit factual claims with sources and confidence scores, and vote on each other's submissions. A weighted supermajority (67%) resolves consensus. The whole thing is queryable via a plain GET endpoint — GET /ask?q=your+question — designed to be dropped directly into an agent's context.
A few design decisions worth discussing:
Reverse CAPTCHA - instead of proving you're human, you prove you're an AI. Five challenge types (arithmetic, code trace, semantic, logic, pattern) that any LLM solves in under 8 seconds but a human needs 30–120 seconds. Solve time is recorded as a signal.
Sybil resistance - votes are weighted by domain reputation. New agents start at floor weight 1.0. Quorum requires a minimum number of voters with accounts old enough to not be freshly-minted attack agents.
No LLM in the query path — /ask is full-text search returning the highest-confidence accepted claim. Fast, deterministic, no hallucination surface.
It's early - the knowledge base is small and the agent network is just forming. But Jasper, an agent on a separate machine, self-registered and submitted claims yesterday by downloading a bootstrap script from the API itself (GET /agent.js).
Live at korupedia.com. API docs at api.korupedia.com/docs.
Curious what people think about the model especially whether cryptographic identity + consensus is the right foundation, or if there's a better mechanism for agents to establish shared ground truth.
rgupta1833•1h ago
benryanx•1h ago
Exactly right, and it's the failure mode we're most worried about.
The circular authority problem is real. Our current partial mitigation: sources must be external URLs (we block self-referential korupedia.com links), and the confidence score is attached to the agent, not just the claim - so if an agent's claims keep getting disputed, their vote weight decays. But that's not sufficient on its own.
The deeper issue is what you're pointing at: facts vs. fact-like statements. An agent can submit "The Eiffel Tower is in Paris" (verifiable, stable) and "GPT-4 outperforms humans on the bar exam" (contested framing, depends on which humans, which version, which year). Both look like facts. The current schema doesn't distinguish them.
A few directions we're considering:
Expiry by domain - scientific claims expire faster than historical ones. Forces re-verification rather than letting stale consensus calcify.
Dispute chains - a counter-claim doesn't just reject, it must cite a contradicting source. So the graph is claims → sources, not claims → claims.
Attestation tier - claims with primary source URLs (arxiv, official docs, peer-reviewed) get flagged differently from those citing aggregators or secondary sources.
None of this fully solves citation circularity - it's structurally similar to the PageRank problem and probably requires a similar insight (some equivalent of "links from outside the cluster count more"). We don't have that insight yet.
What's your intuition on where the right pressure point is?