Build real-time knowledge graph for documents with LLM

https://cocoindex.io/blogs/knowledge-graph-for-docs/
89•badmonster•7h ago

Comments

gorpy7•6h ago
idk if it’s precisely the same, but o3 recently offered to create one for me (in markdown, was it?), suggesting it was something it was willing to maintain for me.
gorpy7•6h ago
i think it offered a few formats, but i specifically remember it would do it in Obsidian to use the concept-map ability within it.
cipehr•5h ago
sorry, what is `o3`? I am not familiar with it... unless you're talking about the OpenAI ChatGPT model?

If so, that's crazy, and I would love pointers on how to prompt it to suggest this.

Onawa•1h ago
o3 is one of the myriad models offered by OpenAI. You can see some metrics and comparisons with other models here: https://artificialanalysis.ai/models/o3/providers
marviel•5h ago
mermaid probably.
dvrp•4h ago
I feel like you can do the same using a single markdown file and an LLM (e.g. Claude Code).

I do it that way and then I hooked it up with the Telegram API. I’m able to ask things like “What’s my passport number?” and it just works.

Combine it with git and you have a Datomic-esque way of seeing facts getting added and retracted simply by traversing the commits.

I arrived at this solution after trying a more complex triplet-based approach and seeing that plain text files + HTTP calls work just as well and are human (and AI) friendly.

The main disadvantage is having unstructured data, but for content that fits inside the LLM context window, it doesn’t matter practically speaking. And even then, when context starts being the limiting factor, you can start segmenting by categories or start using embeddings.
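A minimal sketch of the "facts added and retracted" idea, assuming one fact per line in the notes file. The two versions would come from adjacent commits; here they are plain strings, and `fact_changes` is a hypothetical helper name, not part of any tool mentioned above.

```python
def fact_changes(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Compare two versions of a notes file, treating each non-empty line as a fact.

    Returns (added, retracted), each in the order the lines appear in its file.
    """
    old_lines = [line.strip() for line in old_text.splitlines() if line.strip()]
    new_lines = [line.strip() for line in new_text.splitlines() if line.strip()]
    old_set, new_set = set(old_lines), set(new_lines)
    added = [line for line in new_lines if line not in old_set]
    retracted = [line for line in old_lines if line not in new_set]
    return added, retracted


# Hypothetical contents of the file at two consecutive commits.
before = "passport: X123\ncity: Lisbon\n"
after = "passport: X123\ncity: Porto\nphone: 555-0100\n"

added, retracted = fact_changes(before, after)
print("added:", added)
print("retracted:", retracted)
```

A changed fact shows up as one retraction plus one addition, which is roughly how an append-only fact log would record it.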

th0ma5•4h ago
People probably don't discuss the problems of an open-world knowledge graph enough. They are essentially the same class of problems as spam filters. Using an open language model to produce a graph doesn't create a closed-world graph by definition. This confusion, as well as a general avoidance of measuring actual productivity outcomes, seems like an insurmountable problem in the knowledge world now, and I feel language itself is failing at times to educate on these issues.
lyu07282•2h ago
They don't even do any entity disambiguation, so the resulting graph won't be very useful. I also saw people use a separate prompt to generate a Cypher query from user input for RAG; I can't imagine that actually works well. It would make a little more sense if they then used knowledge graph embeddings, but I'm not sure if Neo4j supports that.
Frummy•3h ago
Now imagine it with theorems as entities and lean proofs as relationships
manishsharan•3h ago
Why not merely upload all relevant documents into Gemini? Split the knowledge into smaller knowledge domains and have agents (backed by Gemini) for each domain?
8thcross•3h ago
Building knowledge graphs (GraphRAG) is obsolete from an academic and technical point of view. LLMs are getting better with built-in, graph-network-capable algorithms like SONAR and knowledge embeddings. Like someone said, just use NotebookLM instead. But they are useful in corporate setups where the infrastructure, teams, and skills are lagging by years.
timfrazer•3h ago
Could you provide some academic proof? From what I've read this isn’t true, so I’d be interested to see what you’re referring to.
phren0logy•2h ago
My use case is for documents related to a legal issue, where a foundation model has no knowledge of any of the participants or particular issues. There are many, many such situations. Your statement is ignorant and overly broad.
ianbicking•1h ago
I feel like I should understand the purpose of knowledge graphs, but I just... don't.

Like the example "CocoIndex supports Incremental Processing" becomes the subject/predicate/object triple (CocoIndex, supports, Incremental Processing)... so what? Are you going to look up "Incremental Processing" and get a list of related entities? That's not a term that is well enough defined to be meaningful across a variety of subjects. I can incrementally process my sandwich by taking small bites.

I guess you could actually expand "Incremental Processing" to some full definition. But then it's not really a knowledge graph, because the only entity ever associated with that new definition will be CocoIndex, and you are back to a single sentence that contains the information; you've just pretended it's structured. ("Supports" is hardly a well-defined term either!)

I can _kind of_ see how knowledge graphs can be used for limited relationships. If you want to map companies to board members, and board members to family members, etc. Very clearly and formally defined entities (like a person or company), with clearly defined relationships (board member, brother, etc). I still don't know how _useful_ the result is, but at least I can understand the validity of the model. But for everything else... am I missing something?
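For concreteness, here is a rough sketch of what looking things up in such a triple set could look like. `TripleStore` and the sample people and companies are made up for illustration; this is not how CocoIndex or any graph database actually stores things.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._by_subject = defaultdict(list)
        self._by_object = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = (subject, predicate, obj)
        self._by_subject[subject].append(triple)
        self._by_object[obj].append(triple)

    def about(self, entity: str) -> list:
        """Every triple in which the entity appears as subject or object."""
        return self._by_subject.get(entity, []) + self._by_object.get(entity, [])


store = TripleStore()
store.add("CocoIndex", "supports", "Incremental Processing")
# Hypothetical, formally defined entities and relationships.
store.add("Alice", "board_member_of", "Acme")
store.add("Alice", "sibling_of", "Bob")
store.add("Bob", "board_member_of", "Globex")

print(store.about("Incremental Processing"))
print(store.about("Alice"))
```

The lookup for "Incremental Processing" returns just the one triple it came from, while the lookup for a well-defined entity like a person starts to connect several facts.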

alexchantavy•1h ago
IMO knowledge graphs are a must have for security use-cases because of how well they handle many-to-many relationships. Who has access to read each storage bucket? Via which IAM policies? Who owns each bucket? What is the shortest possible role-assumption path available from internet-exposed compute instances to read this bucket? What is the effective blast radius from a vulnerability that allows remote code execution on an internet exposed compute instance?

Or, I have a docker container image that is built from multiple base images owned by different teams in my organization. Who is responsible for fixing security vulnerabilities introduced by each layer?

We really could model these as tables but getting into all those joins makes things so cumbersome. Plus visualizing these things in a graph map is very compelling for presentation and persuading stakeholders to make security decisions.
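As a sketch of the "shortest role-assumption path" question, here is a plain breadth-first search over an adjacency map. The node names and edges are invented for illustration; a real setup would load them from IAM data into a graph database rather than a dict.

```python
from collections import deque


def shortest_path(edges: dict, start: str, goal: str):
    """Breadth-first search; returns the shortest list of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical "can assume / can read" edges.
access = {
    "web-instance": ["role-app"],
    "role-app": ["role-ci", "role-reader"],
    "role-ci": ["role-admin"],
    "role-admin": ["bucket-logs"],
    "role-reader": ["bucket-logs"],
}

print(shortest_path(access, "web-instance", "bucket-logs"))
```

The same traversal expressed over tables would need one self-join per hop, which is the cumbersome part.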

badmonster•1h ago
In my understanding, there are two kinds of use cases that can potentially be explored with knowledge graphs.

- Structured data - this is probably closer to the use case you mention

- Unstructured data - extracting relationships and building a KG with natural language understanding, which is what this article is trying to explore. Here is a paper discussing this: https://arxiv.org/abs/2409.13731

In general it is an alternative way to establish connections with entities easily. And these relationships could help with discovery, recommendation and retrieval. Thanks @alexchantavy for sharing use-cases in security.

Would love to learn more from the community :)
