I do it that way, and I’ve hooked it up with the Telegram API. I’m able to ask things like “What’s my passport number?” and it just works.
Combine it with git and you have a Datomic-esque way of seeing facts getting added and retracted simply by traversing the commits.
I arrived at this solution after trying a more complex triplets-based approach and seeing that plain text files + HTTP calls work just as well and are human (and AI) friendly.
The main disadvantage is having unstructured data, but for content that fits inside the LLM context window, it doesn’t matter practically speaking. And even then, when context starts being the limiting factor, you can start segmenting by categories or start using embeddings.
Specifically, it’s a file that contains a list of Entity-Attribute-Value assertions in triplets.
It’s called “FACTS.md” and each line represents a fact, such as “<OP>, PASSPORT_NUMBER, <VALUE>”.
Then I put it in context, ask a question, and route it through the Telegram API, and suddenly I have a “Private ChatGPT” that’s aware of my filesystem, can run my own binaries/tools, and has access to a private document store.
It gets really cool once you add function calling to open images (or any type of file) on demand with vision capabilities/OCR, start running shell commands, and combine that with the many media types Telegram supports.
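If anyone wants the rough shape of it, here’s a minimal sketch. It assumes an OpenAI-style chat completions endpoint and the Telegram Bot API sendMessage method; the file name, model, and env var names are just placeholders for whatever you use:

    # Minimal sketch: answer a question from FACTS.md and reply over Telegram.
    # Assumes OPENAI_API_KEY / TELEGRAM_BOT_TOKEN env vars; names are placeholders.
    import os
    import requests

    def answer(question: str) -> str:
        facts = open("FACTS.md").read()  # one EAV triplet per line
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={
                "model": "gpt-4o-mini",
                "messages": [
                    {"role": "system", "content": "Answer using only these facts:\n" + facts},
                    {"role": "user", "content": question},
                ],
            },
        )
        return resp.json()["choices"][0]["message"]["content"]

    def reply_on_telegram(chat_id: int, question: str) -> None:
        token = os.environ["TELEGRAM_BOT_TOKEN"]
        requests.post(
            f"https://api.telegram.org/bot{token}/sendMessage",
            json={"chat_id": chat_id, "text": answer(question)},
        )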
Funny enough, I called the project “COO” initially. Been thinking of writing up something about it.
I think it’s a no-brainer and I’m confident OpenAI, Claude, and Notion will go there.
In the meantime, I have good-ol’ vi, .md/.txt, and HTTP/SMTP!
- Metadata-based matching (I’ve done that with search systems in the past)
- Embedding-based matching (false positives are a definite consideration; a sketch follows below)
- Using the knowledge graph itself to do entity resolution before feeding the entities into the graph
And add a human in the loop to guide entity resolution.
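For the embedding-based option, a minimal sketch of what I mean - the model name and threshold are just illustrative, not recommendations:

    # Minimal sketch of embedding-based entity matching (model/threshold are illustrative).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def merge_candidates(entities, threshold=0.85):
        """Return pairs of entity mentions that look like the same real-world thing."""
        emb = model.encode(entities, convert_to_tensor=True)
        sims = util.cos_sim(emb, emb)
        pairs = []
        for i in range(len(entities)):
            for j in range(i + 1, len(entities)):
                if sims[i][j] >= threshold:  # likely duplicates -> queue for human review
                    pairs.append((entities[i], entities[j]))
        return pairs

    # merge_candidates(["CocoIndex", "cocoindex.io", "Incremental Processing"])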
What do you think? Would love to learn your thoughts :)
Neo4j also supports building embeddings that leverage more information in the graph than just a single node's properties: https://neo4j.com/docs/graph-data-science/current/machine-le... (It's hard to compute them incrementally, but users can still compute them after the graph is built.)
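Roughly, with the Python driver it looks something like this - procedure names and config keys are from memory, so double-check them against the docs above:

    # Rough sketch: graph-aware node embeddings with Neo4j GDS (FastRP).
    # Assumes the GDS plugin is installed; verify procedure/config names in the docs.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        # Project the stored graph into GDS's in-memory catalog.
        session.run("CALL gds.graph.project('kg', 'Entity', 'RELATES_TO')")
        # Compute embeddings that use graph structure, and write them back as a node property.
        session.run(
            "CALL gds.fastRP.write('kg', "
            "{embeddingDimension: 128, writeProperty: 'embedding'})"
        )
    driver.close()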
Looking forward to learning your thoughts :)
Like the example "CocoIndex supports Incremental Processing" becomes the subject/predicate/object triple (CocoIndex, supports, Incremental Processing)... so what? Are you going to look up "Incremental Processing" and get a list of related entities? That's not a term that is well enough defined to be meaningful across a variety of subjects. I can incrementally process my sandwich by taking small bites.
I guess you could actually expand "Incremental Processing" to some full definition. But then it's not really a knowledge graph, because the only entity ever associated with that new definition will be CocoIndex, and you are back to a single sentence that contains the information; you've just pretended it's structured. ("Supports" is hardly a well-defined term either!)
I can _kind of_ see how knowledge graphs can be used for limited relationships. If you want to map companies to board members, and board members to family members, etc. Very clearly and formally defined entities (like a person or company), with clearly defined relationships (board member, brother, etc). I still don't know how _useful_ the result is, but at least I can understand the validity of the model. But for everything else... am I missing something?
Or, I have a docker container image that is built from multiple base images owned by different teams in my organization. Who is responsible for fixing security vulnerabilities introduced by each layer?
We could model these as tables, but getting into all those joins makes things cumbersome. Plus, visualizing them as a graph is very compelling for presentations and for persuading stakeholders to make security decisions.
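As a toy illustration of the container-image case (made-up names, with networkx standing in for a real graph database):

    # Toy model: image layers -> base images -> owning teams (all names are made up).
    import networkx as nx

    g = nx.DiGraph()
    g.add_edge("app-image", "python-base", relation="FROM")
    g.add_edge("python-base", "distro-base", relation="FROM")
    g.add_edge("python-base", "platform-team", relation="OWNED_BY")
    g.add_edge("distro-base", "os-team", relation="OWNED_BY")
    g.add_edge("CVE-2024-0001", "distro-base", relation="AFFECTS")

    # Who has to fix a given CVE? Follow AFFECTS to the image, then OWNED_BY to the team.
    for _, image in g.out_edges("CVE-2024-0001"):
        owners = [t for _, t, d in g.out_edges(image, data=True)
                  if d["relation"] == "OWNED_BY"]
        print(image, "is owned by", owners)  # distro-base is owned by ['os-team']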
- Structured data - this is probably closer to the use case you mention
- Unstructured data, extracting relationships and building the KG with natural language understanding - which is what this article is trying to explore. Here is a paper discussing this: https://arxiv.org/abs/2409.13731
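For the unstructured case, the extraction step is basically "ask the model for triples" - a rough sketch below; the prompt and model are illustrative only, and real pipelines add schema constraints and entity resolution on top:

    # Rough sketch of LLM-based triple extraction (prompt/model are illustrative only).
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def extract_triples(text):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": "Extract (subject, predicate, object) triples from the text. "
                           "Return only a JSON list of 3-element lists.\n\n" + text,
            }],
        )
        return json.loads(resp.choices[0].message.content)

    # extract_triples("CocoIndex supports Incremental Processing")
    # -> [["CocoIndex", "supports", "Incremental Processing"]] (ideally)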
In general it is an alternative way to easily establish connections between entities. These relationships could help with discovery, recommendation, and retrieval. Thanks @alexchantavy for sharing use cases in security.
Would love to learn more from the community :)
People reach for a database, and of course you need that, but for one thing the data certainly doesn't always come in a nice tabular format, and moreover you often don't know which piece of knowledge will become relevant for a question you care about - maybe two people worked together at the Kings Bay Mining Company, and then there was the accident in 1962, but uncle Hans was an inspector at Wilhelmsen, etc. Often you make progress because you remember niche geographical or historical information.
Like RAG, it decouples KG size from context size, but unlike RAG, a KG offers deduplication and relational traversal. Some searches based on just similarity or keywords fail when the relation is functional. Both KG and RAG work better when the LLM is planning the search process, doing multiple searches, basing each one off the previous one. In the last few months LLMs have gotten great at exploration with search tools.
I implemented my own KG recently and I put both search and node generation in the hands of the LLM, as MCP tools. The cool trick is that when I instruct the LLM to generate a node it links to previous nodes using inline references (like @45). So I get the graph structure for free. I think coupling RAG with a KG allows for both breadth and precise control. The RAG is assimilating unstructured chunks, the KG is mapping the corpus. All done with human in the loop to guide the process.
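The node-generation side of that is tiny, by the way - something like this hypothetical helper sitting behind the MCP tool (not the actual tool schema):

    # Hypothetical sketch of the "generate node" tool: inline @N references become edges.
    import re

    nodes = {}   # node id -> text written by the LLM
    edges = []   # (from_id, to_id)

    def add_node(text):
        """Store a node; any '@45'-style mention links it back to node 45."""
        node_id = len(nodes) + 1
        nodes[node_id] = text
        for ref in re.findall(r"@(\d+)", text):
            edges.append((node_id, int(ref)))
        return node_id

    # add_node("Summary of chapter 3; expands on @1 and contradicts @2")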
https://en.wikipedia.org/wiki/Semantic_triple
They're useful for storing social network graph data, for example, and can be expressed using standards like Open Graph and JSONAPI.
I've stored RDF triples in database tables and experimented with query concepts from neo4j:
https://neo4j.com/docs/getting-started/data-modeling/tutoria...
These are straightforward to translate to SQL but the syntax can get messy due to not always having foreign keys available and hitting limitations with polymorphic relationships. Some object-relational mapping (ORM) frameworks help with this:
https://laravel.com/docs/12.x/eloquent-relationships#polymor...
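To make the "triples in SQL" point concrete, a minimal sqlite sketch (schema and data made up) - each extra hop is another self-join, which is exactly where it starts getting cumbersome:

    # Minimal sketch: a triples table plus a two-hop self-join (schema/data made up).
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("acme_corp", "has_board_member", "alice"),
        ("alice", "sibling_of", "bob"),
    ])

    # "Who is a sibling of a board member of acme_corp?" -> one join per hop.
    rows = db.execute("""
        SELECT t2.object
        FROM triples t1
        JOIN triples t2 ON t2.subject = t1.object AND t2.predicate = 'sibling_of'
        WHERE t1.subject = 'acme_corp' AND t1.predicate = 'has_board_member'
    """).fetchall()
    print(rows)  # [('bob',)]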
I feel that document-oriented databases like MongoDB jumped the gun a bit; I would have preferred graph-oriented or key-value-oriented databases providing row/column/document-oriented queries and views. Going the other way feels a bit kludgy to me:
https://www.mongodb.com/resources/basics/databases/mongodb-g...
Basically set theory internally, with multiple query languages externally, and indexed by default.
Oh, and have all writes generate an event stream like Firebase does, so we can easily build reactive apps.
You have no graphs, no concepts, no nothing
[...]
You never understood the meaning of concept
Lyrics are full of depth and ideas connect
[...]
You can only dream to write like I write, I might
Ignite, confuse and leave you blinded by the light
'Cause I been working on graphs, concepts and all of that
Making it difficult for those who might try to follow that
Mark B & Blade – The Way It Has To Be
If so, that's crazy, and I would love pointers on how to prompt it to suggest this.