1. Overview
This project enables deep code analysis with Large Language Models. By building a Neo4j-based Graph RAG over C/C++ codebases, it lets developers and AI agents run complex, multi-layered queries that traditional search tools simply can't handle. With only 4 MCP APIs and a vanilla agent, it can already accomplish many codebase-related tasks.
2. How It Works
Using clangd and clang, the system parses and indexes your source files to build a high-fidelity code graph. It captures everything from high-level folder structures to granular relationships: entities such as Folders, Files, Namespaces, Classes/Structs, Variables, and Methods, and relationships such as CALLS, INCLUDES, INHERITS, and OVERRIDES, among others.
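To illustrate the kind of multi-hop query the graph enables (here, tracing call paths), the following is a minimal sketch that uses a plain Python list of edges as a hypothetical stand-in for the Neo4j graph. The function names, file paths, and helpers (`build_index`, `call_paths`) are made up for illustration; only the relationship names CALLS and INCLUDES come from the description above. In Neo4j itself, the same question would typically be expressed with a variable-length Cypher pattern such as `-[:CALLS*1..10]->`.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical edges standing in for the Neo4j graph: (source, RELATIONSHIP, target).
EDGES: List[Tuple[str, str, str]] = [
    ("src/main.c", "INCLUDES", "src/util.h"),
    ("main", "CALLS", "parse_args"),
    ("main", "CALLS", "run"),
    ("run", "CALLS", "process"),
    ("process", "CALLS", "log_msg"),
    ("parse_args", "CALLS", "log_msg"),
]


def build_index(edges: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group targets by (source node, relationship type) for fast neighbor lookup."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
    return index


def call_paths(index, start: str, target: str, max_depth: int = 10) -> List[List[str]]:
    """Breadth-first search for every CALLS path from start to target.

    max_depth caps the number of CALLS hops; nodes are never repeated within
    a path, so recursive call cycles terminate.
    """
    paths: List[List[str]] = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) > max_depth:
            continue
        for callee in index.get((node, "CALLS"), []):
            if callee not in path:
                queue.append(path + [callee])
    return paths


index = build_index(EDGES)
for path in call_paths(index, "main", "log_msg"):
    print(" -> ".join(path))
# main -> parse_args -> log_msg
# main -> run -> process -> log_msg
```

Because the search is breadth-first, shorter call chains are reported first, which is usually what you want when asking an agent "how does X reach Y?".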
The system generates summaries and embeddings for every level of the codebase (from functions up to entire folders) using a bottom-up approach. This structured context helps AI agents understand the "big picture" without getting lost in the syntax.
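One way to read "bottom-up" is as a post-order traversal of the containment hierarchy: every function is summarized before its file, and every file before its folder, so each parent summary can draw on its children's summaries. The sketch below assumes that interpretation. The `CHILDREN` tree and the helpers are hypothetical, `fake_llm_summary` is a placeholder for a real LLM call, and embeddings are left out.

```python
from typing import Callable, Dict, List

# Hypothetical containment hierarchy: folder -> files -> functions.
# Nodes without an entry are leaves (functions).
CHILDREN: Dict[str, List[str]] = {
    "src": ["src/net.c", "src/io.c"],
    "src/net.c": ["connect_peer", "send_packet"],
    "src/io.c": ["read_file"],
}

Summarizer = Callable[[str, List[str]], str]


def summarize_bottom_up(node: str, children: Dict[str, List[str]],
                        summarize: Summarizer, out: Dict[str, str]) -> str:
    """Post-order traversal: all children are summarized before their parent."""
    child_summaries = [summarize_bottom_up(c, children, summarize, out)
                       for c in children.get(node, [])]
    out[node] = summarize(node, child_summaries)
    return out[node]


def fake_llm_summary(node: str, child_summaries: List[str]) -> str:
    """Placeholder for an LLM call; a real prompt would include the child summaries."""
    return f"{node} ({len(child_summaries)} parts)"


summaries: Dict[str, str] = {}
summarize_bottom_up("src", CHILDREN, fake_llm_summary, summaries)
print(list(summaries))  # insertion order == order in which summaries were produced
# ['connect_peer', 'send_packet', 'src/net.c', 'read_file', 'src/io.c', 'src']
```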
To help you get started, the project includes an example MCP (Model Context Protocol) server and a demonstration AI agent that showcases the graph’s capabilities. You can build your own custom agents and servers on top of the Graph RAG.
3. Efficiency & Performance
Incremental Updates: The system detects changes between commits and updates only what’s necessary.
Parallel Processing: Parsing and summary generation are distributed across worker processes with optimized data sharing.
Smart Caching: Results are cached to minimize redundant computations, saving you both time and LLM costs.
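Here is a minimal sketch of how incremental updates and caching could fit together, assuming a content hash is recorded per file at each commit. In practice the changed-file set might come from something like `git diff --name-only <old> <new>`; here two hypothetical path-to-hash snapshots are compared directly. A changed file invalidates its own summary plus those of every enclosing folder, and a hash-keyed cache (the hypothetical `SummaryCache`) skips recomputation when an input has not changed. This is an illustration under those assumptions, not the project's actual implementation.

```python
import hashlib
from typing import Callable, Dict, Set


def content_hash(text: str) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Paths added, removed, or modified between two path -> hash snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def stale_nodes(changed: Set[str]) -> Set[str]:
    """A changed file invalidates its own summary and every enclosing folder's."""
    stale: Set[str] = set()
    for path in changed:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            stale.add("/".join(parts[:i]))
    return stale


class SummaryCache:
    """Hypothetical cache keyed by input hash, so identical inputs never reach the LLM twice."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute()
        return self._store[key]


old = {"src/net/tcp.c": content_hash("int a;"), "src/io.c": content_hash("int b;")}
new = {"src/net/tcp.c": content_hash("int a = 1;"), "src/io.c": content_hash("int b;"),
       "src/new.c": content_hash("void f(void);")}

changed = changed_files(old, new)
print(sorted(stale_nodes(changed)))
# ['src', 'src/net', 'src/net/tcp.c', 'src/new.c']

cache = SummaryCache()
for _ in range(3):
    cache.get_or_compute(new["src/io.c"], lambda: "summary of src/io.c")
print(cache.misses)  # 1
```

Note that `src/io.c` is untouched, so neither it nor its summary is revisited; only the modified file, the new file, and their ancestor folders are.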
4. A Benchmark: The Linux Kernel
Building the code graph for the Linux kernel (WSL2 release) on a workstation with 12 cores and 64GB of RAM takes about 4 hours using 10 parallel worker processes, with peak memory usage of about 36GB. Note that this figure does not include summary generation; the additional time that step takes will vary with your LLM provider.
artigent•1h ago
This project is by no means a replacement for the clangd language server used in IDEs. Instead, it is designed to complement it by enabling LLMs to perform deep architectural analysis. While clangd handles real-time coding assistance, this tool focuses on high-level reasoning, such as mapping project workflows, tracing complex call paths, and understanding system-wide architecture.