Knows plenty of arcane commands in addition to the common ones, which is really cool & lets it do amazing things for you, the user.
To the author: most of your audience knows what MCP is; may I suggest adding a tl;dr to help people quickly understand what you've done?
I am actually more interested in improving the debugger interface. For example, an AI assistant could help me create breakpoint commands that nicely print function parameters when you only partly know the function signature and do not have symbols. I used Claude/Gemini for such tasks and they were pretty good at it.
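For example, on x64 (where the first four arguments arrive in rcx/rdx/r8/r9), a symbol-less breakpoint that dumps the interesting parameters could look roughly like this - the module+offset is a placeholder, and treating the third argument as an ANSI string is just a guess at the signature:

    bp mymodule+0x1a2c0 ".echo === hit ===; r rcx, rdx, r8; da @r8; gc"

(gc resumes execution so the target keeps running; du instead of da if the string turns out to be Unicode.)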
As a side note, I recall Kevin Gosse also implemented a WinDbg extension [1][2] which used OpenAI API to interpret the debugger command output.
results["info"] = session.send_command(".lastevent")
results["exception"] = session.send_command("!analyze -v")
results["modules"] = session.send_command("lm")
results["threads"] = session.send_command("~")
You cannot always debug a crash dump with only these four commands.
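A real user-mode triage usually needs follow-ups along these lines (the module name below is just a placeholder):

    .exr -1 ; $$ most recent exception record
    .ecxr ; $$ switch to the faulting exception context
    ~*kv ; $$ call stacks for every thread
    lmvm contoso ; $$ version/path details for a suspect module
    !heap -s ; $$ heap summary when corruption is suspected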
Just had a quick look at the code: https://github.com/svnscha/mcp-windbg/blob/main/src/mcp_serv...

I might be wrong, but at first glance I don't think it is only using those four commands. It might be using them internally to get context to pass to the AI agent, but it looks like it exposes:
- open_windbg_dump
- run_windbg_cmd
- close_windbg_dump
- list_windbg_dumps
The most interesting one is "run_windbg_cmd", because it might allow the MCP server to send whatever the AI agent wants, e.g.:

    elif name == "run_windbg_cmd":
        args = RunWindbgCmdParams(**arguments)
        session = get_or_create_session(
            args.dump_path, cdb_path, symbols_path, timeout, verbose
        )
        output = session.send_command(args.command)
        return [TextContent(
            type="text",
            text=f"Command: {args.command}\n\nOutput:\n```\n" + "\n".join(output) + "\n```"
        )]
(edit: formatting)

After that, all that is required is interpreting the results and connecting them with the source code.
Still impressive at first glance, but I wonder how well it works with a more complex example, like a crash in the Windows kernel due to a broken driver.
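Presumably the agent would have to drive the usual kd triage itself through run_windbg_cmd, something like:

    !analyze -v ; $$ bugcheck code, arguments, suspected driver
    lm kv ; $$ loaded drivers with timestamps and versions
    !thread ; $$ the thread that was running at the crash
    !process 0 0 ; $$ brief overview of processes in the dump
    !irpfind ; $$ outstanding IRPs when a driver looks stuck
    !poolused 2 ; $$ nonpaged pool usage by tag, for leak suspects

Whether a model can chain those reliably on a real bugcheck is exactly the open question.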
imo, what AI needs in order to debug is either:
- training with RL to use breakpoints + a debugger, or to do print debugging - but that'll suck, because the chains of actions are super freaking long, and we know how it goes with AI memory currently: it's not great
- a sort of always-on omniscient debugger that can inform the AI of everything the program/services did (Sentry-like observability, but on steroids). The AI would then just search within that and find the root cause
neither approach is going to be easy to make happen, but imo, if we all spend 10+ hours every week debugging, it's worth the shot
that's why I'm currently working on approach 2. I made a time-travel debugger/observability engine for JS/Python, and I'm now working on plugging it into AI context as efficiently as possible, so that one day it can debug even super long sequences of actions in dev & prod
it's super WIP and not self-hostable yet but if you want to check it out: https://ariana.dev/
These kinds of demos were cool two years ago - then we got function calling in the API, it became super easy to build this stuff, and reality hit that LLMs were kind of shit and unreliable at using even the most basic tools. Like, oh wow, you can get a toy example working and suddenly it's a "natural language interface to WinDBG".
I am excited about progress on this front in any domain - but FFS, show actual progress or something interesting. Show me an article like this [1] where the LLM did anything useful. Or just show what you did that's not "oh, I built a wrapper on a CLI" - did you fine-tune the model to get better performance? Did you set up some benchmark, compare which model performs better, and find one to be impressive?
I am not shitting on OP here, because it's fine to share what you're doing and get excited about it - maybe this is step one - but why the f** is this a front-page article?
[1] https://cookieplmonster.github.io/2025/04/23/gta-san-andreas...
A real, quality AI breakthrough in software creation & maintenance will require a deep rework of many layers of the software stack, both low- and high-level.
Like you said, running over a stream of events, states, and data for that debugging scenario is probably way more helpful. It would also be great to prime the context with business rules and the company's history. Otherwise LLMs will make the same mistake devs make: not knowing the "why" behind something and thinking the "what" is what matters most.
How does it compare to using the Ghidra MCP server?
I for one enjoy crash dump analysis because it is a technically demanding, rare skill. I know I'm an exception, but I enjoy actually learning the stuff so I can deterministically produce the desired result! I even apply that to other parts of the job, like learning the currently used programming language and actually reading the documentation of libraries/frameworks, instead of copy-pasting solutions from the "shortcut du jour" - Stack Overflow yesterday, LLMs today!
Analyzing crash dumps is a small part of my job. I know enough to examine exception context records and the associated stack traces, and 80% of the time that’s enough. Bruce Dawson’s blog has a lot of great stuff, but it’s pretty advanced.
I’m looking for material to help me jump that gap.