ArkhamMirror: Airgapped investigation platform with CIA-style hypothesis testing

https://github.com/mantisfury/ArkhamMirror

168•ArkhamMirror•1mo ago

Comments

ArkhamMirror•1mo ago

I got tired of expensive SaaS tools that want my sensitive documents in their cloud. I built ArkhamMirror to do forensic document analysis 100% locally, free and open source.

What makes this different:

Air-gapped: Zero cloud dependencies. Uses local LLMs via LM Studio (Qwen, etc.)

ACH Methodology: Implements the CIA's "Analysis of Competing Hypotheses" technique, which forces you to look for evidence that disproves your theories instead of confirming them

Corpus Integration: Import evidence directly from your documents with source links

Sensitivity Analysis: Shows which evidence is critical. If a piece of evidence turned out to be wrong, would your conclusion change?
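A minimal sketch of what such a sensitivity check could look like, not ArkhamMirror's actual code. It assumes a hypothetical matrix of ratings from -2 to 2 per evidence item and hypothesis, scores each hypothesis by how much evidence contradicts it (lower is better), and flags any evidence whose removal changes the leading hypothesis. The function names and sample ratings are invented for illustration.

```python
from typing import Dict, List

# Hypothetical layout: evidence name -> {hypothesis name: rating in -2..2}.
# A negative rating means the evidence is inconsistent with the hypothesis.
Matrix = Dict[str, Dict[str, int]]


def inconsistency_scores(matrix: Matrix) -> Dict[str, int]:
    """Sum the magnitude of negative ratings per hypothesis (lower is better)."""
    scores: Dict[str, int] = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            scores[hypothesis] = scores.get(hypothesis, 0) + max(0, -rating)
    return scores


def leader(matrix: Matrix) -> str:
    """Return the least-inconsistent hypothesis; ties break alphabetically."""
    scores = inconsistency_scores(matrix)
    return min(sorted(scores), key=lambda name: scores[name])


def critical_evidence(matrix: Matrix) -> List[str]:
    """List evidence items whose removal changes the leading hypothesis."""
    baseline = leader(matrix)
    critical = []
    for name in matrix:
        reduced = {key: value for key, value in matrix.items() if key != name}
        if reduced and leader(reduced) != baseline:
            critical.append(name)
    return critical


# Invented sample data: H1 leads only because E1 contradicts H2.
example: Matrix = {
    "E1": {"H1": 0, "H2": -2},
    "E2": {"H1": -1, "H2": 0},
}
print(critical_evidence(example))
```

Dropping one item at a time is the simplest possible test. A real tool might instead flip or re-weight ratings.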

The ACH feature just dropped with an 8-step guided workflow, AI assistance at every stage, and PDF/Markdown/JSON export with AI disclosure flags. It's better than what any given 3-lettered agency uses.

Tech stack: Python/Reflex (React frontend), PostgreSQL, Qdrant (vectors), Redis (job queue), PaddleOCR, spaCy NER, BGE-M3 embeddings.

All MIT licensed. Happy to answer questions about the methodology or implementation! Intelligence for anyone.

Links: Repo https://github.com/mantisfury/ArkhamMirror

ACH guide with screenshots at https://github.com/mantisfury/ArkhamMirror/blob/reflex-dev/d...

V__•1mo ago

What field are you in? It sounds interesting that one would need such a tool.

cess11•1mo ago

Description on the repo says it's for journalism, but I build similar rigs that I use for research in companies that have entered bankruptcy proceedings.

Commonly there is a lot of information and it might as well be unstructured, and then I need to get answers quickly because my clients aren't going to pay me for going about it slowly.

ArkhamMirror•1mo ago

It's mainly useful for journalism purposes, yes. Audit and compliance uses were also a consideration. It's a unified tool for right now, but I'm working on turning the base of it into the frame and adding individual shards for specialized applications.

ArkhamMirror•1mo ago

It's not just for people doing interesting things. It just helps people answer questions about stuff. The stuff can be interesting or boring or dangerous or silly. The last question I tested the ACH tool on was "Did William Shakespeare really author all of the works he was credited for?" - You can use this stuff to research whatever you want. That's the point of it - it's no one's business what you are interested in getting to the bottom of.

btown•1mo ago

I can say, from a business perspective, I've needed to use similar methodologies, though far from needing air-gap requirements and relying heavily on web search, to evaluate potentially fraudulent transactions and relationships between parties.

What are the competing hypotheses, other than fraud, when a person makes a massive luxury purchase, but with red-flag-adjacent inconsistencies in other information provided? If we need to identify whether there's a weird or competitive ownership relationship behind a potential opportunity, how do we determine if an initial hypothesis about relationships is correct?

If ArkhamMirror has an online mode with web search as a tool call, I'd be curious to try it out to automate some of these ACH-adjacent workflows.

ArkhamMirror•1mo ago

It doesn't have an online mode yet - although there's a lot of stuff in the works. However, since Docker and LM Studio are already included in the setup, you can turn on MCP Toolkit in Docker and add the Docker MCP to LM Studio. With Docker Toolkit on, you get access to over 300 different MCPs for your local LLM, including web search via DuckDuckGo or Brave Search, automation tools like n8n, web manipulation with Playwright, and all sorts of other potentially useful stuff. (not a sponsor :P) Then your "local" LLM can suddenly do all sorts of agentic stuff. This isn't an out-of-the-box capability, since I'm only building offline, local, privacy-focused features at the moment, but turning it on isn't a huge undertaking. If you are up for messing with some prompts in the files, you could even tell the LLM which tool you want it to use for which task if it's not automatically using them when the need arises.

daft_pink•1mo ago

Ironically, is there a way to try this out in the cloud for people who want this tool who aren’t hyper worried about security?

It looks cool.

ArkhamMirror•1mo ago

Thanks, glad to hear it!

Short answer - no, not right now.

However, instead of going through locally hosted docker and local LLMs, you could reroute it wherever you like, but I don't have a cloud option set up at this time.

I'm focused on developing the local, private applications myself, but nothing is stopping someone from hooking it up to stronger cloud-based stuff if they want.

The good news is that my plans for this include making it more modular, so people have better options for what it does and how powerful it is.

ArkhamMirror•1mo ago

There is now a standalone ACH tool you can try online in-browser. You can download it and run it locally with a local LLM, or you can use it in the browser and plug in a Groq or OpenAI API key.

gslepak•1mo ago

Excellent! Thank you for releasing this.

Notice the "Knowledge Graph" feature that lets you "Visualize hidden connections between People, Orgs, and Places" just like the cork board meme.

This is the essence of what good "conspiracy theorists" do. Whenever investigative journalists uncover a conspiracy among the elite, they are talked down to and dismissed as "conspiracy theorists". But that is what good conspiracy theorists are: investigative journalists.

ArkhamMirror•1mo ago

For sure - "conspiracy theorists" are just another group of people trying to find truth, patterns in the world and trying to connect the dots. The cork board feel was very much intentional in some of the visualizations. Specifically, the "lie web" visualization that uses "red yarn" visuals to connect detected contradictions across different entities and documents.

If I had the skills, I would totally map that onto a cork board.

truemotive•1mo ago

I've been poking at something like this for a while now, almost exactly the same intention. I'm gonna try and contribute instead, this rules.

ArkhamMirror•1mo ago

I'm very glad to hear that!

Let me tell you this - This version of the toolkit is pretty monolithic and Reflex is kind of a pain to work with for me. This version of the tool will be polished from here, but I hesitate to add more features to it since it already has like 35 pages of features.

I'm about to release another version of the tool that's focused on modularity, so anyone can mix and match the features they want instead of having to take the whole thing or nothing. ACH is going to be the first add-on, followed by the rest of the features.

ckbkr10•1mo ago

The idea is good. I do think that is going to be the future for high volume data leaks like the Snowden or Epstein files.

I do think though that this approach will become annoying quick:

https://github.com/mantisfury/ArkhamMirror/blob/main/scripts...

ArkhamMirror•1mo ago

The cheesy noir persona is for the AI assisted install and that's it. Inside the app, the prompts are strictly business. (They still have roles, but not "characters" or "personas").

Theofrastus•1mo ago

It's always interesting to stumble upon a bubble you never heard of.

This is super interesting. I will probably (hopefully?) never need to use it, but interesting nonetheless. It also makes sense to have this type of application airgapped. Journalists need to have near-perfect OPSEC depending on what they are working on.

ArkhamMirror•1mo ago

Thanks for the interest! I agree, the fewer people that need it the better, but I want it to exist just in case.

ArkhamMirror•1mo ago

In case it wasn't clear, the ACH update is on the reflex-dev branch -

https://github.com/mantisfury/ArkhamMirror/tree/reflex-dev

Garlef•1mo ago

I'm wondering if the ACH Methodology could be used as a general purpose Chain-of-Thought variant.

ArkhamMirror•1mo ago

Probably - LLMs definitely benefit from having decision-making frameworks. ACH is a super-widely useful tool, so I don't see why you couldn't tune an AI with it too.

smallerfish•1mo ago

A video demo would be useful. I can't really tell how much the application is doing from the screenshots. Is it a tool with some smart guidance, or is it doing deep magic?

ArkhamMirror•1mo ago

I didn't think a video would be very exciting. It did feel like deep magic when I tested it though. For the scenario in the screenshots, I provided the question, "Did we really land a man on the moon?" and the null hypothesis "We landed on the moon in 1969", and the low value piece of evidence "My dad told me he saw Stanley Kubrick's moon landing set one time and he never lies." Literally everything else the LLM generated on demand for me based on its existing training data, offline. It gave me hypotheses, challenges, evidence, filled out the matrix, did the calculations, everything.

darkwater•1mo ago

And the answer was... ? :)

ArkhamMirror•1mo ago

Well, based on the evidence provided against our competing hypotheses, The least problematic hypothesis is that we landed on the moon in 1969. Second least problematic hypothesis was "The Apollo 11 mission was a hoax staged by NASA and the U.S. government for public relations and Cold War propaganda, but the moon landing itself was real — only the public narrative was fabricated." Third least problematic was "The Apollo 11 mission was a real event, but the moon landing was not achieved by humans — it was an automated robotic mission that was misinterpreted or falsely attributed to astronauts due to technical errors or media misreporting." - The winning hypothesis had a score of 0 (lower is better), second place had a score of 6 (out of possible 10 for our evidence set), and third place had a score of 8. There was also a tie for 4th place "It was just a government coverup to protect the firmament. There is no "outer space."" and "The Apollo 11 mission never occurred; all evidence — including photos, video, and lunar rocks — was fabricated in secret laboratories using early 20th-century special effects and staged experiments, possibly by a small group of scientists and engineers working under government contract." - both of these scored 10 out of 10, making them the most problematic. Sorry guys.

stocksinsmocks•1mo ago

Yeah, I’m pretty sure you need to sharpen your pencil on this one if the conclusion was that the Apollo program was legitimate.

ArkhamMirror•1mo ago

I'm sure if the right evidence were submitted and run against the right hypotheses a different frontrunner could emerge. Remember - this is a tool to help you investigate better and figure out what to look for, not a tool that tells you the answer. It helps you eliminate unlikely answers more than it ever points at the "right" answer, and even the most unlikely answers can still be the "right" ones! Hang in there

afro88•1mo ago

> Literally everything else the LLM generated on demand for me based on its existing training data, offline

That's a ton of scope for hallucinations, surely?

ArkhamMirror•1mo ago

It would be enough to drive most local LLMs crazy if it tried to generate it all at once or if it was all part of one long session, but it's set up so the LLM doesn't have to produce much at a time. I only batch in small groups (like it will generate only 3 suggestions per request) and the session is refreshed between calls, and the output is generally force structured to fit correctly into the expected format. You can, however, ask for new batches of suggestions or conflicts or evidence more than once. Hallucinations can happen for any LLM use of course, but if they break the expected structure the output is generally thrown out. Even the matrix scoring suggestion - it works on the whole row, but behind the scenes the LLM is asked to return one response in one "chat" session per column, and then they are all entered at the same time once all of them have been individually returned. That way, if the LLM does hallucinate for the score, it outputs a neutral response for that cell and doesn't corrupt any of the neighboring cells.
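The per-cell isolation described above could be sketched roughly like this. It is an illustration only: `ask_llm` is a hypothetical callable standing in for whatever client talks to the local model, and the prompt wording and validation rule are assumptions rather than the project's actual prompts.

```python
from typing import Callable, List

VALID_RATINGS = {"-2", "-1", "0", "1", "2"}
NEUTRAL = 0


def parse_rating(raw: str) -> int:
    """Accept only a bare integer from -2 to 2; anything else becomes neutral."""
    text = raw.strip()
    return int(text) if text in VALID_RATINGS else NEUTRAL


def score_row(
    evidence: str,
    hypotheses: List[str],
    ask_llm: Callable[[str], str],
) -> List[int]:
    """Rate one evidence item against each hypothesis, one fresh call per cell."""
    row = []
    for hypothesis in hypotheses:
        prompt = (
            "Rate how consistent the evidence is with the hypothesis.\n"
            f"Evidence: {evidence}\n"
            f"Hypothesis: {hypothesis}\n"
            "Answer with a single integer from -2 to 2."
        )
        try:
            raw = ask_llm(prompt)  # no shared history between cells
        except Exception:
            raw = ""  # a failed call degrades to a neutral cell
        row.append(parse_rating(raw))
    return row  # the caller writes the whole row at once


def fake_llm(prompt: str) -> str:
    """Stand-in model: one well-formed answer, one rambling answer."""
    return "-2" if "hoax" in prompt else "I think it's probably consistent!"


print(score_row("Lunar samples were returned", ["It was a hoax", "It was real"], fake_llm))
```

Because each cell is parsed on its own, a malformed answer only turns that one cell neutral and leaves the rest of the row intact.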

If you use a smaller model with smaller context, it might be more prone to hallucinations and provide less nuanced suggestions, but the default model seems to be able to handle the jobs pretty well without having to regenerate output very often (it does happen sometimes, but it just means you have to run it again.) Also, depending on the model, you might get less variety or creativity in suggestions. It's definitely not perfect, and it definitely shouldn't be trusted to replace human judgement.

nilamo•1mo ago

That logo is like concentric rings of power around Galadriel's seer-pool, looking at... Hogwarts?

ArkhamMirror•1mo ago

Supposed to be the Mirror of Galadriel showing Arkham asylum. Just joshing on Palantir Gotham a little bit

sloped•1mo ago

This looks interesting, and honestly makes me want to fire up The Roottrees are Dead and see if I can use this to solve the second act.

ArkhamMirror•1mo ago

That would be a cool test - let me know if you decide to do it!

ChrisbyMe•1mo ago

Interesting tool, do you have some domain knowledge as an analyst or something similar? I've always been curious what research tools analysts are using outside of like, Google.

ArkhamMirror•1mo ago

Thank you, I'm glad it's gathered some interest!

I don't have any background as an analyst or anything like that. ACH is a real tool, really used by the CIA, and the existing versions are basically crappy spreadsheets, or not free, or both.

I don't doubt someone with coding skills could do it better, it's just that no one else has stepped up. Probably because there's no profit angle, but that's conjecture on my part.

0xdeadbeefbabe•1mo ago

Doesn't ACH also constrain hypothesis generation in certain ways?

ArkhamMirror•1mo ago

The ACH method actively encourages you to start off with any and all plausible explanations and eliminate them as you go along, but the AI suggestions are definitely more limited than what a human could come up with.

There are LLM limitations on the call to generate hypotheses to return them in a certain format and to return a certain number of them, and that sort of thing, so it's usually in your best interest to use the LLM as more of an assistant to check if you missed anything or for a push to get started looking in different directions more than having the AI doing the whole thing (although if you are being lazy or don't know what to do, you could let the LLM do pretty much everything - I pretty much let the LLM handle everything it could in testing.)

ajcp•1mo ago

This is very compelling, very nice work!

I really would like to know how good this would be for a corporate Internal Audit workflow/professional.

ArkhamMirror•1mo ago

This feature update is all about ACH, but there are several other functions that might also be of use for doing audit or compliance work.

Is there any particular function you had in mind?

ArkhamMirror can also scan your corpus for near duplicates and clusters, check for signs of people using copy-paste in their work, find designated red flags, pull out regex-matched data, and that sort of thing. It's really generalized for as many use cases as possible at this stage, and I'm about to start working on modularity for specialization, so feel free to make suggestions on how you'd want to use it.
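For readers wondering what a near-duplicate scan involves, here is a small sketch using word shingles and Jaccard similarity. This is a swapped-in textbook technique chosen for illustration. The thread doesn't say how ArkhamMirror detects near duplicates, and the function names, threshold, and sample documents below are all made up.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, size: int = 3) -> Set[Shingle]:
    """Break text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(left: Set[Shingle], right: Set[Shingle]) -> float:
    """Shared shingles divided by all shingles; 1.0 means identical sets."""
    if not left and not right:
        return 1.0
    return len(left & right) / len(left | right)


def near_duplicates(
    docs: Dict[str, str], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Return document pairs whose similarity meets the threshold."""
    sets = {name: shingles(text) for name, text in docs.items()}
    pairs = []
    for left, right in combinations(sorted(sets), 2):
        score = jaccard(sets[left], sets[right])
        if score >= threshold:
            pairs.append((left, right, round(score, 2)))
    return pairs


# Invented sample corpus.
corpus = {
    "a.txt": "The quarterly report shows revenue grew by ten percent",
    "b.txt": "The quarterly report shows revenue grew by five percent",
    "c.txt": "Meeting notes from the offsite in March",
}
print(near_duplicates(corpus))
```

Comparing every pair is fine for a handful of files. A large corpus would need something like MinHash or embedding search instead.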

gosub100•1mo ago

Is this "investigation platform" any different from legal "e-discovery" software products? It's a great accomplishment either way, but I am posting so other people know that lawyers use this stuff all the time and there are many (paid) off the shelf options.

ArkhamMirror•1mo ago

There's a lot of potential for overlaps in features - e-discovery is one of the core concepts behind this platform, definitely.

Also, it's true that a lot of the existing tools that do similar things are anything but free.

I can imagine most or all of the things ArkhamMirror does are done elsewhere by other programs and tools. I don't know of any unclassified projects that do ACH better, but that's a pretty niche tool, and the government loves their 20-year-old software solutions.

Off-the-shelf programs designed for use by lawyers have layers of protections built in to make sure they are suitable for court-use. I don't make any claims as to the legal utilities of this program whatsoever. In fact, the ACH PDF report generated specifically calls attention to the AI-generated nature of the materials and warns against using any data generated or entered without human review and approval.

That said, you can make some pretty cool, non-legally useful, connections with tools like author unmask, where you feed the system docs by a known author and run them against docs written by an unknown or suspected alias to check for similar voice. During ingestion, the system automatically pulls out all regex-matched data and puts it into a nice sortable, searchable list for you.
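As a rough idea of what regex extraction at ingestion time might look like, here is a minimal sketch. The three patterns, the `extract_entities` name, and the sample memo are assumptions for illustration. The thread doesn't list which patterns ArkhamMirror actually ships with.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; a real deployment would tune and extend these.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(doc_name: str, text: str) -> List[Tuple[str, str, str]]:
    """Return sorted, de-duplicated (kind, value, source document) rows."""
    rows = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.add((kind, match.group(0), doc_name))
    return sorted(rows)


# Invented sample document.
memo = "Contact j.doe@example.com before 2024-03-01; server at 10.0.0.5."
for row in extract_entities("memo.txt", memo):
    print(row)
```

Keeping the source document in every row is what makes the resulting list searchable back to the evidence.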

Legal e-discovery products are going to be highly polished, reliable programs designed to be used in a legal setting, while ArkhamMirror is designed to be used while you sit in your faraday cage in your hacker cabin in the woods with no Wi-Fi.

No shade intended - my stuff's not nearly as pretty or as well-put together as a decent off-the-shelf e-discovery program and I'm not trying to imply that it's better in any way, it's just differently aligned.

jerlendds•1mo ago

Beautiful work and it's always nice to see new projects in these spaces! I'm the creator of OSINTBuddy which is a somewhat similar project if you squint haha. We've just recently finished porting our web app to an electron binary (unreleased) for people who perform sensitive investigations (aka we have encryption at rest via Turso database) and collaboration features will be done via WebRTC + a signalling server.

I'm loving the approach you took to the UI! I had some similar ideas in mind and plan to build narrative reconstruction and timeline view tools too so it's really nice to see how others have done so! I'll definitely be following your work and I shared your project in the OSINTBuddy discord to hopefully get some more eyes on it :)

Great work, I hope you keep at it :)

ArkhamMirror•1mo ago

That's awesome, thank you so much for getting more eyeballs on it!

My approach to security so far has been to keep it air-gapped and include a nukeitfromorbit.bat that will do everything but physically destroy your SSD to keep your privacy intact.

The narrative reconstruction tool was pretty fun to make, and it's been impressive in testing, but the real test will be if it actually helps someone in a real investigation.

If you see anything in my project that could help your project, then that's awesome news to me!

I'm definitely going to keep working, and hopefully soon it's going to do some pretty cool stuff. All the best to you and OSINTBuddy

VerifiedReports•1mo ago

This looks very interesting. I already have Python and Docker set up the way I want. Will the installer mess with them?

ArkhamMirror•1mo ago

TL;DR is it should be fine, and thank you!

There's an isolated venv/ in the project folder, so no global packages or system python mods.

If your Python is 3.11+, the install should recognize it. If you have 3.10 or lower, it's going to prompt you to install 3.11 for the project environment through winget or python.org. If you are running multiple Pythons, it uses py -3.11 to pick the version.

For Docker, the app is going to want you to already have Docker running, and will want to make and utilize 3 containers (PostgreSQL, Qdrant, Redis) in their own isolated docker-compose project. It uses nonstandard ports, but there could be conflicts there if you have stuff running on 5435, 6343/6344 or 6380. The backend wants to run at 8000, and the frontend wants to run at 3000, so those could conflict potentially as well.
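If you want to check for those port conflicts before installing, a quick sketch along these lines would do it. This is not part of the installer. The port numbers come from the comment above, but which container uses which port is this sketch's guess based on each service's usual default, and the function names are made up.

```python
import socket
from typing import Dict, List

# Ports from the comment above. The service labels for the first four are
# guesses from each service's default port, not confirmed by the author.
PORTS: Dict[int, str] = {
    5435: "PostgreSQL (guess)",
    6343: "Qdrant (guess)",
    6344: "Qdrant (guess)",
    6380: "Redis (guess)",
    8000: "backend",
    3000: "frontend",
}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something already accepts TCP connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def find_conflicts(ports: Dict[int, str]) -> List[str]:
    """List the ports that are already taken, in ascending order."""
    return [
        f"{port} ({label})"
        for port, label in sorted(ports.items())
        if port_in_use(port)
    ]


conflicts = find_conflicts(PORTS)
print("Conflicts:", ", ".join(conflicts) if conflicts else "none")
```

This only detects listeners on localhost. A service bound to another interface would not show up.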

The script is going to check if docker is running - if it is, you should be set. If it's not, it's going to prompt you to start it up.

Nothing in the install should touch your docker daemon config or your existing containers.

Let me know how it works for you!

VerifiedReports•1mo ago

Great, thanks! I don't know much about Python or Docker, for that matter. But I just learned about and installed uv for Python management, and I have used Orbstack for containers in the past because I'm on Mac and the Docker Desktop blows.

I do development on my machine, so I like to control its environment deliberately.

ArkhamMirror•1mo ago

You're very welcome

I get it - pretty much everything I've been working with to build this platform is basically brand new to me, or just brand new in general, so I have to be wary of how I do things too.

zero0529•1mo ago

I know it is supposed to be airgapped, but can’t this be dockerized?

ArkhamMirror•1mo ago

I dockerized most of what I could, the rest of it I put in a data silo for ease of destruction when necessary. Is there something in particular you think would work better or be more secure being added to the docker setup?

DoctorOetker•1mo ago

In which files can we find the code relating to hypothesis testing (p-value calculation, null hypothesis specification, formal negation of the null hypothesis, ..)?

ArkhamMirror•1mo ago

Good question. TL;DR is you won't find those here.

ACH is a different sort of hypothesis testing tool. This tool does not do formal "Statistical Hypothesis Testing" using the formal statistical methods that would be used for the rigorous testing required in the scientific world. There is a null hypothesis found in ACH, but that's about the only real crossover. So, you won't be able to find any p-values, t-tests, chi-squared tests or any statistical significance at all in this.

My Analysis of Competing Hypotheses tool uses the methodology created by Richards Heuer - it's an intelligence analysis methodology designed to be used by analysts who need a way to avoid their own biases as much as possible when investigating a question. There's no p-value calcs, no actual statistical significance, or any real math at all (the only math involves adding up rows of values from -2 to 2 basically, and then putting them in order).
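The arithmetic described above (ratings from -2 to 2, added up and put in order) can be sketched in a few lines. This is an illustration under assumptions, not the tool's code. It assumes only inconsistent (negative) ratings count toward a hypothesis's score, which matches the "lower is better" scores quoted earlier in the thread, but the exact formula ArkhamMirror uses may differ. The function name and the ratings in the example are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: evidence name -> {hypothesis name: rating in -2..2}.
Matrix = Dict[str, Dict[str, int]]


def rank_hypotheses(matrix: Matrix) -> List[Tuple[int, str, int]]:
    """Return (place, hypothesis, score) from least to most inconsistent."""
    totals: Dict[str, int] = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            if not -2 <= rating <= 2:
                raise ValueError(f"rating out of range: {rating}")
            # Only evidence that contradicts a hypothesis adds to its score.
            totals[hypothesis] = totals.get(hypothesis, 0) + max(0, -rating)

    ordered = sorted(totals.items(), key=lambda item: (item[1], item[0]))
    ranked: List[Tuple[int, str, int]] = []
    for index, (hypothesis, score) in enumerate(ordered):
        tied = index > 0 and score == ordered[index - 1][1]
        place = ranked[-1][0] if tied else index + 1  # equal scores share a place
        ranked.append((place, hypothesis, score))
    return ranked


# Invented ratings for three evidence items and three hypotheses.
matrix: Matrix = {
    "E1": {"landed": 2, "staged": -2, "never": -2},
    "E2": {"landed": 1, "staged": -1, "never": -2},
    "E3": {"landed": 0, "staged": 0, "never": -2},
}
for place, hypothesis, score in rank_hypotheses(matrix):
    print(place, hypothesis, score)
```

With three evidence items the worst possible score is 6, the same way five items give the "out of possible 10" mentioned in the moon landing example.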

DoctorOetker•1mo ago

The whole reason hypothesis testing is used in science is to prevent the exact same biases from deluding us: is this a spontaneous statistical cluster of events, or is it a statistically significant deviation from our understanding of the world?

Is what you posted really how the CIA works?? I think lots of taxpayers would want their money back...

ArkhamMirror•1mo ago

We've added a standalone ACH tool you can use locally or even try in-browser:

https://mantisfury.github.io/ArkhamMirror/ach/

Enjoy
