When something goes wrong in a large codebase, the hardest part is usually not fixing it, but knowing what you can safely touch and what you should leave alone.
In real development work, we usually know one of two things:
- Something has already broken, or
- Nothing has broken yet, but something feels fragile.
What we usually don't know is:
- Which parts of the system are structurally safe
- Which changes are likely to amplify risk
Reading all the code is unrealistic. So we started by making this judgment visible.
## What MistSeeker actually does
MistSeeker is not a bug-finding tool.
The question we're trying to answer is simpler:
Is this code structurally suitable for change, or is it likely to fail during modification?
To answer that, we evaluate code from three independent perspectives.
1) COI — Structural fitness
COI looks at how code is organized.
- How responsibilities are divided
- How deeply logic is nested
- How much structural duplication or entanglement exists
A high COI does not mean "perfect code." It means the structure is less likely to cause unexpected ripple effects when changed.
Low-COI code, on the other hand, often turns small edits into wide-reaching consequences.
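To make that concrete, here is a minimal sketch of one kind of structural signal COI describes: nesting depth, measured with Python's built-in ast module. This is purely illustrative and is not MistSeeker's implementation (the actual tool parses multiple languages via tree-sitter).

```python
import ast

def max_nesting_depth(source: str) -> int:
    """Deepest nesting of control-flow blocks in a piece of Python source.
    Illustrative stand-in for one structural signal; not MistSeeker's code."""
    nesting_nodes = (ast.If, ast.For, ast.While, ast.With, ast.Try)

    def depth(node: ast.AST, current: int = 0) -> int:
        deepest = current
        for child in ast.iter_child_nodes(node):
            bump = 1 if isinstance(child, nesting_nodes) else 0
            deepest = max(deepest, depth(child, current + bump))
        return deepest

    return depth(ast.parse(source))

# Three nested control-flow blocks -> depth 3
print(max_nesting_depth("if a:\n    for x in xs:\n        if x:\n            pass\n"))
```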
2) ORI — Execution stability
ORI focuses on behavior, not structure.
- Hidden I/O dependencies
- Global state mutations
- Logic dependent on time, randomness, or environment
Code can look clean and well-organized, yet still be fragile at runtime. ORI surfaces these invisible execution risks.
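As a hypothetical example of "clean but fragile" (my own, not output from the tool): the function below is short and readable, yet its result depends on the environment, the wall clock, and global randomness, which is exactly the kind of pattern ORI is meant to surface.

```python
import os
import random
from datetime import datetime

def build_report_name(prefix: str) -> str:
    """Looks tidy, but carries three hidden runtime dependencies."""
    region = os.environ["REPORT_REGION"]           # environment dependency
    stamp = datetime.now().strftime("%Y%m%d%H%M")  # output changes with wall-clock time
    salt = random.randint(1000, 9999)              # nondeterministic output
    return f"{prefix}-{region}-{stamp}-{salt}"
```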
3) GSS — Semantic stability
GSS addresses a pattern that appears frequently in AI-assisted coding environments:
Code works correctly and passes tests, but its intent collapses easily with small changes.
MistSeeker does not claim to "understand" code semantics. Instead, it measures how much structural and behavioral change is triggered by small edits.
If minor modifications cause disproportionate shifts, GSS drops. This pattern appears often in generated code or after repeated refactoring.
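A rough way to picture "disproportionate shift" is to compare parse trees before and after an edit. The sketch below uses Python's ast and difflib as a crude stand-in; MistSeeker's actual GSS metric is not described in this post.

```python
import ast
import difflib

def structural_shift(before_src: str, after_src: str) -> float:
    """Illustrative proxy for how much structure moved between two versions:
    1.0 = identical ASTs, lower = bigger shift. Not MistSeeker's GSS."""
    before = ast.dump(ast.parse(before_src))
    after = ast.dump(ast.parse(after_src))
    return difflib.SequenceMatcher(None, before, after).ratio()

# A one-token edit that barely moves the structure...
print(structural_shift("x = a + b", "x = a - b"))
# ...versus a "small" edit that reshapes the whole function body.
print(structural_shift(
    "def f(xs):\n    return [x * 2 for x in xs]",
    "def f(xs):\n    out = []\n    for x in xs:\n        out.append(x * 2)\n    return out",
))
```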
## What the scores tell you
Each file or module ends up with a profile:
- Is it structurally fit for change?
- Is it risky from an execution standpoint?
- How easily does its meaning break when modified?
From this, we derive a single stability score (GSI) and a risk level.
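The post does not spell out how GSI is derived from COI, ORI, and GSS, so the sketch below is only a hypothetical aggregation with made-up weights and thresholds, assuming the three scores are normalized to 0..1 and the weakest dimension should count most.

```python
def gsi(coi: float, ori: float, gss: float) -> tuple[float, str]:
    """Hypothetical aggregation, not the real GSI formula: blend the weakest
    score with the average so one fragile axis drags the result down."""
    score = min(coi, ori, gss) * 0.6 + (coi + ori + gss) / 3 * 0.4
    if score >= 0.75:
        level = "low risk"
    elif score >= 0.5:
        level = "moderate risk"
    else:
        level = "high risk"
    return score, level

print(gsi(0.9, 0.8, 0.4))  # a weak GSS pulls the combined score down
```

The point is not the specific numbers but the shape: one weak axis is usually enough to make a change risky, so a plain average would understate it.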
The goal is not to rank code. We want to answer one practical question:
When upgrades or refactoring are needed, where is the safest place to start, and which areas require extra caution?
## Try it (no signup)
HN readers: 5-day Pro evaluation key (no credit card required)
- Docker image: tongro2025/mistseeker
- License key: 716f3617b11685ba1af36bea74f929a3

If the command fails, pull the image directly. Windows (CMD/PowerShell): see the install guide on our site.
## Where this has been useful
In practice, it helped in situations like:
- Setting refactoring priorities instead of changing things blindly
- Reviewing AI-generated code changes
- Identifying areas that should not be touched during upgrades
- Finding structurally fragile areas even when tests pass
## What this is not
- It does not replace linters
- It is not a bug detector
- It is not an auto-fix tool
MistSeeker is not a mechanic. It's a map.
## Why I’m sharing this
I'm curious about others' experiences.
- How do you decide what is "safe to change" in large codebases?
- Have you had systems that passed tests but became increasingly hard to modify?
- Does the idea of structural fitness and change risk resonate with you?
Opinions, counterarguments, and real-world examples are all welcome. If useful, I'm also happy to discuss boundaries and limitations.
Guide / project manual page: https://convia.vip
A concrete eval method: run it on a file that caused trouble in the past, then on the same file after patch/refactor. Before vs. after tends to be clearer than looking at scores in isolation.
Tech notes: multi-language via tree-sitter (Python, JS/TS, Java, Go, Rust, C/C++, etc.), 100% local (no telemetry / no external APIs), deterministic (no LLMs for the scores).
Happy to answer questions about the metrics.