A couple of months into using Codex heavily, I realized I had delegated too much of a data pipeline without really tracking the details. When the model results degraded, I traced it back to feature-processing decisions that had quietly changed across iterations. The mistake was fixable. The uncomfortable part was realizing I no longer knew exactly where I had stopped following the logic.
MindCheck reads local AI conversation logs from Claude Code, Cursor, Codex CLI, and Gemini CLI. It breaks sessions down by task type and heuristically estimates how much of the work involved active reasoning versus delegation.
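To make that concrete, here is a rough sketch of the kind of pass MindCheck makes over local logs. This is not the actual implementation: the log paths, the JSON shape, the keyword lists, and the "engaged" heuristic below are all illustrative assumptions; real session formats differ per tool.

```python
# Illustrative sketch only: walk local session logs, pull out user messages,
# tag each with a task type, and count how many look "engaged" vs delegated.
# Paths, log format, keywords, and scoring are assumptions, not MindCheck's code.
import json
from pathlib import Path
from collections import Counter

# Hypothetical log locations; real tools store sessions in their own layouts.
LOG_DIRS = [Path.home() / ".claude" / "projects", Path.home() / ".codex" / "sessions"]

TASK_KEYWORDS = {
    "planning":      ["plan", "roadmap", "architecture", "design"],
    "coding":        ["implement", "refactor", "fix", "bug"],
    "data_analysis": ["dataset", "feature", "pipeline", "metric"],
    "writing":       ["draft", "rewrite", "summarize", "readme"],
}

# Crude engagement signal: messages that reason or push back tend to be longer
# and use question/constraint language; bare "do X" prompts tend not to.
ENGAGED_MARKERS = ["why", "instead", "tradeoff", "what if", "compare", "verify"]

def classify(message: str) -> str:
    text = message.lower()
    scores = {task: sum(kw in text for kw in kws) for task, kws in TASK_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] else "other"

def is_engaged(message: str) -> bool:
    text = message.lower()
    return len(text) > 200 or any(marker in text for marker in ENGAGED_MARKERS)

def scan() -> None:
    by_task: Counter = Counter()
    engaged: Counter = Counter()
    for log_dir in LOG_DIRS:
        for path in log_dir.glob("**/*.jsonl"):
            for line in path.read_text(errors="ignore").splitlines():
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue
                # Assumed shape: {"role": "user", "content": "..."} per line.
                if event.get("role") != "user" or not isinstance(event.get("content"), str):
                    continue
                task = classify(event["content"])
                by_task[task] += 1
                engaged[task] += is_engaged(event["content"])
    for task, total in by_task.most_common():
        print(f"{task:14s} {total:4d} msgs  {engaged[task] / total:5.0%} engaged")

if __name__ == "__main__":
    scan()
```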
The goal is not to reduce AI usage. I’m trying to make the boundary visible: where I’m still forming hypotheses, testing ideas, pushing back, and owning the direction of the work — versus where I’m mostly prompting and accepting.
For me, the task breakdown was the most useful part. Data analysis still looked relatively engaged, but planning and writing were much more delegated than I expected.
By default, Tier 2 classification runs locally using embeddings. Optional Tier 3 refinement runs only if explicitly configured, and even then only low-confidence individual user messages are sent, never full sessions or AI responses.
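A minimal sketch of that gating logic, under stated assumptions: it uses sentence-transformers as a stand-in for the local embedding step, and the model name, threshold, prototype phrases, and remote-refinement hook are placeholders, not MindCheck's actual code.

```python
# Sketch of the tier gating, not the real implementation. Assumes a local
# embedding model via sentence-transformers; names and threshold are placeholders.
from sentence_transformers import SentenceTransformer, util

PROTOTYPES = {
    "planning":      "plan the architecture and break the work into steps",
    "coding":        "implement or fix code in the repository",
    "data_analysis": "explore a dataset and evaluate model results",
    "writing":       "draft or edit documentation and prose",
}

CONFIDENCE_THRESHOLD = 0.45  # below this, the local classification is "low confidence"

model = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally; nothing leaves the machine
proto_vecs = model.encode(list(PROTOTYPES.values()), normalize_embeddings=True)

def classify_locally(message: str) -> tuple[str, float]:
    """Tier 2: cosine similarity between the message and task prototypes."""
    vec = model.encode([message], normalize_embeddings=True)
    sims = util.cos_sim(vec, proto_vecs)[0]
    best = int(sims.argmax())
    return list(PROTOTYPES)[best], float(sims[best])

def classify(message: str, tier3_enabled: bool = False) -> str:
    task, confidence = classify_locally(message)
    if confidence >= CONFIDENCE_THRESHOLD or not tier3_enabled:
        return task
    # Tier 3 (opt-in only): send just this single low-confidence user message
    # for refinement; never the full session and never the AI's responses.
    return refine_remotely(message)  # placeholder for the opt-in remote call

def refine_remotely(message: str) -> str:
    raise NotImplementedError("opt-in Tier 3 refinement; disabled by default")
```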
The scoring is heuristic — it won’t tell you whether you understand the work, but it can show where you started handing the reasoning off. I’d be interested in feedback from people using AI coding tools heavily: does this kind of delegation map seem useful, and what signals would make it more trustworthy?