The work is rigorous. Someone with serious credentials has engaged and asked substantive questions. The systems function as designed. But I can't point to the traditional markers that would establish legitimacy—degrees, publications, years of experience in the field.
This isn't about whether AI "did the work." I made every decision, evaluated every output, iterated through hundreds of refinements. The AI was a tool that compressed what would have taken years of formal education into months of intensive, directed learning and execution.
Here's what interests me: We're entering a period where traditional signals of competence—credentials, institutional validation, experience markers—no longer reliably predict capability. Someone can now build sophisticated systems, conduct rigorous analysis, and produce novel insights without any of the credentials that historically signaled those abilities. The gap between "can do" and "should be trusted to do" is widening rapidly.
The old gatekeeping mechanisms are breaking down faster than new ones are forming. When credentials stop being reliable indicators of competence, what replaces them? How do we collectively establish legitimacy for knowledge and capability?
This isn't just theoretical—it's happening right now, at scale. Every day, more people are building things and doing work they have no formal qualification to do. And some of that work is genuinely good.
What frameworks should we use to evaluate competence when the traditional signals are becoming obsolete? How do we establish new language around expertise when terms like "expert," "rigorous," and "qualified" have been so diluted that they've lost their discriminating power?
thenaturalist•9h ago
The one difference between "can do" and "should be trusted to do" is the ability to systematically prove that "can do" holds up across close to 100% of task instances and under adversarial conditions.
Hacking and pentesting are already scaling fully autonomously - and systematically.
For now, lower-level targets aren't attractive because that kind of scale requires sophisticated (state) actors, but that is going to change.
So building systems that prove, white-hat style, that your code is not only functional but competent is going to be critical if you don't want it ripped apart by black hats later on.
One example that applies this quite nicely is roborev [0] by the legendary Wes McKinney.
0: https://github.com/roborev-dev/roborev
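
To make "adversarial conditions" a bit more concrete: property-based testing is one cheap, credential-independent way to let a machine hunt for inputs that break the claims you make about your code. A minimal sketch in Python using the hypothesis library (the slugify function and its invariants are just illustrative, not anything from roborev):

    # pip install hypothesis
    import re
    from hypothesis import given, strategies as st

    def slugify(text: str) -> str:
        """Illustrative function under test: lowercase, keep only [a-z0-9-]."""
        text = text.lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)
        return text.strip("-")

    # Instead of a handful of hand-picked examples, let the framework
    # generate a stream of adversarial inputs and check the invariants
    # we actually claim to hold.
    @given(st.text())
    def test_slug_contains_only_safe_chars(s):
        assert re.fullmatch(r"[a-z0-9-]*", slugify(s))

    @given(st.text())
    def test_slugify_is_idempotent(s):
        assert slugify(slugify(s)) == slugify(s)

    if __name__ == "__main__":
        test_slug_contains_only_safe_chars()
        test_slugify_is_idempotent()
        print("invariants held under generated inputs")

It doesn't prove competence in any deep sense, but it's the difference between "works on the cases I thought of" and "survives cases a machine generated to break it."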
falsework•9h ago
But I think there's a distinction worth making between technical robustness (does the code have vulnerabilities?) and epistemic legitimacy (should we trust the analysis/conclusions?).
Pentesting and formal verification can tell us whether a system is secure or functions correctly. That's increasingly automatable and credential-independent because the code either survives adversarial conditions or it doesn't.
But what about domains where validation is murkier? Cross-domain analysis, research synthesis, strategic thinking, design decisions? These require judgment calls where "correct" isn't binary. The work can be rigorous and well-reasoned without being formally provable.
The roborev example is interesting because code review is somewhat amenable to systematic validation. But we're also seeing AI collaboration extend into domains where adversarial testing isn't cleanly applicable—policy analysis, theoretical frameworks, creative work with analytical components.
I wonder if we need different validation frameworks for different types of work. Technical systems: adversarial testing and formal verification. Analytical/intellectual work: something else entirely. But what?
The deeper question: when the barrier to producing superficially plausible work drops to near-zero, how do we distinguish genuinely rigorous thinking from sophisticated-sounding nonsense? Credentials were a (flawed) heuristic for that. What replaces them in domains where adversarial testing doesn't apply?