Autonomous AI benchmark-testing

1•jodytornado•2h ago

Comments

jodytornado•2h ago

Show HN: I built a bare-metal OS with an autonomous reasoning engine in no_std Rust

I've been working on this for the last 24 months as a solo developer and wanted to share what I've built and the results we just validated.

What it is: PROTOS is a bare-metal operating system (no Linux, no Windows, no cloud) with an integrated reasoning engine called "EARL" (Epistemic Adaptive Reasoning Layer) that does autonomous multi-source intelligence fusion. Everything runs on a single laptop, with no GPU and no network connection required. It's 174K+ lines of no_std Rust across 189 modules, including custom NVMe drivers, a filesystem, a knowledge base, the reasoning engine, autonomous learning systems, and safety controls.

The core problem it solves: Current AI tools (LLMs included) can't explain their reasoning, give different answers to the same question, require cloud connectivity, hallucinate, and can't be audited. For defense and intelligence applications, this is a non-starter: DoD Directive 3000.09 requires that autonomous systems explain every decision with a verifiable audit trail. No existing AI system meets this standard for intelligence analysis.

How it works: EARL uses symbolic reasoning rather than neural networks. It reads raw documents, automatically identifies entities (people, organizations, locations, weapons, financial transactions), discovers hidden connections across sources, learns new concepts from context, and detects when sources contradict each other. Every conclusion is backed by a cryptographically hashed (SHA-256) evidence chain. The same inputs always produce the same outputs.

What we just proved: We tested EARL against the buildup to Russia's 2022 invasion of Ukraine. We reconstructed 12 intelligence documents spanning satellite imagery analysis, intercepted communications, HUMINT reports, financial intelligence, OSINT, and technical weapons assessments from Oct 2021–Feb 2022. We defined 10 intelligence connections that Five Eyes agencies actually identified during this period (all verifiable against public sources: Maxar imagery, CRS reports, OSCE data, investigative journalism). EARL discovered 8 out of 10 autonomously. No training on the scenario, no pre-labeled data, no human guidance. It read raw text, built its own understanding, and independently arrived at conclusions that took the combined intelligence apparatus months to assemble.

The system also correctly enforced safety constraints: connections below the confidence threshold were flagged but blocked from triggering autonomous action, exactly as 3000.09 requires.

Technical details for the curious:

- Pure no_std Rust, bare-metal execution
- Custom NVMe drivers and filesystem
- Symbolic reasoning engine (not neural/statistical)
- PMI-based knowledge representation with 15M+ edges
- Three-layer cognitive architecture: symbolic reasoning, metacognition, values/constraints
- Deterministic: reproducible results on every run
- Cryptographic forensics chain for full auditability
- Air-gap capable by design
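
For readers unfamiliar with PMI (pointwise mutual information), here is a minimal sketch of how PMI edge weights between co-occurring terms could be computed. The post does not say how EARL actually builds its graph, so this is only the textbook document-level definition, PMI(a, b) = log2(p(a, b) / (p(a) p(b))). The function name `pmi_edges` and the input shape are hypothetical, and the sketch uses `std` for brevity rather than no_std.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: document-level PMI for every pair of terms that
/// co-occur in at least one document. Probabilities are estimated as
/// document frequencies, i.e. p(x) = (docs containing x) / (total docs).
/// This is an illustrative assumption, not EARL's actual method.
fn pmi_edges(docs: &[Vec<&str>]) -> BTreeMap<(String, String), f64> {
    let n = docs.len() as f64;
    let mut term_count: BTreeMap<String, f64> = BTreeMap::new();
    let mut pair_count: BTreeMap<(String, String), f64> = BTreeMap::new();

    for doc in docs {
        // Deduplicate and sort the terms so each term and each unordered
        // pair is counted at most once per document.
        let unique: BTreeSet<&str> = doc.iter().copied().collect();
        let terms: Vec<&str> = unique.into_iter().collect();
        for (i, a) in terms.iter().enumerate() {
            *term_count.entry(a.to_string()).or_insert(0.0) += 1.0;
            for b in &terms[i + 1..] {
                *pair_count
                    .entry((a.to_string(), b.to_string()))
                    .or_insert(0.0) += 1.0;
            }
        }
    }

    // Ordered maps keep iteration order fixed, so repeated runs over the
    // same input produce the same edge list in the same order.
    pair_count
        .into_iter()
        .map(|((a, b), c_ab)| {
            let p_ab = c_ab / n;
            let p_a = term_count[&a] / n;
            let p_b = term_count[&b] / n;
            ((a, b), (p_ab / (p_a * p_b)).log2())
        })
        .collect()
}
```

A positive weight means two terms appear together more often than independence would predict, and a weight near zero means the co-occurrence is about what chance alone would give. Pairs are keyed in alphabetical order, so the edge for "railhead" and "armor" is stored under ("armor", "railhead").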

My background: I'm a civil engineer who transitioned into systems programming after retirement. The technology has received preliminary validation from USASOC analysts who described the "reproducible reasoning" capability as disruptive.

We're currently positioned for strategic acquisition and are open to strategic investment for formal DoD certification, team buildout, and field deployment optimization.

Happy to answer technical questions about the architecture, the benchmark methodology, or the bare-metal Rust experience.
