The core problem I’m trying to solve is the "black box" nature of LLM research. Most agents just give you a final answer; GIA is designed so that every claim must have traceable support.
Some technical choices I made for this project:
Filesystem-first architecture: Instead of keeping state in memory, the pipeline writes durable artifacts (Markdown and JSON) to a project folder at every stage. This makes the entire thought process inspectable and allows you to re-run "gates" deterministically.
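To make the idea concrete, here is a minimal sketch of what "artifact at every stage" could look like. The helper names and folder layout are illustrative assumptions, not GIA's actual API:

```python
# Illustrative sketch only: each stage persists its output to disk before the
# next stage runs, so any gate can be re-checked later from the artifact alone.
import json
from pathlib import Path

def write_artifact(project_dir: Path, stage: str, name: str, payload: dict) -> Path:
    """Persist a stage's output as JSON so it stays inspectable after the run."""
    stage_dir = project_dir / stage
    stage_dir.mkdir(parents=True, exist_ok=True)
    path = stage_dir / f"{name}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path

def rerun_gate(artifact_path: Path, gate) -> bool:
    """Re-run a gate deterministically from the artifact on disk."""
    payload = json.loads(artifact_path.read_text(encoding="utf-8"))
    return gate(payload)
```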
Schema-first contracts: I’m using JSON schemas as strict contracts between agents. If an agent’s output doesn’t validate against the schema, the "gate" blocks the workflow.
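A rough sketch of such a gate, using the `jsonschema` package (the schema here is a made-up example, not one of GIA's real contracts):

```python
# Hedged example: a gate that blocks the workflow when an agent's output
# fails schema validation.
import jsonschema

CLAIM_SCHEMA = {
    "type": "object",
    "required": ["claim", "evidence"],
    "properties": {
        "claim": {"type": "string"},
        "evidence": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
}

def gate(agent_output: dict) -> bool:
    try:
        jsonschema.validate(instance=agent_output, schema=CLAIM_SCHEMA)
        return True   # output satisfies the contract, workflow continues
    except jsonschema.ValidationError as err:
        print(f"Gate blocked: {err.message}")
        return False  # block the workflow and force a retry
```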
Safety & Sandboxing: Since the pipeline can generate and execute analysis scripts, it runs them in a subprocess with isolated Python mode (-I) and a minimal allowlist. It’s not a full jail, but it’s a step toward safer autonomous code execution.
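The subprocess wrapper is roughly in this spirit (paths, timeouts, and the empty environment are illustrative assumptions, not the project's exact settings):

```python
# Sketch of running a generated analysis script with Python's isolated mode (-I):
# no user site-packages, no PYTHONPATH, no current-directory imports.
import subprocess
import sys

def run_analysis_script(script_path: str, timeout_s: int = 60) -> subprocess.CompletedProcess:
    """Run a generated script in a subprocess with a scrubbed environment."""
    return subprocess.run(
        [sys.executable, "-I", script_path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        env={},  # drop inherited environment variables
    )
```

This is containment, not a jail: a real sandbox would still need resource limits and filesystem/network restrictions on top of it.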
The "Referee" System: I’ve implemented a series of "Referees" (Agents A12–A15) that act as a quality control layer, checking for contradictions and style enforcement before the final draft is produced.
Current Status: This is very much a work in progress. It’s a prototype pipeline, not a finished product. I’m currently looking for contributors to help refine the "Evidence Layer" and the LaTeX paper structuring.
I’d love to hear your thoughts on the architecture, especially the use of schema-driven gates for grounding LLM outputs.