We are building a pipeline to convert unstructured documents into normalized Postgres rows.
The core struggle isn't extraction; it's the collision between a probabilistic model and a deterministic database. We are stuck in a vicious cycle:
Strict Schema (FKs, NOT NULL): The LLM gets "scared" of the constraints. To satisfy the schema validator, it starts hallucinating fake foreign key IDs or inventing default values just to get the JSON to parse.
Loose Schema (JSONB/Text): The LLM returns garbage (invented enum values, inconsistent date formats) that requires a pile of post-processing code to clean up. (Rough sketch of both shapes below.)
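To make that concrete, here is roughly the shape of the two extraction targets we bounce between, as a Pydantic-style sketch. The field names (vendor_id, invoice_date, etc.) are invented for illustration, not our real schema:

    from datetime import date
    from pydantic import BaseModel, Field

    # "Strict" target: mirrors the Postgres table. Every NOT NULL column is a
    # required field, and vendor_id is supposed to reference vendors(id).
    # When the document mentions a vendor we don't have on file, the model has
    # no valid value to emit, so it invents one to satisfy the validator.
    class InvoiceStrict(BaseModel):
        vendor_id: int                  # FK -> vendors(id); hallucination magnet
        invoice_number: str             # NOT NULL
        invoice_date: date              # NOT NULL, strict ISO date
        total_cents: int = Field(ge=0)  # CHECK (total_cents >= 0)

    # "Loose" target: everything optional and free-form. It always parses, but
    # dates arrive as "03/04/24" vs "April 3rd", enum-ish fields hold whatever
    # the model felt like, and the cleanup code keeps growing.
    class InvoiceLoose(BaseModel):
        vendor_name: str | None = None
        invoice_number: str | None = None
        invoice_date: str | None = None  # any date format the model likes
        total: str | None = None         # "1,204.50", "$1204.5", ...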
We are currently maintaining a 400-line system prompt that is essentially "Postgres constraints rewritten in English."
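To give a sense of the duplication, an invented example (not our actual table or prompt text) of the same rule living once in the DDL and again in prose:

    # Hypothetical example of the duplication: one rule, written twice.
    DDL_RULE = """
    CREATE TABLE invoices (
        status text NOT NULL CHECK (status IN ('draft', 'sent', 'paid'))
        -- ...
    );
    """

    PROMPT_RULE = (
        "The 'status' field is required. It must be exactly one of "
        "'draft', 'sent', or 'paid' -- never empty, never any other "
        "spelling or capitalization."
    )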
Has anyone solved this reliability problem without a human-in-the-loop reviewing every INSERT?