Hi HN, I’m Brady. I’ve spent years watching data engineers burn out writing brittle parsers for nested JSON, only to have dashboards crash because an upstream API changed a field name.
I built Forge to solve this. It’s an autonomous data infrastructure platform that ingests raw, nested JSON and automatically generates production-ready dbt models.
The Problem

Traditional ETL tools (Fivetran/Stitch) often dump raw JSON into a VARIANT or string column, leaving you to write complex parsing logic by hand. This is expensive to query and hard to govern, and if the schema changes, your SQL breaks.
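To make the failure mode concrete, here is the kind of hand-written glue code I mean (my own illustrative sketch, not anyone's production code) — it works until the upstream API renames or restructures a field, then fails at runtime:

```python
import json

def parse_order(raw: str) -> dict:
    """Hand-written extraction from nested JSON -- brittle by construction."""
    doc = json.loads(raw)
    return {
        "order_id": doc["id"],
        "email": doc["customer"]["email"],     # breaks if "customer" becomes "buyer"
        "first_item": doc["items"][0]["sku"],  # breaks if "items" arrives empty
    }

raw = '{"id": 7, "customer": {"email": "a@b.com"}, "items": [{"sku": "X1"}]}'
order = parse_order(raw)
```

Multiply this by every nested field in every source, and a single upstream rename cascades into broken dashboards.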
What Forge Does

Forge parses your JSON and compiles it into optimized, native tables for BigQuery, Snowflake, Databricks, and Redshift.
Deep Unnesting: It flattens arrays and objects 5+ levels deep into relational tables with proper keys.
AI Classification (Excalibur): We use a Graph Neural Network (GraphSAGE) to classify data patterns (e.g., identifying "customer" vs. "inventory" data) without the data leaving your environment.
Auto-Governance (Pridwen): It detects PII and automatically applies hashing or masking policies based on the classification.
Multi-Warehouse Support: One JSON source generates native SQL for all supported warehouses simultaneously.
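The deep-unnesting idea can be sketched in a few lines of Python. This is my simplified illustration of the general technique, not Forge's actual algorithm: each nested object or array becomes its own table, linked back to its parent by surrogate keys.

```python
import json

def flatten(doc, table="root", parent_key=None, tables=None, counters=None):
    """Recursively split nested objects/arrays into flat tables linked by keys."""
    tables = tables if tables is not None else {}
    counters = counters if counters is not None else {}
    counters[table] = counters.get(table, 0) + 1
    row = {"_key": counters[table]}
    if parent_key is not None:
        row["_parent_key"] = parent_key  # foreign key back to the parent row
    for field, value in doc.items():
        if isinstance(value, dict):
            flatten(value, f"{table}__{field}", row["_key"], tables, counters)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    flatten(item, f"{table}__{field}", row["_key"], tables, counters)
        else:
            row[field] = value  # scalars stay on this table's row
    tables.setdefault(table, []).append(row)
    return tables

doc = json.loads(
    '{"id": 1, "customer": {"name": "Ada"}, "items": [{"sku": "X1"}, {"sku": "X2"}]}'
)
tables = flatten(doc, table="orders")
# Produces three relational tables: orders, orders__customer, orders__items
```

The hard parts a real system has to handle on top of this — stable keys across runs, type inference, heterogeneous arrays, warehouse-specific DDL — are exactly where hand-rolled versions tend to rot.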
How it works

Under the hood, Forge generates a full dbt project. You get the exact SQL code it generates, complete with lineage and documentation. We focused heavily on transparency—no black boxes.
Where we're going

We are currently working on Llamrei (Q2 2026), which will handle schema evolution by automatically normalizing legacy API versions into "golden schemas" to prevent breaking changes.
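The "golden schema" idea can be sketched as a version-aware field mapping. This is my own minimal illustration of the concept (Llamrei's actual design isn't public, and `FIELD_MAP`/`normalize` are hypothetical names): each legacy API version's field names are mapped onto one canonical schema, so downstream models never see a breaking rename.

```python
# Hypothetical per-version rename maps: legacy field name -> golden field name.
FIELD_MAP = {
    "v1": {"id": "order_id", "cust_email": "email"},
    "v2": {"order_id": "order_id", "customer_email": "email"},
}

def normalize(record: dict, version: str) -> dict:
    """Rename a legacy record's fields to the golden schema's names."""
    mapping = FIELD_MAP[version]
    return {golden: record[legacy] for legacy, golden in mapping.items()}

# Two different API versions converge on the same canonical shape:
v1 = normalize({"id": 7, "cust_email": "a@b.com"}, "v1")
v2 = normalize({"order_id": 7, "customer_email": "a@b.com"}, "v2")
```

Downstream SQL only ever references the golden names, so a v1-to-v2 API migration becomes a mapping change rather than a breaking one.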
We have a free tier (no credit card required) that lets you run full jobs to test the output.
I’d love to hear your feedback on the generated SQL structure and our approach to using GNNs for schema inference.