I built dbt-llm-evals – a dbt package that evaluates LLM outputs using your data warehouse's native AI functions (Snowflake Cortex, BigQuery Vertex, Databricks).
The problem:
We're running AI analytics using native warehouse functions, but we had no good way to monitor whether our LLM outputs were actually correct without sending data to an external eval service.
Most monitoring tools want you to:
- Send data to external APIs
- Set up separate infrastructure
- Manually configure and track baselines
For warehouse-native AI workflows, that felt backwards.
How it works:
dbt-llm-evals runs evaluations where your data already lives.
It uses your warehouse's native AI functions, automatically detects baselines from your data, and includes monitoring/alerting. No additional infrastructure needed.
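To make the idea concrete, here's a rough sketch of what a warehouse-native LLM-as-judge eval can look like as a plain dbt model on Snowflake. This is not the dbt-llm-evals API — the model name, column names, and the `ref`'d upstream model are all hypothetical — but `SNOWFLAKE.CORTEX.COMPLETE` is a real Cortex function, and the point is that the judge call runs entirely inside the warehouse:

```sql
-- models/llm_answer_evals.sql
-- Hypothetical example, NOT the package's actual API.
-- Assumes an upstream model 'llm_outputs' with question/answer columns.
select
    question,
    answer,
    -- Snowflake Cortex runs the judge prompt in-warehouse;
    -- no data leaves for an external eval service.
    snowflake.cortex.complete(
        'llama3.1-70b',  -- any Cortex-supported model
        'On a scale of 1-5, rate how factually correct this answer is. '
        || 'Reply with only the number. Question: ' || question
        || ' Answer: ' || answer
    ) as correctness_score
from {{ ref('llm_outputs') }}
```

Because the eval is just another dbt model, downstream models can aggregate `correctness_score` over time to derive baselines and flag regressions — which is roughly the monitoring layer the package automates.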
Try it here: https://github.com/paradime-io/dbt-llm-evals