Show HN: CLI to score AI prompts after a prod failure

https://costguardai.io

1•techcam•1h ago

About six months ago I shipped a customer-facing feature where the system prompt had a subtle ambiguity in the instruction hierarchy. Within two days, users found a natural-language path that caused the model to ignore the safety constraint entirely.

It wasn’t a jailbreak — just phrasing I hadn’t anticipated. The prompt looked fine. It passed code review. It failed in production.

That made me realize how little tooling exists between “write a prompt” and “ship it.”

We have linters for code. We have type checkers. We have static analysis.

For prompts, we mostly have vibes.

So I built CostGuardAI.

npm install -g @camj78/costguardai costguardai analyze my-prompt.txt

It analyzes prompts across a few structural risk dimensions: - jailbreak / prompt injection surface - instruction hierarchy ambiguity - under-constrained outputs (hallucination risk) - conflicting directives - token cost + context usage

It outputs a CostGuardAI Safety Score (0–100, higher = safer) and shows what’s driving the risk.

Example:

CostGuardAI Safety Score: 58 (Warning)

Top Risk Drivers: - instruction ambiguity - missing output constraints - unconstrained role scope

The scoring isn’t trying to predict every failure — it’s closer to static analysis: catching structural patterns that correlate with prompts breaking in production.

If you want to see output before installing: https://costguardai.io/report/demo https://costguardai.io/benchmarks

I’m a solo founder and this is still early, but it’s already caught real issues in my own prompts.

Curious what HN thinks — especially from people working on prompt evals or LLM safety tooling.

Comments

techcam•1h ago

Happy to explain how the scoring works since that’s the obvious first question.

The core idea is:

Safety Score = 100 − riskScore

The risk score is based on structural prompt properties that tend to correlate with failures in production systems:

- instruction hierarchy ambiguity - conflicting directives (system vs user) - missing output constraints - unconstrained response scope - token cost / context pressure

Each factor contributes a weighted amount to the total risk score.

It’s not trying to predict exact model behavior — that’s not possible statically.

The goal is closer to a linter: flagging prompt structures that are more likely to break (injection, hallucination drift, ignored constraints, etc).

There’s also a lightweight pattern registry. If a prompt matches structural patterns seen in real jailbreak/injection cases (e.g. authority ambiguity), the score increases.

One thing that surprised me while building it: instruction hierarchy ambiguity caused more real-world failures than obvious injection patterns.

The CLI runs locally — no prompts are sent anywhere.

If you want to try it:

npm install -g @camj78/costguardai costguardai analyze your-prompt.txt

Curious what failure modes others here have seen in production prompts.

LLM simultaneous training and 3-way generating

LangChain is powerful, but running it in production isn't

Show HN: I turned GitHub's contribution graph into a life journal

LunarGate – Self-hosted AI gateway with EU privacy and zero leakage

Engineered bacteria can consume tumors from the inside out

Show HN: On-device meeting transcription for your Mac

Christopher Sims, Economist Who Taught the Data to Speak, Dies at 83

World's Most Private Voice Assistant

Cesar Chavez, a Civil Rights Icon, Is Accused of Abusing Girls for Years

Redpanda pushes the envelope on Nvidia Vera

Is Spotify's AI 'killing' Australian music?

Pimco Sees Private Credit Strains Triggering Wake-Up Call on Liquidity Risks

570k Lines of LLM Code Compiled Fine. It Was 20,171x Slower Than SQLite

Ask HN: How is your company managing internal AI agents?

Is there an AI garage startup path?

Show HN: Atria – terminal UI for managing multiple coding agents

Who want's to buy this anonymous messaging site in 1000 rupees

Polymarket gamblers threaten Israeli journalist over missile strike story

Show HN: WattSeal – PC power consumption monitor

Deno Employees Leave

The Vibe Thinker Bible

DarkSword: iOS Exploit Chain Adopted by Multiple Threat Actors

Users hate it, but age-check tech is coming

Autoproto – minimal C++ MTProto client library stripped from TDLib

Rapper Afroman's trial over using raid footage in music video enters second day

Geely Eyes Canadian Auto Market After Deal Allowing Chinese EVs

Another Forbes 30 Under 30 startup founder in trouble with the Feds for lying

Show HN: GitComet speedy Git GUI written in Rust end-to-end

Test in Prod or Live a Lie

Mining your team's PR reviews into automated code review rules