AI support agents live and die by the data they can access and the way they shape the customer experience. Basic automations worked well in the past, but people now expect more.
Here are some semi-technical notes:
--- 1) Controlled data syncing ---
We wanted tight integrations with existing helpdesks, but also needed to give users flexibility over what the agent actually knows. Building integrations that sync data while still being tweakable turned out to be harder than expected.
People want one source of truth, right up until they need an exception. We built a sync policy system that lets us manage sync and overwrite behaviors for external data.
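Roughly, a per-resource policy boils down to something like this (a minimal sketch; the names and fields here are hypothetical, not our actual schema):

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical sketch of a per-resource sync policy, not our exact schema.
class OverwriteMode(Enum):
    REMOTE_WINS = "remote_wins"  # the helpdesk stays the source of truth
    LOCAL_WINS = "local_wins"    # user edits survive future syncs

@dataclass
class SyncPolicy:
    resource: str                # e.g. "helpdesk.articles"
    enabled: bool = True
    overwrite: OverwriteMode = OverwriteMode.REMOTE_WINS
    pinned_fields: set[str] = field(default_factory=set)  # never overwritten

def apply_policy(policy: SyncPolicy, remote: dict, local: dict) -> dict:
    """Compute the post-sync record from the remote and local versions."""
    if not policy.enabled:
        return local
    if policy.overwrite is OverwriteMode.LOCAL_WINS:
        merged = {**remote, **local}  # local edits take precedence
    else:
        merged = dict(remote)         # remote wins by default
    for f in policy.pinned_fields:    # the exceptions: pinned fields stick
        if f in local:
            merged[f] = local[f]
    return merged
```

The pinned-fields escape hatch is the whole point: one source of truth by default, with a sanctioned way to carve out exceptions.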
The data itself is synced via Airbyte’s pyairbyte package. I ended up working with the pyairbyte team on some performance issues around memory leaks with certain types of pagination on large datasets.
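For reference, pyairbyte usage looks roughly like this (the connector name and config keys are illustrative and vary by helpdesk; auth is omitted):

```python
import airbyte as ab  # pip install airbyte

# Illustrative connector and config; see the connector's spec for auth.
source = ab.get_source(
    "source-zendesk-support",
    config={"subdomain": "yourcompany", "start_date": "2024-01-01T00:00:00Z"},
    install_if_missing=True,
)
source.check()                      # validate config and credentials
source.select_streams(["tickets"])  # only pull what the sync policy allows
result = source.read(cache=ab.get_default_cache())  # local DuckDB cache

for record in result["tickets"]:
    ...  # hand each record to the sync-policy layer above
```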
--- 2) Retrieval, escalation, and conversation behavior ---
Users hate fighting with an AI agent, especially when it’s obvious the agent can’t help. We wanted ours to search well, but also to know when it was time to bow out gracefully.
At first it was too eager to give up. We didn’t want hallucinations, so it would just fail with “I don’t know.” The experience sucked.
We found our retrieval pipeline was subpar. We built evals and found some quick wins, but the agent was still too eager to escalate. We considered fine-tuning, but first built more evals. In the end, better retrieval, tools, and prompts were enough.
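Our actual evals are more involved, but the core retrieval check is basically recall@k over a labeled set (the names below are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str        # a real user question
    gold_doc_id: str  # the document a correct answer must draw from

def recall_at_k(
    cases: list[EvalCase],
    retrieve: Callable[[str, int], list[str]],  # (query, k) -> ranked doc ids
    k: int = 5,
) -> float:
    """Fraction of cases where the gold document lands in the top k."""
    hits = sum(c.gold_doc_id in retrieve(c.query, k) for c in cases)
    return hits / len(cases)
```

Tracking escalation rate alongside a number like this is what showed us the agent was bowing out on questions retrieval could actually answer.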
We also added an action engine that lets users define custom actions. It uses a two-tiered approach: a pre-qualifying model decides whether an action is appropriate, and only then are tools invoked. Too many tools up front made the agent worse, so this tiered staging helped a lot.
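A rough sketch of the tiered staging (the `llm` client and model ids are stand-ins, not a real API):

```python
# Stand-in sketch of the two-tier action engine; `llm` is a generic client.
def handle_turn(message: str, actions: list[dict], llm) -> str:
    # Tier 1: a cheap pre-qualifying model sees only names + descriptions,
    # never the full tool schemas.
    menu = "\n".join(f"- {a['name']}: {a['description']}" for a in actions)
    verdict = llm.complete(
        model="small-prequalifier",  # assumed model id
        prompt=(
            f"User message: {message}\n"
            f"Available actions:\n{menu}\n"
            "Reply with the single most appropriate action name, or NONE."
        ),
    ).strip()

    if verdict == "NONE":
        # No action applies: answer normally, with zero tools in context.
        return llm.complete(model="main-agent", prompt=message)

    # Tier 2: invoke the main model with only the qualified tool attached,
    # instead of flooding it with every tool definition up front.
    tool = next(a for a in actions if a["name"] == verdict)
    return llm.complete(model="main-agent", prompt=message, tools=[tool])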
--- 3) Analysis of historical ticket data ---
We built a pipeline that analyzes historical ticket data, creates derivatives, and evaluates the agent’s performance against them. It then finds gaps in your agent’s knowledge that you can fill with AI-generated answers from ticket history. Doing this safely, reliably, and at scale has been a challenge.
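Conceptually, the gap-finding loop looks something like this (`agent_answer` and `grade`, e.g. an LLM-as-judge, are stand-ins for our real components):

```python
# Conceptual sketch; `agent_answer` and `grade` are stand-ins.
def build_cases(tickets: list[dict]):
    """Derive (question, reference answer) pairs from resolved tickets."""
    for t in tickets:
        if t.get("status") == "solved" and t.get("resolution"):
            yield {"question": t["first_message"], "reference": t["resolution"]}

def find_gaps(tickets, agent_answer, grade, threshold=0.5):
    """Yield questions the agent answers poorly relative to the historical
    resolution -- candidates for AI-drafted knowledge-base answers."""
    for case in build_cases(tickets):
        answer = agent_answer(case["question"])
        if grade(answer, case["reference"]) < threshold:
            yield case
```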
We use Dagster to run these jobs, but are looking at moving everything to DBOS (which we use in other products and love). Ticket ingestion relies on Airbyte connectors running via pyairbyte. Some integrations use Airbyte’s connectors, others are our own built on their declarative schema.
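A minimal Dagster version of the ingestion step (asset names and config are illustrative):

```python
import airbyte as ab
from dagster import Definitions, asset

@asset
def raw_tickets() -> list[dict]:
    """Ingest tickets via a PyAirbyte connector (placeholder config)."""
    source = ab.get_source(
        "source-zendesk-support",
        config={"subdomain": "yourcompany"},
    )
    source.select_streams(["tickets"])
    return [dict(r) for r in source.read()["tickets"]]

@asset
def eval_cases(raw_tickets: list[dict]) -> list[dict]:
    """Derive (question, reference) eval cases from resolved tickets."""
    return [
        {"question": t["first_message"], "reference": t["resolution"]}
        for t in raw_tickets
        if t.get("status") == "solved" and t.get("resolution")
    ]

defs = Definitions(assets=[raw_tickets, eval_cases])
```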
We use JinaAI for embeddings, reduce dimensionality with UMAP, and currently cluster in-memory with FAISS (though we plan to move away from FAISS). The pipeline runs a small optimization step to tune UMAP and clustering parameters.
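Roughly, the clustering step looks like this (assuming the Jina embeddings are already computed into a float32 matrix; the parameter values are just starting points of the kind our optimization step tunes):

```python
import faiss      # pip install faiss-cpu
import numpy as np
import umap       # pip install umap-learn

def cluster_tickets(embeddings: np.ndarray, n_clusters: int = 50) -> np.ndarray:
    """Assign each ticket embedding to a cluster."""
    # Reduce dimensionality first; these UMAP parameters are among the
    # knobs the optimization step tunes.
    reduced = umap.UMAP(
        n_components=16, n_neighbors=15, min_dist=0.1, metric="cosine"
    ).fit_transform(embeddings).astype("float32")

    # In-memory k-means via FAISS.
    kmeans = faiss.Kmeans(d=reduced.shape[1], k=n_clusters, niter=25)
    kmeans.train(reduced)
    _, labels = kmeans.index.search(reduced, 1)  # nearest centroid per ticket
    return labels.ravel()
```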
This system is powerful but still a work in progress. We’ve got a new version coming, but it isn’t quite ready for prime time yet.
--- 4) Taste vs evals ---
You hear a lot about taste in AI, and we believe it’s incredibly important. Evals helped us catch the obvious failures, but nothing can replace taste.
We worked closely with our pre-release customers to develop that taste. We have so, so many external VIP Slack channels open. I’m really thankful for their input.
--- 5) Compliance ---
Compliance is the C-word for small tech companies, and we jumped straight into SOC 2 Type II. Because we’re a multi-product company, scoping everything appropriately was tricky.
We’re using Vanta to manage most of this, but there’s still a lot of manual grunt work that isn’t automated. I know several folks are trying to simplify this for startups… good luck to them.
---
I’m really proud of what our team has accomplished, and I’m excited to share it. Happy to answer any questions, technical or otherwise. You can sign up and try it directly on the site.