Here’s an example of Cursor using Airweave: https://www.youtube.com/watch?v=IvxidK9Ciy4. And here’s a general example of our new search functionality: https://www.youtube.com/watch?v=iqEqc_iGUO8
We came to this problem while building agentic applications for webshop owners and customer service, and noticed that most failure modes weren’t about tool execution but about finding the right internal context to enable the right actions.
We started solving what seemed, at the time, like a problem specific to our own use case, and quickly fell down a rabbit hole of issues. Company and user data lives across SaaS apps and databases; it’s sparse, messy, and constantly changing. Agents need a data orchestration and retrieval layer that accepts free-form natural language queries and returns actionable results quickly.
Simply pointing an agent at an MCP server doesn’t give it fine-grained search or a deep understanding of the underlying resource. Most MCP servers are thin wrappers that expose an existing API in a more LLM-friendly way, which doesn’t give the agent any capabilities beyond what the resource or app already offered. In particular, it doesn’t give the agent a way to thoroughly search and understand the contents of the resource.
Airweave connects to sources via their APIs, crawls and normalizes content, chunks it, extracts entity relationships, and indexes the chunks in a vector store alongside keyword fields and lightweight graph metadata in Postgres. Data sync is orchestrated with Temporal (handling pagination, rate limits, schedules, and change detection via timestamps and content hashes) so collections stay near real-time with their sources.
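To make the change-detection idea concrete, here’s a minimal sketch (not Airweave’s actual code; the function and field names are hypothetical) of deciding whether a fetched record needs re-chunking and re-indexing, using a source timestamp plus a content hash:

    import hashlib

    # Hypothetical sketch: skip re-indexing when neither the source timestamp
    # nor the hashed content has changed since the last sync.
    def needs_resync(record: dict, stored: dict | None) -> bool:
        content_hash = hashlib.sha256(record["content"].encode("utf-8")).hexdigest()
        if stored is None:
            return True  # never indexed before
        if record.get("updated_at") and record["updated_at"] <= stored["updated_at"]:
            return False  # source reports no change since the last sync
        return content_hash != stored["content_hash"]  # timestamp moved; compare content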
On retrieval, Airweave can run semantic and BM25 keyword search in parallel, fuse the results with reciprocal rank fusion (RRF), apply a recency bias, and re-rank. Agents can fetch ranked chunks with citations or ask for a synthesized answer. The same interface is exposed via REST, Python/TS SDKs, and MCP, so agents can discover it like any other tool.
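For the fusion step, reciprocal rank fusion is the standard trick: each document’s fused score is the sum of 1/(k + rank) across the individual result lists. A generic sketch of the technique (not Airweave’s internals; the function name and k=60 default are assumptions):

    from collections import defaultdict

    # Generic reciprocal rank fusion: merge the semantic and BM25 result lists
    # (each a list of doc ids, best first) into one fused ordering.
    def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = defaultdict(float)
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # e.g. rrf_fuse([semantic_ids, bm25_ids]); a recency bias could be added
    # as an extra additive term before sorting and re-ranking.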
It’s been fun to see what users have built with Airweave: from legal AI assistants to research discovery agents to context augmentation for coding agents. We’re currently experimenting with agentic search patterns, layering different types of enrichment and indexing, RBAC on indexed data, and streaming architectures.
If this is interesting to you, feel free to take it for a spin. Curious to hear your thoughts and feedback on the problem and our solution!