Show HN: E-commerce data from 100k stores that is refreshed daily

https://www.searchagora.com/data-connector

15•astronautmonkey•5mo ago

Hi HN! I'm building Agora, an AI search engine for e-commerce that returns results in under 300ms. We've indexed 30M products from 100k stores and made them easy to purchase using AI agents.

After launching here on HN, a large enterprise reached out to pay for access to the raw data. We serviced the contract manually to learn the exact workflow and then decided to productize the "Data Connector" to help us scale to more customers.

The Data Connector enables developers to select any of our 100k stores in the index, view sample data, format the output, and export the up-to-date data. Data can be exported as CSV or JSON.
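To make the "format the output, and export" step concrete, here is a minimal sketch of exporting selected fields as JSON or CSV. It is not Agora's actual API: the function name `export_products`, the field names, and the sample record are all hypothetical and exist only for illustration.

```python
import csv
import io
import json


def export_products(products, fields, fmt="json"):
    """Serialize the chosen fields of each product as JSON or CSV text.

    `products` is a list of dicts; `fields` selects and orders the columns.
    (Hypothetical helper, not part of any real Data Connector API.)
    """
    # Keep only the requested fields; missing fields become None.
    rows = [{f: p.get(f) for f in fields} for p in products]
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Made-up sample record for illustration only.
sample = [{"title": "Mug", "price": 12.5, "store": "example-store"}]
csv_text = export_products(sample, ["title", "price"], fmt="csv")
json_text = export_products(sample, ["title", "price"], fmt="json")
```

Selecting fields before serializing keeps both output formats consistent with each other, which is the main point of the sketch.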

We've built crawlers for Shopify, WooCommerce, Squarespace, Wix, and custom-built stores to index the store information, product data, stock, reviews, and more. The primary technical challenge is recrawling the entire dataset every 24 hours. We do this with a series of servers that "recrawl" different store types through rotating local proxies and then add any changes to a queue to be applied to our search index. Our primary database is Mongo, and our search runs on self-hosted Meilisearch on high-RAM servers.
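One way to picture the "add changes to a queue" step is the sketch below. It assumes change detection by content hashing, which the post does not confirm. The names `fingerprint` and `enqueue_changes`, the in-memory dict and deque, and the sample records are hypothetical stand-ins for whatever the real database and queue are.

```python
import hashlib
import json
from collections import deque


def fingerprint(product):
    """Stable SHA-256 hash of a product record (key order does not matter)."""
    payload = json.dumps(product, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def enqueue_changes(crawled, seen, queue):
    """Queue only products that are new or changed since the last crawl.

    `seen` maps product id -> last known fingerprint (assumed stand-in for
    the primary database); `queue` stands in for the index update queue.
    Returns the number of products queued.
    """
    changed = 0
    for product in crawled:
        digest = fingerprint(product)
        if seen.get(product["id"]) != digest:
            seen[product["id"]] = digest
            queue.append(product)
            changed += 1
    return changed


seen_hashes = {}
update_queue = deque()

# Two hypothetical daily crawls of the same store; only sku-2 changes price.
day1 = [{"id": "sku-1", "price": 20.0}, {"id": "sku-2", "price": 35.0}]
day2 = [{"id": "sku-1", "price": 20.0}, {"id": "sku-2", "price": 30.0}]

first = enqueue_changes(day1, seen_hashes, update_queue)
second = enqueue_changes(day2, seen_hashes, update_queue)
```

Under this assumption, a daily recrawl only pushes the records that differ from the previous crawl, so the search index sees far fewer writes than the crawler sees pages.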

My vision is to index the world's e-commerce data. I believe this will create market efficiencies for customers, developers, and merchants.

I'd love your feedback!

Comments

amcunicorns•5mo ago

Nice idea! Sounds like a lot of servers are needed to pull this off.

astronautmonkey•5mo ago

Thank you! And yes, the number of servers needed to scale from 100k to 1M stores (the next goal) will be significant.

eastbayjake•5mo ago

Couple thoughts for you:

(1) What are the use cases you envision? I can see the value for a really large marketplace in having a ton of pricing data, or the value to a hedge fund etc in having raw data to analyze macro trends... what is the use case for someone paying $200/month for the developer tier? (If I'm a retailer myself I probably only need data on my direct competitors, unless there's something cool you're imagining that I've failed to see.)

(2) You've got some logos on the store splash that don't show up in store search (eg Nike). Is that a data error or a coding error?

(3) You should probably think about how you scrape and categorize marketplace data... the Walmart tab has a lot of products that are clearly third-party sellers transacting via walmart.com, which pollutes quite a bit of the data value if I primarily want to know what a big retailer is doing on products where they actually set the prices.

(4) Have you looked at grocery data? Have wished someone would build a grocery prices API for like a decade now... lots of cool consumer and hedge-fund monetization opportunities if you can show the price of strawberries in every store across the US (and graph the trendlines over time).

astronautmonkey•5mo ago

Thanks for checking it out!

1. Here are the use-cases we've seen so far: marketplaces, search apps, fashion try-on apps, shopping agents, general purpose agents, web search for LLMs, e-commerce aggregators, hedge funds, etc. The most surprising has been new discovery experiences. Here's an example of an app that uses our data: https://www.forbes.com/sites/charliefink/2025/06/04/glance-a...

2. Great catch. We need to make this clearer on the site: we provide ~100k stores out of the box and keep the bigger brands behind an Enterprise paywall. We're working on fixing this.

3. Absolutely. On the home page, we have purposely separated search over our core index from searching Amazon, Walmart, etc. from within Agora. We haven't indexed products from the major marketplaces yet because of this challenge. Generally, we also focus on direct sellers and have filters in place in our crawler to parse out resellers.

4. Haven't looked at this, but it sounds interesting. It's similar to how we think about storing e-commerce data with price history over time.

I'd love to chat more. I'm at param [at] searchagora.com if you want to reach out.
