
Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
214•isitcontent•12h ago•25 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
319•vecti•14h ago•141 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•24m ago•1 comment

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
271•eljojo•15h ago•159 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
69•phreda4•12h ago•13 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
90•antves•1d ago•66 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
16•denuoweb•1d ago•2 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
9•michaelchicory•1h ago•1 comment

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
47•nwparker•1d ago•11 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
150•bsgeraci•1d ago•63 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
17•NathanFlurry•20h ago•7 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
8•keepamovin•2h ago•2 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•5h ago•0 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•6h ago•4 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•17h ago•7 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
2•rs545837•7h ago•1 comment

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
172•vkazanov•2d ago•49 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
25•dchu17•17h ago•12 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•11h ago•1 comment

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•8h ago•1 comment

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: Gohpts tproxy with arp spoofing and sniffing got a new update

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•9h ago•0 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•9h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•10h ago•0 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
11•KevinChasse•17h ago•16 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•18h ago•0 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•11h ago•0 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•11h ago•1 comment

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: CommerceTXT – An open standard for AI shopping context (like llms.txt)

https://commercetxt.org/
20•tsazan•1mo ago
Hi HN, author here.

I built CommerceTXT because I got tired of the fragility of extracting pricing and inventory data from HTML. AI agents currently waste ~8k tokens just to parse a product page, only to hallucinate the price or miss the fact that it's "Out of Stock".

CommerceTXT is a strict, read-only text protocol (CC0 Public Domain) designed to give agents deterministic ground truth. Think of it as `robots.txt` + `llms.txt` but structured specifically for transactions.

Key technical decisions in v1.0:

1. *Fractal Architecture:* Root -> Category -> Product files. Agents only fetch what they need (saves bandwidth/tokens).

2. *Strictly Read-Only:* v1.0 intentionally excludes transactions/actions to avoid security nightmares. It's purely context.

3. *Token Efficiency:* A typical product definition is ~380 tokens vs ~8,500 for the HTML equivalent.

4. *Anti-Hallucination:* Includes directives like @INVENTORY with timestamps and @REVIEWS with verification sources.

The spec is live and open. I'd love your feedback on the directive structure and especially on the "Trust & Verification" concepts we're exploring.

Spec: https://github.com/commercetxt/commercetxt Website: https://commercetxt.org
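For concreteness, here is a hypothetical product file. Only @INVENTORY and @REVIEWS are named above; the path, every field name, and the layout are invented for illustration and may not match the actual spec:

```text
# /products/x100-headphones.txt  (hypothetical path and fields)
@PRODUCT
NAME: X100 Wireless Headphones
PRICE: 129.00 USD
TaxIncluded: false

@INVENTORY
COUNT: 42
UPDATED: 2025-01-15T09:30:00Z

@REVIEWS
RATING: 4.6/5
SOURCE: https://example.com/reviews
```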

Comments

reddalo•1mo ago
We should stop polluting website roots with these files (including llms.txt).

All these files should be registered with IANA and put under the .well-known namespace.

https://en.wikipedia.org/wiki/Well-known_URI

tsazan•1mo ago
I understand the theoretical argument.

We follow the precedent of robots.txt, ads.txt, and llms.txt.

The reason is friction. Platforms like Shopify and Wix make .well-known folders difficult or impossible for merchants to configure. Root files work everywhere.

Adoption matters more than namespace hygiene.

JimDabell•1mo ago
How about following the precedent of all of these users of /.well-known/

https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-kn...

robots.txt was created three decades ago, when we didn’t know any better.

Moving llms.txt to /.well-known/ is literally issue #2 for llms.txt

https://github.com/AnswerDotAI/llms-txt/issues/2

Please stop polluting the web.

tsazan•1mo ago
I prioritize simplicity and adoption for non-technical users over strict IETF compliance right now. My goal is to make this work for a shop owner on Shopify and Wix, not just for sysadmins.

That said, I am open to supporting .well-known as a secondary location in v1.1 if the community wants it.

xemdetia•1mo ago
How is using a standard path 'just for sysadmins' again? You are introducing something new today.
tsazan•1mo ago
Try uploading a file to /.well-known/ on Shopify or Wix. You cannot. Their file managers block hidden directories (starting with a dot). To do it, you need a custom app, a meta-field hack, or a reverse proxy. That is sysadmin work. Uploading a file to the root is user work. That is the difference.
reddalo•1mo ago
So the solution is that both Shopify and Wix should let the user edit those files. That's it.
tsazan•1mo ago
Agreed. In a perfect world, they would. But I cannot merge PRs into Shopify's core. Waiting for trillion-dollar corporations to change their security models is a death sentence for a new protocol. We build for the infrastructure that exists today, not the one we wish for. When they open the gates, we will move. Until then, we live in the root.
hrimfaxi•1mo ago
How are comments handled on proposals? Are you the final authority on the standard or is there some community consensus?
tsazan•1mo ago
I am the initiator, not the dictator. Governance is defined in Section 13: Contributing & Governance. Decisions are made by consensus of the Working Group. Right now, we are bootstrapping. I make the initial calls to ship v1.0, but the roadmap involves the community. I invite you to open an Issue.
robotstxtwasbad•1mo ago
RFC 8820 section 2.3, also known as BCP 190, is not a theoretical argument. That's why we call it a "best current practice".

Neither you, nor any other standard like yours, is entitled to declare what a root URI means in my Web namespace, and you are up against an IETF MUST NOT by fighting for this. This is a _very_ philosophical argument, not a practical one, and it's why I'm firmly against your standard out of the gate (and would work to reject it as, say, an RFC).

The same paragraph takes you to RFC 8615, which is the .well-known you are being told to use. That is not your "secondary location" for v1.1. That is the only path you are permitted to consider as someone with intent to standardize a portion of the HTTP URI namespace. The decades-old precedent you are citing here, and leaning on as foundational, was rejected at the philosophical level by the IETF, and it is completely rejected as appropriate precedent for the writing of standards going forward.

You are being told how the Web works. It's not about you, the magical universe of agentic, or your community. You are attempting to standardize a part of the technical commons. If you want the public to obey your standard, this isn't the way to engage while selling it -- despite it being CC0, you're phasing in and out of "my" standard and "our" standard a little oddly, and you're a little standoffish to (correct) feedback, feedback that in this case is existential to your project making it to a dozen stars and a discussion.

Wix and Shopify have zero bearing on the standardization of the Web. Companies in general shouldn't, in fact (har har), which is useful background for an aspiring standards writer.

tsazan•1mo ago
I appreciate the detailed feedback (and the edits).

You are technically correct regarding IETF norms.

But you say: "Wix and Shopify have zero bearing on the standardization of the Web."

I fundamentally disagree. The Web is not just a namespace for engineers; it is an economy for millions of small businesses. If a standard is technically "pure" but unusable by 80% of merchants on hosted platforms, it fails the Web.

However, to respect the namespace: We will mandate checking /.well-known/commerce.txt first.

But we will keep the root location as a fallback. We prioritize accessibility for the "aspiring" shop owner over strict purity for the standards writer.
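A minimal sketch of that two-location lookup, assuming the file is named commerce.txt and that .well-known takes precedence (both are assumptions until v1.1 ships):

```python
# Candidate URL order for the lookup policy described above.
# File name and precedence are assumptions; v1.1 is not released.
from urllib.parse import urljoin

def candidate_locations(origin: str) -> list[str]:
    """Return the commerce.txt URLs in the order an agent would try them."""
    return [
        urljoin(origin, "/.well-known/commerce.txt"),  # preferred (RFC 8615)
        urljoin(origin, "/commerce.txt"),              # root fallback
    ]
```

An agent would fetch each in turn and stop at the first successful response.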

robotstxtwasbad•1mo ago
If you fundamentally disagree with that, you are simply never going to deliver a workable standard via the IETF process. Yeah, yeah, SPDY, QUIC, elephants in rooms, I realize what I'm saying, but that doesn't mean I'm wrong about this. Commerce is a _subset_ of what happens on the _technical_ Web, standards for which must consider all users (and the arena you're now playing in). We haven't even gotten to the merits yet, or how you collide with Open Graph philosophically, etc. This is just one piece of technical feedback, and I'm discouraged by your approach to it.

Thankfully, you've licensed your work CC0, so someone who wants to see this standardized could simply fork your work, fix the offending parts, and move for successful standardization without you.

You really gotta stop saying "we," too, like, it's a nit, but it speaks to your long-term intentions. You're here to build a community around an effort you've singlehandedly spearheaded over the last few weeks (I can read GitHub). Claiming you have one already, and there's Big Discussion on these points, is pretty transparent. You and I both know where you're at in the lifecycle, and that you definitely have room to consider the feedback being offered.

tsazan•1mo ago
The CC0 license is not a bug. It is a feature. If you fork this and build a standard that helps merchants better, the mission succeeds. I will be the first to applaud. As for "We": It is an invitation, not a pretension. A standard cannot be a solo act. I am bootstrapping the working group. You are welcome to join it, disagreements and all.
robotstxtwasbad•1mo ago
Thank you for the invitation. One of the things you realize reading and writing a lot of standards -- and I really don't mean that to be condescending towards you, promise -- is that there's a certain orthodoxy to the whole thing regarding keeping an arm's length from commerce.

Consider C#. Yeah, yeah, we all know the provenance of the language, that what ECMA has standardized is basically a Microsoft specification, but once it's an ECMA standard it's Something Else. Competitors can work on it together, and we're all fine with that. Carrying on C# development in the open is harder for Microsoft in some ways, and easier for them in others. This opinion is about ten years old, mind you, and speaks more to the origin of C# (I'm not a practitioner), so I'm sure the Core stuff has changed all of this and made me look silly saying this, but that speaks to my point -- work evolves in public. But they work on it, their competitors work on it, randoms like you and me work on it, and everybody benefits.

Say I work at Apple. I tell my boss I had lunch with a Samsung guy, I might get a side eye. I tell my boss I had lunch with a Samsung guy because we're collaborating on some revision to SSD TRIM or something, it's oh, cool. That's the orthodoxy. Look at, like, WebKit threads before the schism (itself very relevant to this point, in fact). It's extremely important to even _attain_ public standards and collaboration that we all suspend the rules of commerce and competition and conflict and all that. You're arguing the opposite in saying the words "Wix" or "Shopify" should be anywhere near influencing the effort you're proposing. Step back practically, even, and ask yourself: "why should every Web operator deal with some standards crap due to a Shopify product decision? Why is /llms.txt or /products.txt or /yourthing.txt a new land mine for an unsuspecting nginx admin to find?"

There's a collaboration on the common good that should be inherent to the production of shared standards of humanity. Much like science, with its centuries of wrestling with this very point in colorful ways. The Internet is one of humanity's most important inventions, and getting trillion-dollar caps to agree on how to operate it is so incredibly fragile.

If you try to argue with me that because Wix and Shopify both have stupid designs that remove control over a URI from a Web author, I should relax my belief that standardization efforts are fundamentally an activity agnostic of commerce itself, I'd rather gnaw off my left leg than collaborate with a group you lead. We're just going to fight too much. I don't mean this to be disrespectful, for the record, I'm only trying to vividly illustrate how far apart philosophically that seemingly minor opinion places us.

And sure, you're addressing commerce as a subject matter, but one of the ways to lift this from idea to standard is realize the generality behind your effort ("things" available here, not items available for purchase, i.e., philosophically Open Graph's approach, one of the few ways I see your work succeeding).

tsazan•1mo ago
I respect that orthodoxy. It is the bedrock that allows the Internet to function. But we are optimizing for different variables. You optimize for architectural purity on a timeline of decades. You protect the namespace from temporary corporate flaws. I optimize for utility on a timeline of now. I want the flower shop owner to be visible to AI today, even if their platform is rigid. We have different North Stars. That is okay. You guard the temple. I will help the merchants outside. No leg-gnawing required. Thank you for the perspective.
amitav1•1mo ago
Wait, am I dumb, or did the authors hallucinate? @INVENTORY says that 42 are in stock, but the text says "Only 3 left". Am I misunderstanding this or does stock mean something else?
tsazan•1mo ago
Good eye. This demonstrates the protocol’s core feature.

The raw data shows 42. We used @SEMANTIC_LOGIC to force a limit of 3. The AI obeys the developer's rules, not just the CSV.

We failed to mention this context. It causes confusion. We are changing it to 42.

nebezb•1mo ago
Ah, so dark patterns then. Baked right into your standard.
tsazan•1mo ago
Not dark patterns. Operational logic.

Physical stock rarely equals sellable stock. Items sit in abandoned carts. Or are held as safety buffers. If you have 42 items and 39 are reserved, telling the user "42 available" is the lie. It causes overselling.

The protocol allows the developer to define the sellable reality.

Crucially, we anticipated abuse. See Section 9: Cross-Verification.

If an agent detects systematic manipulation (fake urgency that contradicts checkout data), the merchant suffers a Trust Score penalty. The protocol is designed to penalize dark patterns, not enable them.

hrimfaxi•1mo ago
Who maintains this trust score? How is it communicated to other agents?
tsazan•1mo ago
There is no central authority. The Trust Score is a conceptual framework, not a shared database. Each AI platform (OpenAI, Anthropic, Google) builds its own model. They retain full discretion. Agents do not talk to each other. They talk to users. If a score is low, the agent warns the user. It adds caveats or drops the recommendation. It does not broadcast to other bots.
duskdozer•1mo ago
I'm not sure I understand the point of this as opposed to something like a json file, and also, assuming there is any type of structured format, why one would use an LLM for this task instead of a normal parser.
tsazan•1mo ago
You assume JSON is a standalone file. It rarely is.

Even if it were, JSON is verbose. Every bracket and quote costs tokens.

In reality, the data is buried in 1MB+ of HTML. You download a haystack to find a needle.

We fetch a standalone text file. It cuts the syntax tax. It is pure signal.

xemdetia•1mo ago
I believe what the commenter is suggesting is that since this is supposed to be machine readable then why not start with a common format like JSON similar to how things like MCP serve what functions are available or an OpenAPI spec. Generate the JSON and serve that from the well known directory.

People serve plain JSON all the time. This proposed standard is essentially a structured file anyway. Why not YAML? Why not INI? Getting away from bespoke unicorn file formats has been good for everyone.

tsazan•1mo ago
JSON is great for code. It is heavy and deeply nested for Agents. The constraint is the context window. Brackets, quotes, and nesting are token tax. YAML is brittle. Whitespace errors break parsers. We chose the robots.txt model. It is dense and resilient. It is not a unicorn. It is a workhorse.
throwaway_20357•1mo ago
Can shops not just embed Schema/JSON-LD in the page if they want their information to be machine readable?
tsazan•1mo ago
That is the current standard. But it is hard for agents to read efficiently. To access JSON-LD, an agent must download the entire HTML page. This creates a haystack problem where you download 2MB of noise just to find 5KB of data.

Even then, you pay a syntax tax. JSON is verbose. Brackets and quotes waste valuable context window. Furthermore, the standard lacks behavior. JSON-LD lists facts but lacks instructions on how to sell (like @SEMANTIC_LOGIC). CommerceTXT is a fast lane. It does not replace JSON-LD. It optimizes it.

inerte•1mo ago
Wouldn't be easier on everybody (servers and clients) to just expose Structured Data in a text file then? And add the 1 or 2 things it doesn't have?
tsazan•1mo ago
That solves bandwidth. It fails on tokens. JSON syntax is heavy. Brackets and quotes consume context window. More importantly, Schema.org is a dictionary of facts. It lacks behavior. It defines what a product is, but not how to sell it. It has no concept of @SEMANTIC_LOGIC or @BRAND_VOICE. We need a format that carries both data and instructions efficiently. JSON-LD is too verbose and too static for that.
reddalo•1mo ago
> JSON syntax is heavy.

I'd say it's not heavy. JSON syntax is pretty lean compared to XML.

tsazan•1mo ago
JSON is lean for data exchange between machines. But in the LLM economy, the currency is tokens, not bytes. To an LLM tokenizer, every bracket and quote is a distinct cost. In our tests, this 'syntax tax' accounts for up to 30% of the payload. We chose a line-oriented format to minimize overhead and maximize the context window for actual commerce data.
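A crude way to see the overhead difference, using character counts as a stand-in for tokens (tokenizers vary by model, and the 30% figure is the author's claim, not reproduced here):

```python
import json

# The same four fields serialized two ways.
fields = {"name": "X100 Headphones", "price": "129.00",
          "currency": "USD", "stock": "42"}

as_json = json.dumps(fields)  # braces, quotes, commas, colons
as_lines = "\n".join(f"{k.upper()}: {v}" for k, v in fields.items())

overhead = len(as_json) - len(as_lines)  # characters spent on JSON syntax
```

The line-oriented form drops the quoting and bracketing entirely; whether that translates into the claimed token savings depends on the tokenizer.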
tjhorner•1mo ago
Who says you need to pipe the entire document with JSON-LD directly into the context window? I agree, that is very wasteful. You can just parse the relevant bits out and convert the JSON-LD data into something like your txt format before presenting it to the LLM. Bake that right into whatever tool it uses to scrape websites.
tsazan•1mo ago
That solves the Token Tax. It fails the Bandwidth Tax. To get that JSON-LD, you still download 2MB of HTML. You execute JS. You parse the DOM. You are buying a haystack to find a needle, then cleaning the needle. We propose serving just the needle. Furthermore, JSON-LD is strictly for facts. It cannot express @SEMANTIC_LOGIC. It lacks the instructions on how to sell.
captn3m0•1mo ago
How is "schema.org compatibility" related to Legal Compliance?
tsazan•1mo ago
Schema.org is the dictionary for facts.

We map strictly to Schema.org for all transactional data (Price, Inventory, Policies). This ensures legal interoperability.

But Schema.org describes what a product is, not how to sell it.

So we extend it. We added directives like @SEMANTIC_LOGIC for agent behavior. We combine standard definitions for safety with new extensions for capability.

captn3m0•1mo ago
Is there a specific regulation this is for? What’s the compliance bit?
tsazan•1mo ago
It targets Consumer Protection and Truth-in-Advertising laws globally. The 'compliance bit' is Price Transparency. If an AI quotes a price as 'final' but checkout adds hidden fees or tax, that is a deceptive practice. Our spec enforces fields like TaxIncluded and TaxNote. It instructs the Agent to disclose whether the price is net or gross. It prevents the AI from accidentally committing fraud via misleading omissions.
hrimfaxi•1mo ago
How do you avoid downloading the whole haystack to search through the data? How does the hierarchy work? I have to keep a bunch of .txt files updated in my web root? Doesn't this require essentially mirroring the inventory db as text files (if the intent is for accurate counts of items, etc., they would need to be updated in real time)?
tsazan•1mo ago
You do not download the haystack. You traverse it. The architecture is fractal. The agent reads the Root. If the user wants "Headphones", it follows that specific link. It ignores the rest. It is lazy loading for context. Do not mirror your DB manually. For real stores, generate the files dynamically. It is a view layer, just like HTML or sitemap.xml. Real-time? Yes. Since it is a dynamic response, it reflects the DB state instantly. Cache-Control headers handle the freshness.
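The traversal described can be sketched with an in-memory site; the "@CATEGORY name path" link syntax here is invented, since the post only specifies Root -> Category -> Product:

```python
# In-memory stand-in for a store; paths and link syntax are hypothetical.
SITE = {
    "/commerce.txt": "@CATEGORY headphones /categories/headphones.txt\n"
                     "@CATEGORY keyboards /categories/keyboards.txt",
    "/categories/headphones.txt": "@PRODUCT x100 /products/x100.txt",
    "/categories/keyboards.txt": "@PRODUCT k2 /products/k2.txt",
    "/products/x100.txt": "NAME: X100\nPRICE: 129.00 USD",
}

def fetch(path: str, log: list) -> str:
    log.append(path)  # record what actually got downloaded
    return SITE[path]

def category_file(wanted: str):
    """Follow only the link for the requested category; skip the rest."""
    log = []
    for line in fetch("/commerce.txt", log).splitlines():
        _tag, name, path = line.split()
        if name == wanted:
            return fetch(path, log), log
    return None, log

body, log = category_file("headphones")
# The keyboards branch is never fetched.
```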
pdntspa•1mo ago
This would have been great if it was adopted while I was still working on shopping site scrapers
tsazan•1mo ago
It definitely lowers the barrier. But relying on messy HTML as a defense against competitors is 'security through obscurity'. It does not stop them; it just costs you server CPU. The data is public. If you put it on the screen, a scraper can read it. CommerceTXT just ensures that the good bots (AI Agents bringing customers) get it efficiently, while you can still block the bad ones via WAF.
pdntspa•1mo ago
If it delivers accurate data then I can hit that instead of scraping the full HTML. Everybody wins.

What I have found, however, with existing standardization of this kind of data (yours is not the first!), is that shopping sites (big ones) will lie, and you still need to read the HTML as ground truth.

tsazan•1mo ago
You are right. Standardization often drifts from reality. That is why we built Section 9: Cross-Verification. The HTML remains the audit layer. The Agent does not trust blindly. It spot-checks. If commerce.txt says $50 but the HTML says $100, the merchant gets a Trust Score penalty. We do not replace the ground truth. We cache it, and we audit the cache to ensure it matches.
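One way that spot-check could look; the penalty sizes and score range are invented here, since Section 9 is not quoted in the thread:

```python
def spot_check(txt_price: float, html_price: float, trust: float) -> float:
    """Audit one page: compare the commerce.txt price against the HTML
    price and adjust a 0..1 trust score. Thresholds are illustrative."""
    if abs(txt_price - html_price) > 0.01:
        return max(0.0, trust - 0.2)   # contradiction: sharp penalty
    return min(1.0, trust + 0.01)      # agreement: slow recovery
```

Under these made-up numbers, the $50-vs-$100 example above would drop a 0.9 score to 0.7 in a single audit.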
pdntspa•1mo ago
Then why bother with commerce.txt?
tsazan•1mo ago
Because you don't need to audit every single transaction.

Think of it like a cache. You use the commerce.txt for 99% of your agentic workflows because it’s 30% cheaper in tokens and 95% faster than parsing a 2MB HTML haystack.

You only 'bother' with the HTML for periodic spot-checks or when a high-value transaction requires absolute verification.

Without CommerceTXT, you are forced to pay the 'HTML tax' on every single interaction. With it, you get a high-speed fast lane for context, while keeping the HTML as a decentralized source of truth for when trust needs to be verified. It’s about moving the baseline from 'expensive and fragile' to 'efficient and auditable'.

theturtletalks•1mo ago
I’m working on a decentralized marketplace and for now, we tap into the store’s e-commerce platform API to get the inventory, handle cart creation, etc.

I commend you for trying to start a standard. Letting the established players establish standards and protocols just gives them a bigger moat and more influence.

Pay very close attention to e-commerce and conversational commerce, rent seekers are pushing protocols.

tsazan•1mo ago
APIs are toll roads. If you need an API key just to read a price, it is not the Open Web. It is a walled garden. We designed this to be permissionless. A text file has no gatekeeper. It bypasses the rent seekers entirely. The standard must belong to the commons, or it becomes just another extraction layer. Keep fighting the good fight.
theturtletalks•1mo ago
I'm working on a Shopify alternative[0] as part of this decentralized marketplace. If adding support for CommerceTXT is not too difficult, I wouldn't mind adding it.

0. https://github.com/openshiporg/openfront

tsazan•1mo ago
That would be a fantastic first implementation. Openship is exactly the kind of architecture CommerceTXT is built for. Integration is straightforward: it’s essentially just a new 'View' layer. Instead of rendering HTML, you render a .txt endpoint that maps your existing product DB to our fields. I'll head over to your repo and open an Issue to discuss how we can map Openfront's data to the spec. I'd be happy to guide the implementation myself. Let's get this moving!
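A sketch of that view layer, with field names and directive syntax assumed rather than taken from the spec:

```python
def render_commerce_txt(product: dict) -> str:
    """Render one product record in a commerce.txt-like shape.
    Directive and field names are assumptions, not from the spec."""
    return "\n".join([
        "@PRODUCT",
        f"NAME: {product['name']}",
        f"PRICE: {product['price']} {product['currency']}",
        "@INVENTORY",
        f"COUNT: {product['stock']}",
        f"UPDATED: {product['updated']}",
    ]) + "\n"

record = {"name": "Sourdough Loaf", "price": "6.50", "currency": "USD",
          "stock": 12, "updated": "2025-01-15T09:30:00Z"}
text = render_commerce_txt(record)
```

Because the response is generated per request from the live DB, Cache-Control headers govern freshness exactly as described above.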
theturtletalks•1mo ago
Sure, that sounds good! Happy to hop on a call to get things moving.
dehugger•1mo ago
Better idea, how about you just put a link to a csv dump of your inventory data and label it "AI Agents/Scrapers, click here to get all the inventory data", embed that on every page, then call it a day?

When you are being scraped there are two possible reactions: 1 - good, because someone scraping your data is going to help you make a sale (discoverability); 2 - bad, work to obfuscate/block/prevent access.

In the first case, introducing a complex new standard that few if any will adopt achieves nothing compared to "here's a link for all the data in one spot, now leave my site alone. cheers".

In the second case, you actively don't want your data scraped, so why would you ever adopt this?

If you are reading all the inventory data into context then you are doing it wrong. Use your LLM to analyze the website and build a mapping for the HTML data, then parse using traditional methods (bs4 works nicely). You'll save yourself a gajillion tokens and get more consistent and accurate results at 1000x the speed.

tsazan•1mo ago
A CSV is a dump of facts. CommerceTXT is a layer of intent and logic. If you give an AI a giant CSV of your whole inventory, you blow the context window before the conversation even starts. If you serve a CSV per product, you still pay for headers and commas without getting any behavioral control.

Our spec handles this via @SEMANTIC_LOGIC and @BRAND_VOICE. It’s about how the AI represents your brand, not just the raw numbers.

Regarding bs4: mapping HTML to a thousand different store layouts is exactly what we are trying to escape. That is the 'fragility tax'. We are proposing a deterministic fast-lane that bypasses the need for custom scrapers for every single store.

You don't want the AI to 'guess' your data. You want it to 'know' your data.

IgorPartola•1mo ago
Meh. I would rather just have the ability to query any given products catalog in a machine-readable way. Any tool or protocol specifically designed for an LLM to consume is in my opinion a design smell. We should instead design proper APIs and protocols usable by all kinds of program and the LLMs can adapt.

You are also solving a business problem with a technical solution. Shopify recently announced that they will open up their entire catalog via an easy-to-use API to a select few enterprise partners. Amazon is doing a similar thing. This is because they do not want you and me to have the ability to programmatically query their catalog. They want to extract money out of specific partners who are trying to enshittify AI chat apps by throwing tons of ads in there. The big movers in the industry could have already easily adopted a similar standard but they are not going to on purpose. On top of the technical issues other commenters are pointing out, I don’t see why this should be in use at all.

tsazan•1mo ago
You’ve identified the exact tension we are navigating.

I support platforms like Shopify and Wix because they empower 80% of independent merchants to exist online. But I oppose their move toward 'enterprise-only' data silos. When Shopify gates their catalog API for a few select partners, they aren't protecting the merchant. They are protecting their own rent-seeking position.

CommerceTXT is a way for a merchant on any platform to say: 'My data is mine, and I want it to be discoverable by any agent, not just the ones who paid the platform's entry fee'.

Regarding 'design smell': Every major shift in computing has required specialized protocols. We didn't use Gopher for the web, and we shouldn't use 2010-era REST APIs for 2025-era LLMs. Models have unique constraints, like token costs and hallucination risks, that traditional APIs simply weren't built to handle.

We aren't building for the gatekeepers. We are building for the open commons.

IgorPartola•1mo ago
Eh. This is an arrogant and misguided take. Good luck.
dehugger•1mo ago
The entire point of the system I described is that it never needs to load that data into context.

AI is excellent at mapping from one format to another.

I use this method to great effect.

tsazan•1mo ago
The mapping approach assumes the web is static. In reality, you're building a 'maintenance debt' machine. For every 1,000 stores, you need 1,000 AI-generated mappings that break whenever a dev changes a CSS class.

CommerceTXT isn't just about extraction; it's about contract-based delivery. We are moving from 'Guessing through Scraping' to 'Knowing through Protocol'. You're optimizing the process of scraping; we are eliminating the need for it.