There is no built-in Liquid property to directly detect Shopify Collective fulfillment in email notifications.
You can use the Admin GraphQL API to programmatically detect fulfillment source.
In Liquid, you must rely on tags, metafields, or custom properties that you set up yourself to mark Collective items.
If you want to automate this, consider tagging products or orders associated with Shopify Collective, or using an app to set a metafield, and then check for that in your Liquid templates.
What you can do in Liquid (email notifications):
If Shopify exposes a tag, property, or metafield on the order or line item that marks it as a Shopify Collective item, you could check for that in Liquid. For example, if you tag orders or products with "Collective", you could use:
{% if order.tags contains "Collective" %}
<!-- Show Collective-specific content -->
{% endif %}
or for line items:
{% for line_item in line_items %}
{% if line_item.product.tags contains "Collective" %}
<!-- Show something for Collective items -->
{% endif %}
{% endfor %}
In the author's 'wrong' vs 'seems to work' answer, the only difference is the tag on the line items vs. the order. The flow (a template? he refers to it as 'some other cryptic Shopify process') he uses in his tests does seem to add the 'Shopify Collective' tag to the line items, and potentially also to the order if the whole order is fulfilled through Shopify Collective, but without further info we can only guess at his setup. While using AI can always lead to imperfect results, I feel the evidence presented here does not support the conclusion.
P.S. Given the reference to 'cryptic Shopify processes', I wonder how far the author would get with 'just the docs'.
Besides, it is not even incorrect in the way he states it is. It is fully dependent on how he added the tags in his flow, as the complete answer correctly stated. He speculates on some timing issue in some 'cryptic Shopify process' adding the tag at a later stage, but this is clearly wrong as his "working answer" (which is also in the Assistant reply) does rely on the tag having been added at the same point in the process.
My purely speculative and deliberately exaggerated take: he just blindly copied some flow template, then copy/pasted the first Liquid code box from the (same as I got?) Assistant's answer, tested it on one order and found it not doing what he wanted, which suited his confirmation bias regarding AI. Later he tried pasting the second Liquid code box (or the same answer you will get from Gemini through Google Search), found 'it worked' on his one test order, and still blamed the Assistant for being 'wrong'.
I just asked ChatGPT "whats the best database structure for a users table where you have users and admins?" in two different browser sessions. One gave me SQL with varchars and a role column using:
role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'admin')),
the other session used text columns and defined an enum to use first:
CREATE TYPE user_role AS ENUM ('user', 'admin', 'superadmin');
//other sql snipped
role user_role NOT NULL DEFAULT 'user',
An AI Assistant should be better tuned but often isn't. That variance to me makes it feel wildly unhelpful for 'documentation', as two people end up with quite different solutions.

Your question is vague (technical reference, not meant derogatorily). In which DBMS? By what metric of 'best'? For which size of database? Does it need to support internationalization? Will the roles be updated or extended in the future, etc.?
You could argue an AI Assistant should ask for clarification when the question is vague rather than make a guess. But taken to the extreme this is not workable in practice. If every minute factor needs to be answered by the user before getting a result, only the true experts would ever get as far as receiving an answer.
This is not just an AI problem, but a problem (human) business and technical analysts face every day in their work. When do you switch to proposing a solution rather than asking further details? It is BTW also why all those BPM or RPA platforms that promise to eliminate 'programming' and let the business analyst 'draw' a solution often fail miserably. They either have too narrow defaults or keep needing to be fed detail long past the BA's comfort zone.
This is the same exact problem in coding assistants when they hallucinate functions or cannot find the needed dependencies etc.
There are better and more complex approaches that use multiple agents to summarize different smaller queries and then iteratively build up an answer, etc. Internally we, and a lot of companies, have them, but for external customer queries they are way too expensive. You can't spend 30 cents on every query.
Every time I land on help.shopify.com I get the feeling it's one of those "Doc pages for sales people". Like it's meant to show "We have great documentation and you can do all these things" but never actually explains how to do anything.
I tried that bot a couple of months ago and it was utterly useless:
question: When using discountRedeemCodeBulkAdd there's a limit to add 100 codes to a discount. Is this a limit on the API or on the discount? So can I add 100 codes to the same discount multiple times?
answer: I wasn't able to find any results for that. Can you tell me a little bit more about what you're looking for?
Telling it more did not help. To me that seemed like the bot didn't even have access to the technical documentation. I find it hard to believe that any search engine can miss a word like discountRedeemCodeBulkAdd if it actually is in the dataset: https://shopify.dev/docs/api/admin-graphql/latest/mutations/...
So it's a bit like asking sales people technical questions.
edit: Okay, I should have tried that before commenting. They seem to have updated it. When I ask the same question now it answers correctly (weirdly in German) :
(Translated from the German:) The limit of 100 codes when using discountRedeemCodeBulkAdd refers to the number of codes you can add in a single API call, not the total number of codes that can be associated with a discount. A discount can contain up to 20,000,000 unique discount codes. You can therefore add 100 codes at a time to the same discount repeatedly until you reach the 20,000,000 limit. Note that third-party apps or custom solutions cannot bypass or increase this limit.
~= It's a limit on the API endpoint, you can add up to 20M to a single discount.
Maybe that's the best anthropomorphic analogy of LLMs. Like good sales people completely disconnected from reality, but finely tuned to give you just the answer you want.
Kind of like a bad salesperson, the best salespeople I've had the pleasure of knowing were not afraid to learn the technical background of their products.
I keep seeing bots wrongly prompted with both the browser language and the text "reply in the user's language". So I write to a bot in English and I get a Spanish answer.
You want grounded RAG systems like Shopify's here to rely strongly on the underlying documents, but also still sprinkle a bit of the magic of the latent LLM knowledge too. The only way to get that balance right is evals. Lots of them. It gets even harder when you are dealing with GraphQL schema like Shopify has since most models struggle with that syntax moreso than REST APIs.
FYI I'm biased: Founder of kapa.ai here (we build docs AI assistants for +200 companies incl. Sentry, Grafana, Docker, the largest Apache projects etc).
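To make "evals, lots of them" concrete, here is a minimal sketch of the shape such a harness can take; the golden set, ask_docs_bot() and judge_answer() are placeholders for your own pipeline and grader, not Shopify's or kapa's actual setup:

    # Minimal eval-harness sketch (hypothetical): score a docs bot against a golden set.
    # ask_docs_bot() and judge_answer() are stand-ins for your own pipeline and grader.
    from dataclasses import dataclass

    @dataclass
    class EvalCase:
        question: str
        expected_facts: list[str]  # facts the answer must contain to count as correct

    GOLDEN_SET = [
        EvalCase(
            question="Is the 100-code limit on discountRedeemCodeBulkAdd per call or per discount?",
            expected_facts=["per call", "20,000,000"],
        ),
    ]

    def ask_docs_bot(question: str) -> str:
        raise NotImplementedError  # plug in the RAG pipeline under test

    def judge_answer(answer: str, expected_facts: list[str]) -> bool:
        # Crude string-containment grader; real evals often use an LLM judge instead.
        return all(fact.lower() in answer.lower() for fact in expected_facts)

    def run_evals() -> float:
        passed = sum(
            judge_answer(ask_docs_bot(case.question), case.expected_facts)
            for case in GOLDEN_SET
        )
        return passed / len(GOLDEN_SET)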
We concatenated all our docs and tutorials into a text file, piped it all into the AI right along with the question, and the answers are pretty great. Cost was, last I checked, roughly 50c per question. Probably scales linearly with how much docs you have. This feels expensive but compared to a human writing an answer it's peanuts. Plus (assuming the customer can choose to use the AI or a human), it's great customer experience because the answer is there that much faster.
I feel like this is a no-brainer. Tbh with the context windows we have these days, I don't completely understand why RAG is a thing anymore for support tools.
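For illustration, a rough sketch of the 'concatenate everything' approach, assuming Markdown docs in a local docs/ directory and an OpenAI-style chat completions API; the model name and prompt are illustrative placeholders:

    # Sketch: concatenate all docs and send them along with the question.
    # Assumptions: docs live in ./docs as Markdown; OPENAI_API_KEY is set.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    def load_corpus(docs_dir: str = "docs") -> str:
        parts = []
        for path in sorted(Path(docs_dir).rglob("*.md")):
            parts.append(f"# FILE: {path}\n{path.read_text(encoding='utf-8')}")
        return "\n\n".join(parts)

    def answer(question: str) -> str:
        corpus = load_corpus()
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system",
                 "content": "Answer strictly from the documentation below. "
                            "Say 'I don't know' if the docs don't cover it.\n\n" + corpus},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    print(answer("How do I detect Shopify Collective items in a notification template?"))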
Re cost though, you can usually reduce the cost significantly with context caching here.
However, in general, I’ve been positively surprised with how effective Claude Code is at grep’ing through huge codebases.
Thus, I think just putting a Claude Code-like agent in a loop, with a grep tool on your docs, and a system prompt that contains just a brief overview of your product and brief summaries of all the docs pages, would likely be my go to.
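A very rough sketch of what that loop could look like; call_llm() is a stub for whatever model you'd plug in, and the GREP:/ANSWER: text protocol is just an assumed convention for the example, not a real tool-use API:

    # Sketch of a "grep the docs" agent loop (illustrative only). call_llm() is a
    # placeholder; the GREP:/ANSWER: protocol is an assumed convention, not a real API.
    import re
    from pathlib import Path

    DOCS_DIR = Path("docs")  # placeholder location

    SYSTEM_PROMPT = (
        "You answer questions about our product using its documentation.\n"
        "Reply with 'GREP: <regex>' to search the docs, or 'ANSWER: <text>' when done.\n"
        "Docs overview: <brief per-page summaries would go here>"
    )

    def grep_docs(pattern: str, context_lines: int = 2) -> str:
        # Return matching lines (plus a little context) from all docs files.
        hits = []
        rx = re.compile(pattern, re.IGNORECASE)
        for path in DOCS_DIR.rglob("*.md"):
            lines = path.read_text(encoding="utf-8").splitlines()
            for i, line in enumerate(lines):
                if rx.search(line):
                    start, end = max(0, i - context_lines), i + context_lines + 1
                    hits.append(f"{path}:{i + 1}\n" + "\n".join(lines[start:end]))
        return "\n---\n".join(hits[:20]) or "no matches"

    def call_llm(messages: list[dict]) -> str:
        raise NotImplementedError  # plug in your model of choice

    def run_agent(question: str, max_steps: int = 8) -> str:
        messages = [{"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": question}]
        for _ in range(max_steps):
            reply = call_llm(messages)
            messages.append({"role": "assistant", "content": reply})
            if reply.startswith("ANSWER:"):
                return reply.removeprefix("ANSWER:").strip()
            if reply.startswith("GREP:"):
                messages.append({"role": "user",
                                 "content": grep_docs(reply.removeprefix("GREP:").strip())})
        return "No answer within the step budget."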
And it's inefficient in two ways:
- you're using extra tokens for every query, which adds up.
- you're making the LLM less precise by overloading it with potentially irrelevant extra info, making it harder for it to pick the specific, relevant answer out of the haystack.
Filtering (e.g. embedding similarity & BM25) and re-ranking/pruning what you provide to RAG is an optimization. It optimizes the tokens, the processing time, and, in an ideal world, the answer itself. Most LLMs are far more effective if your RAG context is limited to what is relevant to the question.
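As a simplified illustration of that kind of filtering, here is a BM25-only pre-filter over doc chunks using the rank_bm25 package; a real pipeline would typically combine it with embedding similarity and a re-ranker, both omitted here, and the chunks below are made up:

    # Sketch: lexical pre-filtering of doc chunks with BM25 before handing them to the LLM.
    # Uses the rank_bm25 package; embeddings and re-ranking are left out for brevity.
    from rank_bm25 import BM25Okapi

    chunks = [
        "discountRedeemCodeBulkAdd adds up to 100 codes per API call.",
        "A discount can hold up to 20,000,000 unique codes in total.",
        "Liquid notification templates can check order.tags in an {% if %} block.",
    ]

    def top_chunks(question: str, k: int = 2) -> list[str]:
        tokenized = [c.lower().split() for c in chunks]
        bm25 = BM25Okapi(tokenized)
        scores = bm25.get_scores(question.lower().split())
        ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in ranked[:k] if score > 0]

    print(top_chunks("is the 100 code limit per call or per discount?"))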
RAG is selecting pertinent information to supply to the LLM with your query. In this case they decided that everything was pertinent, and the net result is just reduced efficiency. But if it works for them, eh.
You mention the retrieval stage being a SELECT *? I don't think there's any SQL involved here.
>and adding that as a system/user prompt to the LLM at inference time
You understand this is all RAG is, right? RAG is any additional system to provide contextually relevant (and often more timely) supporting information to a baked model.
People sometimes project RAG out to be a specific combination of embeddings, chunking, vector DBs, etc. But that is ancillary. RAG is simply selecting the augmentation data and supplying it with the question.
Anyways, I think this thread has reached a conclusion and there really isn't much more value in it. Cheers.
I personally define it as excluding the approach of loading all the data into the context window.
Very new field and not a lot of reliable sources. Would be worth it to standardize meaning.
https://en.wikipedia.org/wiki/Information_retrieval
In that sense, calling ”stuff everything in the context” LLM queries a RAG system is analogous to calling a web crawler a search engine.
Additionally, the quality of a loaded-up context window degrades as well: just because your model can handle 1M tokens, it doesn't mean that it WILL remember 1M tokens, it just means that it CAN.
RAG fixes this. In the simplest configuration a RAG can be an index: the only context you give the LLM is the table of contents, and you let it search through the index.
Should it be a surprise that this is cheaper and more efficient? Loading the whole context window is like a library having every book open at every page at the same time instead of using the Dewey Decimal system.
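A minimal sketch of that 'index as the only context' configuration; call_llm() is a stub, and the file names and two-step flow (pick a page from the table of contents, then answer from that page) are purely illustrative:

    # Sketch: RAG where the index IS the context. Step 1: the model sees only the table
    # of contents and names a page. Step 2: that one page is loaded to answer the question.
    # call_llm() is a placeholder for any chat model; page names and paths are made up.
    from pathlib import Path

    TOC = {
        "getting-started.md": "Install, first store setup, test orders",
        "notifications.md": "Email notification templates and Liquid variables",
        "discounts-api.md": "Discount mutations, bulk code limits",
    }

    def call_llm(prompt: str) -> str:
        raise NotImplementedError  # plug in your model of choice

    def answer(question: str, docs_dir: str = "docs") -> str:
        toc_text = "\n".join(f"- {name}: {summary}" for name, summary in TOC.items())
        page = call_llm(
            "Table of contents:\n" + toc_text +
            f"\n\nWhich single page best answers: {question!r}? Reply with the file name only."
        ).strip()
        page_text = (Path(docs_dir) / page).read_text(encoding="utf-8")
        return call_llm(f"Using only this page:\n{page_text}\n\nAnswer: {question}")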
(We tend to have far fewer evals for such humans though.)
This is doing some heavy lifting
https://crespo.business/posts/llm-only-rag/
$ rgd ~/repos/jj/docs "how can I write a revset to select the nearest bookmark?"
Using full corpus (length: 400,724 < 500,000)
# Answer
gemini-2.5-flash | $0.03243 | 2.94 s | Tokens: 107643 -> 56
The provided documentation does not include a direct method to select the
nearest bookmark using revset syntax. You may be able to achieve this using
a combination of ancestors() , descendants() , and latest() , but the
documentation does not explicitly detail such a method.

If the training data is full of certain statements you'll get certain-sounding statements coming out of the model, too, even for things that are only similar, and for answers that are total bullshit.
I get "I don't know" answers from Claude and ChatGPT all the time, especially now that they have thrown "reasoning" into the mix.
Saying that LLMs can't say "I don't know" feels like a 2023-2024 era complaint to me.
I did have one problem (involving SQLite triggers) that I bounced off various LLMs for genuinely a full year before finally getting to an understanding that it wasn't solvable! https://github.com/simonw/sqlite-chronicle/issues/7
I would have much appreciated if it could throw its hands up and say it doesn't know.
I was benchmarking some models the other day via openrouter and I got the distinct impression some of these models treat the thinking token budget as a target rather than a maximum.
Generally I don't trust most low-paid (at no fault of their own) customer service centers any more than I do random LLMs. Historically their advice for most things is either very biased, incredibly wrong, or often both.
I've had good and bad experiences with them thus far, to the other poster's point, just like with human support teams.
(My domain is regulatory compliance, so maybe this goes beyond pure documentation but I'm guessing pushed far enough the same complexities arise)
I guess this is why Kagi Quick Answer has consistently been one of the best AI tools I use. The search is good, so their agent is getting the best context for the summaries. Makes sense.
Just dumping raw reams of text into the 'prompt' isn't the best way to get great results. Now I am fully aware that anything I can do on my side of the API, the LLM provider can and eventually will do as well. After all, Search also evolved beyond 'PageRank' to thousands of specialized heuristic subsystems.
A philosophy degree later…
I ended up just generating a summary of each of our 1k docs, using the summaries for retrieval, running a filter to confirm the doc is relevant, and finally using the actual doc to generate an answer.
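Roughly, the shape of that pipeline looks like the sketch below; llm() is a stub for whatever model call you use, and retrieval is shown as a naive keyword overlap over the summaries purely to keep the example short:

    # Sketch of a summary-first pipeline: retrieve over per-doc summaries, filter with a
    # relevance check, answer from the full docs. llm() is a placeholder; retrieval is a
    # deliberately naive keyword overlap instead of a real embedding search.
    def llm(prompt: str) -> str:
        raise NotImplementedError  # plug in your model of choice

    def retrieve(question: str, summaries: dict[str, str], k: int = 5) -> list[str]:
        q_words = set(question.lower().split())
        scored = [(len(q_words & set(s.lower().split())), doc_id)
                  for doc_id, s in summaries.items()]
        return [doc_id for score, doc_id in sorted(scored, reverse=True)[:k] if score > 0]

    def is_relevant(question: str, doc_text: str) -> bool:
        verdict = llm(f"Does this document help answer {question!r}? Reply yes or no.\n\n{doc_text}")
        return verdict.strip().lower().startswith("yes")

    def answer(question: str, summaries: dict[str, str], full_docs: dict[str, str]) -> str:
        candidates = retrieve(question, summaries)
        relevant = [d for d in candidates if is_relevant(question, full_docs[d])]
        context = "\n\n".join(full_docs[d] for d in relevant)
        return llm(f"Answer from these docs only:\n{context}\n\nQuestion: {question}")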
I feel like we aren't properly using AI in products yet.
It's great when you're looking to do creative stuff. But terrible when you're looking to confirm the correctness of an approach, or asking for support on something you weren't even aware doesn't exist.
I've also had a lot of issues with CMake where it just invents syntax and functions. Every new question has to be asked in a new chat context to clear the context poisoning.
It's the things that lack good docs that I want to ask about. But that's where it's most likely to fail.
People seem more willing to ask an AI about certain things than be judged for asking the same question of a human, so in that regard it does seem to surface slightly different feature requests than we hear when talking to customers directly.
We use inkeep.com (not affiliated, just a customer).
And what do you pay? It's crazy that none of these AI CSRs have public pricing. There should just be monthly subscription tiers, which include some number of queries, and a cost per query beyond that.
Very similar sentiment at the height of the crypto/digital currency mania
> What’s the syntax, in Liquid, to detect whether an order in an email notification contains items that will be fulfilled through Shopify Collective?
I suspect the best possible implementation of a documentation bot with respect to questions like this one would be an "agent" style bot that has the ability to spin up its own environment and actually test the code it's offering in the answer before confidently stating that it works.
That's really hard to do - Robin in this case could only test the result by placing and then refunding an order! - but the effort involved in providing a simulated environment for the bot to try things out in might make the difference in terms of producing more reliable results.
They take a screenshot and make fun of the rubbish bot on social media.
If that happens rarely it's still a worthwhile improvement over today. If it happens frequently then the documentation bot is junk and should be retired.
But even if one assumes people mean “probabilistic”, that’s also an odd critique given how probabilistic software has pretty much eaten the world. Most of my career has been building reliable product using probabilistic models.
Finally, there's nothing inherently probabilistic or non-deterministic about LLM generation; these are properties of the sampler applied. I did quite a lot of LLM benchmarking in recent years and almost always used greedy sampling, both for performance (things like GSM8K strongly benefit from choosing the maximum-likelihood path) and for reproducibility. You can absolutely set up LLM tools that have perfectly reproducible results. LLMs have many issues, but their probabilistic nature is not one of them.
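For example, greedy decoding with a locally run model is fully reproducible given the same weights and software stack; a small sketch with Hugging Face transformers (GPT-2 only because it is tiny, and hosted APIs can still vary for infrastructure reasons even at temperature 0):

    # Sketch: greedy (deterministic) decoding with a local model. Given the same weights,
    # input, and hardware/software stack, this produces the same output every run.
    # GPT-2 is used only because it is small; it is not the model discussed above.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The limit on the API is", return_tensors="pt")
    output = model.generate(**inputs, do_sample=False, max_new_tokens=20)  # greedy: no sampling
    print(tokenizer.decode(output[0], skip_special_tokens=True))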
A business can reduce temperature to 0 and choose a specific seed, and it's the correct approach in most cases, but still the answers might change!
On the other hand, it's true that there is some probability that is independent of determinism: for example, changing the order of some words might yield different answers. The machine may be deterministic, but there are millions of ways to frame a question, and if the answer depends on trivial details of the question's formatting, there's a randomness there. Similar to how there is randomness in who will win a chess match between two equally rated players, despite the game being deterministic.
This is not correct. Both of the examples I gave were specifically chosen because they use non-determinism without any probabilistic framework associated.
Regex matching using non-deterministic finite automata requires absolutely zero usage of probability. You simply need to keep track of multiple paths and store whether or not any are in a valid state at the end of processing the string. The list monad as non-determinism is an even more generic model of non-determinism that, again, requires nothing probabilistic in its reasoning.
Non-deterministic things do often become probabilistic because typically you have to make a choice of paths, and that choice can have a probabilistic nature. But again, NFA regex matching is a perfect example where no "choice" is needed.
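To make the 'track multiple paths, no probability anywhere' point concrete, a small sketch that simulates an NFA by carrying the set of states the machine could be in; the particular NFA (strings over {a, b} ending in "ab") is chosen arbitrarily for illustration:

    # Sketch: NFA simulation with zero probability involved. Nondeterminism is handled by
    # tracking the set of all states the machine could be in after each input symbol.
    # This example NFA accepts strings over {'a', 'b'} that end in "ab".
    NFA = {
        # state: {symbol: set of possible next states}
        0: {"a": {0, 1}, "b": {0}},  # on 'a' from state 0 the machine "splits" into 0 and 1
        1: {"b": {2}},
        2: {},
    }
    START, ACCEPT = {0}, {2}

    def accepts(s: str) -> bool:
        current = set(START)
        for symbol in s:
            current = {nxt
                       for state in current
                       for nxt in NFA[state].get(symbol, set())}
        return bool(current & ACCEPT)

    assert accepts("aab") and accepts("ab") and not accepts("aba") and not accepts("b")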
I do agree that there is no designed probability in that example though, but there is emergent probability.
Like being a cancer diagnostician. Or an inspector at a border crossing.
Using LLMs is currently a lot like going to a diagnostician that always responds "no, you're healthy". The answer is probably right. But still we pay people a lot to get that last 1%.
If people paid for docs as an independent product, or had the foresight to evaluate the quality of the docs before making a purchase and use it as part of their criteria (or are able to do that at all), I think attitudes around docs and "docs bots" and their correctness, utility etc. would be a lot different.
Under that definition of (non-)deterministic, ironically, an NFA is deterministic, because it always produces the same result for the same input.
We can just go straight to Sipser (from chapter 1; all emphasis is Sipser's) [1]:
> Nondeterminism is a useful concept that has had great impact on the theory of computation. So far in our discussion, every step of a computation follows in a unique way from the preceding step. When the machine is in a given state and reads the next input symbol, we know what the next state will be--it is determined. We call this deterministic computation. In a nondeterministic machine several choices may exist for the next state at any point.
> How does an NFA compute? Suppose that we are running an NFA on an input string and come to a state with multiple ways to proceed. For example, say that we are in state q_1 in NFA N_1 and that the next input symbol is a 1. After reading that symbol, the machine splits into multiple copies of itself and follows all the possibilities in parallel. Each copy of the machine proceeds and continues as before.
This is why the list monad also provides a useful way to explore non-determinism that mirrors in functional programming terms what NFAs do in a classical theory of computation framework.
To this point, LLMs can form this type of nondeterministic computing when they follow multiple paths at once doing beam search, but are unquestionably deterministic when doing greedy optimization, and still deterministic when using other single path sampling techniques and a known seed.
[0]. https://en.wikipedia.org/wiki/Nondeterministic_programming
[1]. https://cs.brown.edu/courses/csci1810/fall-2023/resources/ch...
You seem to be misunderstanding the role automata theory plays in the larger framework of the theory of computation. It is not a "special case" of nondeterminism, it is the foundation for how all of the theory of computation is built.
Additionally, I'm also demonstrating how that exact same concept plays out in the other framework of computation, functional programming, and it works fundamentally the same way.
I have to say it's a bit surprising to need to defend the fundamental principles of computer science on HN. The topic is "how LLMs compute things", so using the computational definition of nondeterminism seems entirely relevant.
But that's not true! Docs are sometimes wrong, and even more so if you count errors of omission. From a user's perspective, dense / poorly structured docs are wrong, because they lead users to think the docs don't have the answer. If they're confusing enough, they may even mislead users.
There's always an error rate. DocBots are almost certainly wrong more frequently, but they're also almost certainly much much faster than reading the docs. Given that the standard recommendation is to test your code before jamming it in production, that seems like a reasonable tradeoff.
YMMV!
(One level down: the feedback loop for getting docbots corrected is _far_ worse. You can complain to support that the docs are wrong, and most orgs will at least try to fix it. We, as an industry, are not fully confident in how to fix a wrong LLM response reliably in the same way.)
A lot of the discourse around LLM tooling right now boils down to "it's ok to be a bit wrong if you're wrong quickly" ... and then what follows is an ever-further bounds-pushing on how big "a bit" can be.
The promise of AI is "human-level (or greater)" --- we should only be using AI when it's as accurate (or more accurate) as human-generated docs, but the tech simply isn't there yet.
- "Oh yeah just write this," except the person is not an expert and it's either wrong or not idiomatic
- An answer that is reliably correct enough of the time
- An answer in the form "read this page" or quotes the docs
The last one is so much better because it directly solves the problem, which is fundamentally a search problem. And it places the responsibility for accuracy where it belongs (on the written docs).
I don't know the first thing about Shopify, but perhaps you can create a free "test" item so you don't actually need to make a credit card transaction.
Shopify doesn’t provide a way to test unconventional email formats without actually placing real orders, so I did my customary dance of order-refund, order-refund, order-refund. My credit card is going to get locked one of these days.
The person who wrote the above knows a lot about Shopify, so if you're going to contradict them, it'd be nice to point to some evidence as to why you think they're wrong.
The test systems are broadly good and worth using, but no. Everyone uses real purchases too.
Docs with JPEG artifacts: the more you zoom (the more specific your query), the worse the noise becomes.
We've done three trials since 2023 and each time we've found them not good enough to put in front of our customers.
Usually the distribution has been about 60% good answers, 20% neutral to bad, 20% actively harmful that wastes the user's time.
Really hoping we'll see better results this time, but so far nothing has beaten the recommendation to add our docs to your local LLM IDE of choice (Cursor etc.) and then ask it questions with your own codebase as context.
The overblown claims? The system prompt "team" that ensures "I don't know" can never be uttered? Or user expectations?
I find it unfair to blame users, but I tend to think that a system prompt which accentuates "with an air of certainty" over ... straightforward honesty is telling, both of the org culture and of the wider trend of "look like you know"...
I remember being taught that no docs is better (i.e. less frustrating to the user) than bad/incorrect docs.
After a certain number of years you learn that source code comments so often fall out of synch with the code itself that they're more of a liability than an asset.
Although, “All datasheets are wrong. Some datasheets are useful.”
My current place? It's in Confluence, miles away from code and with no review mechanism.