If all the bullshit hype and marketing would evaporate already (“LLMs will replace all jobs!”), stuff like this would float to the top more and companies with large data sets would almost certainly be clamoring for drop-in analysis solutions based on prompt construction. They’d likely be far happier with the results, too, instead of fielding complaints from workers about it (AI) being rammed down their throats at every turn.
Ask away. Best method I’ve found so far for this.
Now when I ask questions about design decisions, the LLM refers to the original paper and cites the decisions without googling or hallucinating.
With just these two things in my local repo, the LLM created test scripts to compare our results against the paper's and fixed bugs automatically, helped me make decisions based on the paper's findings, helped me tune parameters based on the empirical outcomes, and even discovered a critical bug in our code: our training data was randomly generated, whereas the paper's training data was a permutation over the whole solution space.
All of this work was done in one evening and I'm still blown away by it. We even ported our code to golang, parallelized it, and saw a 10x speedup in the processing. Right before heading to bed, I had the LLM spin up a novel simulator using a quirky set of tests that I invented using hypothetical sensors and data that have not yet been implemented, and it nailed it on the first try, using smart abstractions and not touching the original engine implementation at all. This tech is getting freaky.
- text classification, not text generation
- operating on existing unstructured input
- existing solution was extremely limited (string matching)
- comparing LLM to similar but older methods of using neural networks to match
- seemingly no negative consequences to warranty customers themselves of mis-classification (the data is used to improve process, not to make decisions)

"Fun fact: Translating French and Spanish claims into German first improved technical accuracy, an unexpected perk of Germany's automotive dominance."
Given that it was inside a 9-step text preprocessing pipeline, it would be surprising if the AI had that much autonomy.
Looks like they were limited by AWS Bedrock options.
* already known as SotA for text classification and similarity back in 2023
* natively multi-lingual

But no, they want to pay $0.10 per request to recognize whether a photo has a person in it by asking a multimodal LLM deployed across 8x GPUs, for some reason, instead of just spending a few hours with CLIP and running it effectively even on CPU.
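For reference, a minimal sketch of that CLIP route using the Hugging Face transformers library; the checkpoint, prompts, and threshold are my own illustrative choices, not anything from the article:

```python
# Zero-shot "does this photo contain a person?" with CLIP, CPU-friendly.
# Checkpoint, prompts, and threshold are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def has_person(image_path: str, threshold: float = 0.5) -> bool:
    image = Image.open(image_path)
    inputs = processor(
        text=["a photo containing a person", "a photo with no people in it"],
        images=image,
        return_tensors="pt",
        padding=True,
    )
    # logits_per_image: (num_images, num_text_prompts); softmax over prompts
    probs = model(**inputs).logits_per_image.softmax(dim=1)
    return probs[0, 0].item() >= threshold
```

A few hours of that, batched, costs roughly nothing next to per-request multimodal LLM pricing.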
There have been some recent developments in the image-of-text / other-than-photograph area, though. From Meta (although they seem unsure of what exactly their AI division is called): https://arxiv.org/abs/2510.05014, and from Qihoo360: https://arxiv.org/abs/2510.27350, for instance.
Over the past couple of years people have made attempts with NLP (let's say standard ML workflows), but NLP and word temperature scores are hard to integrate into a reliable data pipeline, much less an operational review workflow.
Enter LLMs: the world is a data guru's oyster for building a detection system on warranty claims. Passing data to prompted LLMs means capturing and classifying records becomes significantly easier, and these data applications can flow into more normal analytic work streams.
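To make that concrete, here's a minimal sketch of prompt-based claim classification, assuming the OpenAI Python client; the model, label set, and example claim are all illustrative:

```python
# Classify a warranty claim into one of a fixed label set via a prompted LLM.
# Model name and labels are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["coolant leak", "electrical fault", "noise/vibration", "other"]

def classify_claim(claim_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the warranty claim into exactly one of: "
                        + ", ".join(LABELS) + ". Reply with the label only."},
            {"role": "user", "content": claim_text},
        ],
    )
    return resp.choices[0].message.content.strip()

print(classify_claim("Customer reports coolant dripping under the engine bay."))
```

The point is that the output is already a clean categorical column, ready to flow into a normal analytic workstream.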
> We didn’t just replace a model. We replaced a process.
That line sticks out so much now, and I can't unsee it.
> That’s not a marginal improvement; it’s a different way of building classifiers.
They've replaced an em-dash with a semi-colon.
Also HN readers: upvote the most obvious chatgpt slop to the frontpage
I'm afraid that people will draw the wrong conclusion from "We didn’t just replace a model. We replaced a process." and see it as an endorsement of the zero-shot-uber-alles "Prompt and Pray" approach that is dominant in the industry right now and the reason why an overwhelming fraction of AI projects fail.
If you can get good enough performance out of zero-shot, then yeah, zero-shot is fine. The thing is, to know it's good enough you still have to collect and annotate more data than most people and organizations want to.
The text says, "...no leaks..." The CASE statement says, "...AND LOWER(claim_text) NOT LIKE '%no leak%...'" Since "no leak" is a substring of "no leaks", the NOT LIKE catches the plural form too, so it would've properly been marked as a "0".
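You can sanity-check the substring behavior with plain Python standing in for SQL LIKE; the claim texts here are made up:

```python
# "no leak" is a substring of "no leaks", so NOT LIKE '%no leak%' excludes both.
claims = [
    "coolant leaking at the hose fitting",   # should be flagged
    "inspected underbody, no leaks found",   # should NOT be flagged
    "no leak detected after pressure test",  # should NOT be flagged
]
flagged = [c for c in claims if "no leak" not in c.lower()]
print(flagged)  # ['coolant leaking at the hose fitting']
```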
https://www.anthropic.com/engineering/contextual-retrieval
They also found improvements from augmenting the chunks with Haiku by having it add a summary based on extra context.
That seems to benefit both the keyword search and the embeddings by acting as keyword expansion. (Though it's unclear to me if they tried actual keyword expansion and how that would fare.)
Anyway, what stands out to me most here is what a Rube Goldberg machine it is: embeddings, keywords, fusion, contextual augmentation, reranking... each adding marginal gains.
But then the whole thing somehow works really well together (<1% fail rate on most benchmarks. Worse for code retrieval.)
I have to wonder how this would look if it wasn't a bunch of existing solutions taped together, but actually a full integrated system.
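For concreteness, a common choice for the fusion step is reciprocal rank fusion (RRF); here's a minimal sketch with made-up document IDs, using the conventional k=60 constant:

```python
# Reciprocal rank fusion: merge multiple rankings into one.
# Each document scores sum(1 / (k + rank)) across the rankings it appears in.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]        # keyword search results
embedding_ranking = ["doc1", "doc3", "doc9"]   # vector search results
print(rrf([bm25_ranking, embedding_ranking]))  # doc1/doc3 rise to the top
```

Documents that rank well in both lists dominate, which is the whole trick: each retriever covers the other's blind spots.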
LLMs still beat a classifier, because they're able to extract more signals than a text embedding.
It's very difficult to beat an LLM + prompt in terms of semantic extraction.
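As a sketch of what "LLM + prompt" semantic extraction looks like, again assuming the OpenAI Python client; the schema, model, and field names are illustrative:

```python
# Extract structured fields from free text via a prompted LLM.
# Schema and model name are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

def extract_fields(text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},  # force valid JSON output
        messages=[
            {"role": "system",
             "content": "Extract JSON with keys component, failure_mode, "
                        "and sentiment from the text. JSON only."},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```

A text embedding collapses all of that into a single vector; the prompt lets you pull out each signal separately.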