We scrape job sites and use that prompt to create tags which are then searchable by users in our interface.
It was a bit surprising to see how Karpathy described software 3.0 in his recent presentation because that's exactly what we're doing with that prompt.
Software 2.0: We need to parse a bunch of different job ads. We'll have a rule engine, decide based on keywords what to return, do some filtering, maybe even semantic similarity to descriptions we know match with a certain position, and so on
Software 3.0: We need to parse a bunch of different job ads. Create a system prompt that says "You are a job description parser. Based on the user message, return a JSON structure with title, description, salary-range, company, position, experience-level," etc., pass it the JSON schema of the structure you want, and you have a parser that is slow and sometimes incorrect, but that (most likely) covers a much broader range than your Software 2.0 parser.
Of course, this is wildly simplified and doesn't include everything, but that's the difference Karpathy is trying to highlight. Instead of programming those rules for the parser yourself, you "program" the LLM via prompts to do it.
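To make the contrast concrete, here's a minimal sketch of that Software 3.0 parser in Python, using the OpenAI SDK's structured outputs. The model name and exact schema details are assumptions for illustration, not the commenter's actual setup:

```python
# A minimal sketch of the "Software 3.0" job-ad parser described above.
# Model name and field list are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

JOB_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "salary-range": {"type": ["string", "null"]},  # often missing from ads
        "company": {"type": "string"},
        "position": {"type": "string"},
        "experience-level": {"type": "string"},
    },
    "required": ["title", "description", "salary-range", "company",
                 "position", "experience-level"],
    "additionalProperties": False,
}

def parse_job_ad(raw_ad: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any schema-capable model works
        messages=[
            {"role": "system", "content": "You are a job description parser. "
             "Based on the user message, return a JSON structure with title, "
             "description, salary-range, company, position, experience-level."},
            {"role": "user", "content": raw_ad},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "job_ad", "schema": JOB_SCHEMA, "strict": True},
        },
    )
    return json.loads(response.choices[0].message.content)
```

Slow and occasionally wrong, as the comment says, but the same ~30 lines cover job ads in any format or language without a single hand-written rule.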
People use it to generate meeting notes. I don't like it and don't use it.
It processes Steam game reviews and provides a one-page summary of what people think about the game. I've been gradually improving it and adding features from community feedback. It's been good fun.
What I found interesting with Vaporlens is that it surfaces the things people think about a game, and if you find games where you like all the positives and don't mind the largest negatives (which are very often subjective), you're in for a pretty good time.
It's also quite amusing to me that fairly basic vector similarity on the review-point text resulted in a pretty decent "similar games" section :D
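For the curious, a minimal sketch of what "basic vector similarity on the points text" could look like, assuming OpenAI embeddings and toy data (Vaporlens's actual pipeline may well differ):

```python
# Embed each game's concatenated review points, then rank other games
# by cosine similarity. Model name and data are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# One "points" blob per game (hypothetical data).
games = {
    "Game A": "tight controls; great soundtrack; grindy endgame",
    "Game B": "responsive movement; excellent music; repetitive late game",
    "Game C": "story-driven; slow pacing; beautiful art",
}
names = list(games)
vecs = embed([games[n] for n in names])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize to unit length

def similar_to(name: str, k: int = 2) -> list[str]:
    i = names.index(name)
    scores = vecs @ vecs[i]           # cosine similarity via dot product
    order = np.argsort(-scores)       # highest similarity first
    return [names[j] for j in order if j != i][:k]

print(similar_to("Game A"))  # Game B should rank first
```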
However, review positivity is usually the best indicator of sales; it's accurate enough that there are algorithms that rely entirely on it.
https://apps.apple.com/us/app/forceai-ai-workout-generator/i...
Used it to understand a complex code base more deeply, create system design architecture diagrams, and help onboard new engineers.
Summarizing large data dumps that users were frustrated with.
Pretty much 5-6 niche classification use cases.
We're delivering confusion and thanks to LLMs we're 30% more efficient doing it
$20 could cover half a billion tokens with those models! That's a lot of firehose.
I did some really basic napkin math with some Rails logs. One request with some extra junk in it was about 400 tokens according to the OpenAI tokenizer[0]. 500M/400 = ~1.25 million log lines.
Paying linearly for logs at $20 per 1.25 million lines is not reasonable for mid-to-high scale tech environments.
I think this would be sufficient if the 'firehose of data' is a bunch of news/media/content feeds that need to be summarized/parsed/guessed at.
As other commenters have mentioned, a firehose can mean many things. For me it might be thousands of different reasonably small things a day which is dollars a day even in the worst case. If you were processing the raw X feed or the whole of Reddit or something, then all of your questions certainly become more relevant :-)
Sentiment analysis is the “Hello World” of machine learning.
But I had a use case similar to a platform like Uber Eats, where someone can be critical of the service provider or of the platform itself. Based on reviews, I needed to be able to distinguish sentiment about the platform from sentiment about someone on the platform.
No matter what you do, people are going to conflate the reviews.
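A minimal sketch of how that split-target classification could look; the prompt wording and model choice are my assumptions, not the commenter's production setup:

```python
# Ask the model to separate what a review says about the platform from
# what it says about the provider, even when the reviewer conflates them.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Classify the review. Return JSON with two keys: "
    "'platform_sentiment' and 'provider_sentiment', each one of "
    "'positive', 'negative', 'neutral', or 'not_mentioned'."
)

def split_sentiment(review: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": review}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

print(split_sentiment("Food was cold, but the app refunded me instantly."))
# e.g. {"platform_sentiment": "positive", "provider_sentiment": "negative"}
```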
As for costs, I mentioned in another comment that I sometimes work with online call centers. There, any time a person has to answer a call, it costs the company $2-$5.
One call deflection that saves the company $5 can pay for a lot of inference. It’s literally 100x cheaper at least to use an LLM.
That said, it requires the user to sign in with their real work email, or the results are way off.
If everyone is using it now, prompts aren't a good gauge.
I have a js-to-video service (open source sdk, WIP) [1] with the classic "editor to the left - preview on the right" scenario.
To help write the template code I have a simple prompt input + api that takes the llms-full.txt [2] + code + instructions and gives me back updated code.
It's more "write this stuff for me" than vibe-coding, as it isn't conversational for now.
I've not been bullish on AI coding so far, but this "hybrid" solution is perfect for this particular use case, IMHO.
[1] https://js2video.com/play [2] https://js2video.com/llms-full.txt
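For illustration, a rough sketch of that non-conversational flow as a plain chat-completion call; the function name and model are hypothetical:

```python
# Stuff llms-full.txt, the current template code, and the user's
# instructions into one completion and take back updated code.
import urllib.request
from openai import OpenAI

client = OpenAI()

def update_template(current_code: str, instructions: str) -> str:
    docs = urllib.request.urlopen("https://js2video.com/llms-full.txt").read().decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[
            {"role": "system",
             "content": "You edit js2video templates. Use only the API described "
                        "in the docs below. Reply with the full updated code, "
                        "nothing else.\n\n" + docs},
            {"role": "user",
             "content": f"Current code:\n{current_code}\n\nInstructions:\n{instructions}"},
        ],
    )
    return resp.choices[0].message.content
```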
Getting those events onto a usable, sharable calendar is much easier now.
I went pretty simple: used the OpenAI Agents SDK and built a couple of tools like "run_query" with a read-only connection. Initially I also had a tool for getting the join path from A to B, but the context I wrote out was sufficient.
I think the main challenge with this agent is keeping the context up to date.
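A minimal sketch of that setup using the OpenAI Agents SDK, with a SQLite read-only connection and a hand-written schema blurb standing in for the real database (all of which are assumptions):

```python
# One read-only "run_query" tool plus schema/join context in the instructions.
import sqlite3
from agents import Agent, Runner, function_tool

@function_tool
def run_query(sql: str) -> str:
    """Run a read-only SQL query and return the rows as text."""
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # read-only URI
    try:
        rows = conn.execute(sql).fetchall()
        return "\n".join(str(r) for r in rows[:200])  # cap output size
    finally:
        conn.close()

agent = Agent(
    name="db-analyst",
    instructions=(
        "Answer questions by querying the database.\n"
        "Schema: orders(id, user_id, total, created_at); users(id, email, plan).\n"
        "orders.user_id joins to users.id."  # the hand-written join context
    ),
    tools=[run_query],
)

result = Runner.run_sync(agent, "How much revenue did we make last month?")
print(result.final_output)
```

The "keep the context up to date" problem lives in that instructions string: when the schema drifts, the agent's mental map drifts with it.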
If all you've built is RAG apps up to this point, I highly recommend playing with some LLM-in-a-loop-with-tools reasoning agents. Totally new playing field.
You used to either budget for data entry or just graft directories together in a really ugly way. The forest used to know about 12,000 unique access roles; now there are only around 170.
2. I build REPLs into any manual workflow that uses LLMs (see the sketch after this list). Instead of just going "F@ck, it didn't work!", you can tell the LLM why it didn't work and help it get to the right answer. Saves a ton of time.
3. Coming up with color palettes, themes, and ideas for "content". LLMs are really good at pumping out good looking input for whatever factory you have built.
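The REPL idea in point 2 boils down to keeping one conversation alive and feeding failures back in; a bare-bones sketch, with model and system prompt as placeholders:

```python
# Instead of rerunning from scratch when the LLM gets it wrong,
# feed your correction back into the same conversation.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You write SQL for a Postgres database."}]

while True:
    user = input("you> ")
    if user in ("quit", "exit"):
        break
    messages.append({"role": "user", "content": user})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})  # keep the history
    print(answer)
    # Next input can be "that failed with error X, fix it" instead of starting over.
```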
ATM I use ChatGPT Plus for everything except coding inside my Jetbrains IDEs.
I'm starting to look around at other LLMs for non-coding purposes (brainstorming, docs, being a project manager, summarizing, learning new subjects, etc.).
Gemini 2.5 is pretty cheap and has a huge context window, though it's not as good as Claude for programming. For those reasons I'd suggest using it through the API if you're building a product that has an LLM step.
Traditionally, and still how it works in most call centers, you have to explicitly list out the things you can handle (intents), the sentences that trigger them (utterances), and slots: in "I want to get a flight from {origin} to {destination}", the variable parts would be the slots.
Anyway, absolutely no company would or should trust an LLM to generate output shown directly to a customer. It never ends well. I use gen AI to categorize free-text input from a customer into a set of intents the system can handle and to fill in the slots. But the output is very much on rails.
It works a lot better than the old school method.
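A hedged sketch of that "on rails" pattern: the LLM only picks from a fixed intent list and fills slots, and anything off-list is rejected before it can reach the customer. Intent names and model are illustrative:

```python
# The model maps free text onto a fixed intent set; a hard check
# afterwards keeps the output on rails.
import json
from openai import OpenAI

client = OpenAI()

INTENTS = {"book_flight": ["origin", "destination"],
           "cancel_booking": ["booking_id"],
           "unknown": []}

SYSTEM = (
    "Classify the message into one intent from this JSON and fill its slots "
    "(null if missing): " + json.dumps(INTENTS) +
    ' Return JSON like {"intent": ..., "slots": {...}}.'
)

def route(text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": text}],
        response_format={"type": "json_object"},
    )
    out = json.loads(resp.choices[0].message.content)
    if out.get("intent") not in INTENTS:       # the rails: reject anything off-list
        out = {"intent": "unknown", "slots": {}}
    return out

print(route("I need to fly from Oslo to Berlin next week"))
# -> {"intent": "book_flight", "slots": {"origin": "Oslo", "destination": "Berlin"}}
```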
We offer a feature to upload the invoice and we pull out all the rates for you. It uses LLMs under the hood. Fundamentally it's a "ChatGPT wrapper", but there's a massive amount of work in tweaking the prompts based on evals, splitting things up into multiple calls, etc.
And it works great! Niche software, but for power users we're saving tens of minutes of monotonous work per day and, in all likelihood, entering things more accurately. This complements the manual entry process, with full ability to review the results. Accuracy is around 98-99 percent.
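A minimal sketch of the eval side of that workflow, assuming a hypothetical extract_rates function and a hand-labeled golden set:

```python
# Run the extractor over a golden set of invoices and score field-level
# accuracy, so prompt changes can be compared instead of eyeballed.
def evaluate(extract_rates, golden: list[dict]) -> float:
    """golden items look like {"invoice_text": str, "expected": {field: value}}."""
    correct = total = 0
    for case in golden:
        got = extract_rates(case["invoice_text"])  # your LLM extraction call
        for field, expected in case["expected"].items():
            total += 1
            correct += got.get(field) == expected
    return correct / total

# accuracy = evaluate(extract_rates, golden_invoices)
# print(f"{accuracy:.1%}")  # compare across prompt versions before shipping
```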
http://plo.ug/llms,/typescript,/testing/2025/06/26/LLMs-for-...
The new AI threat feed is everything above, plus I'm using AI to make rapid decisions for me. I can pull info from sources like DNSBLs to help judge. If I were to do it manually, maybe 1 IP per 30 seconds? With Phi4, omg, 1 every second.
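A rough sketch of that pipeline, assuming a DNSBL lookup plus Phi4 served locally through Ollama's REST API (both assumptions about the commenter's setup):

```python
# Enrich an IP with a DNSBL lookup, then let a local model make the call.
import json
import socket
import urllib.request

def dnsbl_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    reversed_ip = ".".join(reversed(ip.split(".")))
    try:
        socket.gethostbyname(f"{reversed_ip}.{zone}")  # resolves only if listed
        return True
    except socket.gaierror:
        return False

def judge(ip: str) -> str:
    prompt = (f"IP {ip} is {'listed' if dnsbl_listed(ip) else 'not listed'} "
              "on a DNS blocklist. Answer with exactly one word: BLOCK or ALLOW.")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's local endpoint
        data=json.dumps({"model": "phi4", "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

print(judge("203.0.113.7"))
```

A small local model doing one-word verdicts over pre-fetched evidence is what gets you from one IP per 30 seconds to one per second.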
What I’ve noticed in my own projects is similar: every shiny AI integration spawns a hidden cost (coordination overhead, new edge cases, unexpected governance needs), everything that sits between "works in demo" and "works at scale."
We should be wary of framing AI as an efficiency silver bullet. Instead, the real work is in system integration: making AI enhancements feel seamless, not another silo.
For example, I wrote a recent blog post on how I use LLMs to generate excel files with a prompt (less about the actual product and more about how to improve outcomes): https://maxirwin.com/articles/persona-enriched-prompting/