1. I had two plain-text documents to compare, one with minor edits (done by AI).
2. I wanted to see what AI changed in my text.
3. I tried the usual diff tools. They diffed line by line and the result was terrible. I searched Google for "text comparison tool but not line-based".
4. The second search result was https://www.diffchecker.com/ (it's a SaaS, right?).
5. Initially it did an equally bad job, but I noticed it had a switch, "Real-time diff", which did exactly what I wanted.
6. I got curious about what this algorithm was, so I asked Gemini in "Deep Research" mode: "The website https://www.diffchecker.com/ uses a diff algorithm they call real-time diff. It works really good for reformatted and corrected text documents. I'd like to know what is this algorithm and if there's any other software, preferably open-source that uses it."
7. As a first suggestion it listed diff-match-patch from Google, which has a Python package.
8. I started Antigravity in a new folder, ran uv init. Then I prompted the following:
"Write a commandline tool that uses https://github.com/google/diff-match-patch/wiki/Language:-Py... to generate diff of two files and presents it as side by side comparison in generated html file."
[...]
"I installed the missing dependance for you. Please continue." - I noticed it doesn't use uv for installing dependencies so I interrupted and did it myself.
[...]
"This project uses uv. To run python code use
uv run python test_diff.py" - I noticed it still wasn't using uv to run the code, so its testing was failing.
[...]
"Semantic cleanup is important, please use it." - Things started to show up but it looked like linear diff. I noticed it had a call to semantic cleanup method commented out so I thought it might help if I push it in that direction.
[...]
"also display the complete, raw diff object below the table" - the display of the diff still didn't seem good so I got curious if it's the problem with the diffing code or the display code
[...]
"I don't see the contents of the object, just text {diffs}" - it made a silly mistake by outputting template variable instead of actual object.
[...]
"While comparing larger files 1.txt and 2.txt I notice that the diff is not very granular. Text changed just slightly but the diff looks like deleting nearly all the lines of the document, and inserting completely fresh ones. Can you force diff library to be more granular?
You seem to be doing the right thing https://github.com/google/diff-match-patch/wiki/Line-or-Word... but the outcome is not good.
Maybe there's some better matching algoritm in the library?" - it seemed that while on small tests that Antigravity made itself it worked decently but on the texts that I actually wanted to compare was still terrible although I've seen glimpses of hope because some spots were diffed more granularly. I inspected the code and it seemed to be doing character level diffing as per diff-match-patch example. While it processed this prompt I was searching for solution myself by clicking around diff-match-patch repo and demos. I found a potential solution by adjusting cleanup, but it actually solved the problem by itself by ditching the character level diffing (which I'm not sure I would have come up with at this point). Diffed object looked great but as I compared the result to https://www.diffchecker.com/ output it seemed that they did one minor thing about formatting better.
[...]
"Could you use rowspan so that rows on one side that are equivalent to multiple rows on the other side would have same height as the rows on the other side they are equivalent to?" - I felt very clumsily trying to phrase it and I wasn't sure if Antigravity will understand. But it did and executed perfectly.
I didn't have to revert a single prompt, and I interrupted only twice, at the beginning.
After a while I added watch functionality with a single prompt:
"I'd like to add a -w (--watch) flag that will cause the program to keep running and monitor source files to diff and update the output diff file whenever they change."
[...]
So I basically went from having two very similar text files and knowing very little about diffing to knowing a bit more and having my own local tool that lets me compare texts in a satisfying manner, with beautiful highlighting and formatting, that I can extend or modify however I like, and that mirrors the interesting part of the functionality of the best tool I found online. And all of that in a time span shorter than it took me to write this comment (at least the coding part was; I followed a few wrong paths during my search for a bit).
My experience tells me that even if I could have replicated what I did today (staying motivated is an issue for me), it would most likely have been a multi-day project full of frustration, hunting small errors, and venturing down wrong paths. Python isn't even my strongest language. Instead, it was a pleasant and fun evening with occasional jaw drops, feeling blessed to live in the SciFi times I read about as a kid (and adult).
Um. I don't want to be That Guy (shouting at clouds, or at kids to get off my lawn, or whatever), but... what "usual diff" tools did you use? Because comparing two text files with minor edits is exactly what diff-related tools have excelled at for decades.
There is word-level diff, for example. Was that not good enough? Or delta [0] perhaps?
> The signals I'm seeing
Here are the signals:
> If I want an internal dashboard...
> If I need to re-encode videos...
> This is even more pronounced for less pure software development tasks. For example, I've had Gemini 3 produce really high quality UI/UX mockups and wireframes
> people really questioning renewal quotes from larger "enterprise" SaaS companies
Who are "people"?
Is the author a competent UX designer who can actually judge the quality of the UX and mockups?
> I write about web development, AI tooling, performance optimization, and building better software. I also teach workshops on AI development for engineering teams. I've worked on dozens of enterprise software projects and enjoy the intersection between commercial success and pragmatic technical excellence.
Nope.
Then it dawned on me how many companies are deeply integrating Copilot into their everyday workflows. It's the perfect Trojan Horse.
None of the mainstream paid services ingest operating data into their training sets. You will find a lot of conspiracy theories claiming that companies are saying one thing but secretly stealing your data, of course.
What? That’s literally my point: under enterprise agreements they aren’t training on the data of their enterprise customers, as the parent commenter claimed they were.
Nothing is really preventing this though. AI companies have already proven they will ignore copyright and any other legal nuisance so they can train models.
The enterprise user agreement is preventing this.
Suggesting that AI companies will uniquely ignore the law or contracts is conspiracy theory thinking.
It's not really a conspiracy when we have multiple examples of high-profile companies doing exactly this, and it keeps happening. Granted, I'm unaware of cases of this occurring currently with professional AI services, but it's basic security 101 that you should never give anything even the remote opportunity to ingest data unless you don't care about that data.
This is objectively untrue? Giant swaths of enterprise software are based on establishing trust with approved vendors and systems.
Do you have any citations or sources for this at all?
I hope you find some self awareness when you slip a disc bending over this much for these corpo fascists, especially when they are failing to hold their own language to your level of prevarication and puffery:
> When you use our services for individuals such as ChatGPT, Codex, and Sora, we may use your content to train our models.
https://help.openai.com/en/articles/5722486-how-your-data-is...
Stealing implies the thing is gone, no longer accessible to the owner.
People aren't protected from copying in the same way. There are lots of valid exclusions, and building new non competing tools is a very common exclusion.
The big issue with the OpenAI case is that they didn't pay for the books. Scanning them and using them for training is very likely to be protected. Similar case with the old Nintendo bootloader.
The "Corpo Fascists" are buoyed by your support for the IP laws that have thus far supported them. If anything, to be less "Corpo Fascist" we would want more people to have more access to more data. Mankind collectively owns the creative output of Humanity, and should be able to use it to make derivative works.
You know a position is indefensible when you equivocation fallacy this hard.
> The "Corpo Fascists" are buoyed by your support for the IP laws
You know a position is indefensible when you strawman this hard.
> If anything, to be less "Corpo Fascist" we would want more people to have more access to more data. Mankind collectively owns the creative output of Humanity, and should be able to use it to make derivative works.
Sounds about right to me, but why you would state that when defending slop slingers is enough to give me whiplash.
> Scanning them and using them for training is very much likely to be protected.
Where can I find these totally legal, free, and open datasets all of these slop slingers are trained on?
Isn't this a little simplistic?
If the value of something lies in its scarcity, then making it widely available has robbed the owner of a scarcity value which cannot be retrieved.
A win for consumers, perhaps, but a loss for the owner nonetheless.
“How can I control whether my data is used for model training?
If you are logged into Copilot with a Microsoft Account or other third-party authentication, you can control whether your conversations are used for training the generative AI models used in Copilot. Opting out will exclude your past, present, and future conversations from being used for training these AI models, unless you choose to opt back in. If you opt out, that change will be reflected throughout our systems within 30 days.” https://support.microsoft.com/en-us/topic/privacy-faq-for-mi...
At this point, suggesting it has never happened and never will is wildly optimistic.
While this isn't used specifically for LLM training, it can involve aggregating insights from customer behaviour.
Merely using an LLM for inference does not train it on the prompts and data, as many incorrectly assume. There is a surprising lack of understanding of this separation even on technical forums like HN.
Many of the top AI services use human feedback to continuously apply "reinforcement learning" after the initial deployment of a pre-trained model.
https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
Inference (what happens when you use an LLM as a customer) is separate from training.
Inference and training are separate processes. Using an LLM doesn’t train it. That’s not what RLHF means.
The big companies - take Midjourney, or OpenAI, for example - take the feedback that is generated by users, and then apply it as part of the RLHF pass on the next model release, which happens every few months. That's why they have the terms in their TOS that allow them to do that.
Also, I wonder if the ToS covers "queries & interaction" vs "uploaded data" - I could imagine some tricky language in there that says we won't use your Word document, but we may at some point use the queries you run against it, not as raw corpus but as a second layer examining which tools/workflows to expand/exploit.
There’s a range of ways to lie by omission, here, and the major players have established a reputation for being willing to take an expansive view of their legal rights.
if they can get away with it (say by claiming it's "fair use"), they'll ignore corporate ones too
despite all 3 branches of the government disagreeing with them over and over again
There may very well be clever techniques that don't require directly training on the users' data. Perhaps generating a parallel paraphrased corpus as they serve user queries - one which they CAN train on legally.
The amount of value unlocked by stealing practically ~everyone's lunch makes me not want to put that past anyone who's capable of implementing such a technology.
Many businesses simply couldn't afford to operate without such an edge.
There are claims all through this thread that “AI companies” are probably doing bad things with enterprise customer data but nobody has provided a single source for the claim.
This has been a theme on HN. There was a thread a few weeks back where someone confidently claimed up and down the thread that Gemini’s terms of service allowed them to train on your company’s customer data, even though 30 seconds of searching leads to the exact docs that say otherwise. There is a lot of hearsay being spread as fact, but nobody actually linking to ToS or citing sections they’re talking about.
The summary is that for agents to work well they need clear vision into all things, and putting the data behind a GUI or a poorly maintained CLI is a hindrance. Combined with how structured CRUD apps are, and how agents can reliably write good CRUD apps, there's no reason not to have your own. Wins all around: not paying for it, having a better understanding of the processes, and letting agents handle the workflows.
It's not the Hacker News I knew even 3 years ago anymore, and I'm seriously close to just ditching the site after 15+ years of use.
I use AI heavily, but every day there are crazily optimistic, almost manic posts about how AI is going to take over various sectors, claims that are completely ludicrous - and they are all filled with comments from bizarrely optimistic people who seemingly have no knowledge of how software is actually run or built, i.e. that it's the human organisational, research, and management elements that are the hard parts, something AI can't do in any shape or form at the moment for any complex or even small company.
- anything that requires very high uptime
- very high volume systems and data lakes
- software with significant network effects
- companies that have proprietary datasets
- anything where regulation and compliance are still very important
Then this project lets you generate static sites from Svelte components (matching protobuf structures), markdown (documentation), and global template variables: https://github.com/accretional/statue
A lot of the SaaS ecosystem actually has rather simple domain logic and oftentimes doesn't even model data very well, or at least not in a way that matches their clients/users mental models or application logic. A lot of the value is in integrations, or the data/scaling, or the marketing and developer experience, or some kind of expertise in actually properly providing a simple interface to a complex solution.
So why not just create a compact universal representation of that? Because it's not so big a leap to go beyond eating SaaS to eating integrations, migration costs/bad moats, and the marketing/documentation/wrapper.
Spreadsheets! They are everywhere. In fact, they are so abundant these days that many are spawned for a quick job and immediately discarded. The cost of having these spreadsheets is practically zero, so in many cases one may find themselves with hundreds if not thousands of them sitting around with no indication they'll ever be deleted. Spreadsheets are also personal, and annoying, especially when forced upon you (since you did not make them yourself). Spreadsheets are also programming for non-programmers.
These new vibe-coded tools are essentially the new spreadsheets. They are useful... for 5 minutes. They are also easily forgotten. They are also personal (for the person who made them) and hated (by everyone else). I have no doubt in my mind that organisations will start using more and more of these new types of software to automate repetitive tasks, improve existing processes, and so on, but ultimately, apart from perhaps just a few, none will replace existing, purpose-built systems.
Ultimately you can make your own pretty dashboard that nobody else will see or use, because when the cost of production is so low, your users will want to create their own versions, thinking they could do better.
After all, how hard is it to prompt harder than the previous person?
Also, do you really think that SaaS companies are not deploying AI themselves? It is practically an arms race: a non-expert plus some AI vs 10 specialist developers plus their AIs, doing this all day long.
Who is going to have the upper-hand?
At the same time, back to the core theme of the article: do any of us think a small sassy SaaS like Bingo Card Creator could take off now? :-)
https://training.kalzumeus.com/newsletters/archive/selling_s...
The problem is, nobody knows how much and how fast AI will improve or how much it will cost if it does.
That uncertainty alone is very problematic and I think is being underestimated in terms of its impact on everything it can potentially touch.
For now, though, I've seen a wall form in benchmarks like SWE-rebench and SWE-bench Pro. Greenfield is expanding, but maintenance is still a problem.
I think AI needs to get much better at maintenance before serious companies can choose build over buy for anything but the most trivial apps.
The only named product was Retool.
It took me no more than 2 hours to put those together. We didn't renew our TeamRetro.
I’m pretty certain AI at least quadruples my output, and it makes fixing, improving, and upgrading poor-quality inherited software much easier than in the past. Why pay for SaaS when you can build something “good enough” in a week or two? You also get exactly what you want, rather than some £300k-per-year CRM that will double or treble in price and never quite be what you wanted.
Sooner or later the CTO will be dictating which projects can be vibe-coded and which ones make sense to buy.
SaaS benefits from network effects - your internal tools don't. So overall SaaS is cheaper.
The reality is that software license costs are a tiny fraction of total business costs; most of it is salaries. The situation you are describing is the kind of death spiral many companies will get into, and it will be their downfall, not their salvation.
About a decade ago we worked with a partner company who was building their own in-house software for everything. They used it as one of their selling points and as a differentiator over competitors.
They could move fast and add little features quickly. It seemed cool at first.
The problems showed up later. Everything was a little bit fragile in subtle ways. New projects always worked well on the happy path, but then they’d change one thing and it would trigger a cascade of little unintended consequences that broke something else. No problem, they’d just have their in-house team work on it and push out a new deploy. That also seemed cool at first, until they accumulated a backlog of hard to diagnose issues. Then we were spending a lot of time trying to write up bug reports to describe the problem in enough detail for them to replicate, along with constant battles over tickets being closed with “works in the dev environment” or “cannot reproduce”.
> You also get exactly what you want rather than some £300k per year CRM
What’s the fully loaded cost (including taxes and benefits) of hiring enough extra developers and ops people to run and maintain the in-house software, complete with someone to manage the project and enough people to handle ops coverage with room for rotations and holidays off? It turns out the cost of running in-house software at scale is always a lot higher than £300k, unless the company can tolerate low ops coverage and gaps when people go on vacation.
SaaS maintenance isn't about upgrading packages, it's about accountability and a point of contact when something breaks along with SLAs and contractual obligations. It isn't because building a kanban board app is hard. Someone else deals with provisioning, alerts, compliance, etc. and they are a real human who cannot hallucinate that the issue has been fixed when it hasn't. Depending on the contract and how it is breached, you can potentially take them to court and sue them to recover money lost as a result of their malpractice. None of that applies to a neural network that misreads the alert, does something completely wrong, then concludes the issue is fixed the way the latest models constantly do when I use them.
"With AI, that equation is now changing. I anticipate that within 5 years autonomous coding agents will be able to rapidly and cheaply clone almost any existing software, while also providing hosting, operations, and support, all for a small fraction of the cost.
This will inevitably destroy many existing businesses. In order to survive, businesses will require strong network effects (e.g. marketplaces) or extremely deep data/compute moats. There will also be many new opportunities created by the very low cost of software. What could you build if it were possible to create software 1000x faster and cheaper?"
- Paul Buchheit
AI-generated code still requires software engineers to build, test, debug, deploy, secure, monitor, be on-call, handle incidents, and so on. That's very expensive. It is much cheaper to pay a small monthly fee to a SaaS company.
Yeah it's a fundamental misunderstanding of economies of scale. If you build an in-house app that does X, you incur 100% of the maintenance costs. If you're subscribed to a SaaS product, you're paying for 1/N % of the maintenance costs, where N is the number of customers.
I only see AI-generated code replacing things that never made sense as a SaaS anyway. It's telling that the author's only concrete example of a replaced SaaS product is Retool, which is much less about SaaS and much more about a product that's been fundamentally deprecated.
Wake me up when we see swaths of companies AI-coding internal Jira ("just an issue tracker") and Github Enterprise ("just a browser-based wrapper over git") clones.
Oh, child... building is easy. Coordinating maintenance of the tool across a non-technical team is hell.
Corporations think in terms of risk.
Second only to providing a useful function, a successful SaaS app will have been built to mitigate risk well.
It's not going to be easy to meet these requirements without prior knowledge and experience.