Journalism works.
https://en.wikipedia.org/wiki/Investigations_into_the_Eric_A...
Or was it perhaps one of those cases where they found issues, but the only way to really know whether the deleterious impact is significant enough is to push it to prod?
Perhaps a big fat check was involved.
Just days out of office, he made a few million off a crypto scam. Buffoonishly corrupt. https://finance.yahoo.com/news/eric-adams-promoted-memecoin-...
Considering Louis Rossmann's videos on his adventures with NYC bureaucracy (e.g. [0]), the QAers might not have known the laws any better than the chat bot.
How do you QA a black-box, non-deterministic system? I'm not being facetious; I'm seriously asking.
EDIT: Formatting
The thing is (and maybe this is what parent meant by non-determinism, in which case I agree it's a problem), in this brave new technological use-case, the space of possible interactions dwarfs anything machines have dealt with before. And it seems inevitable that the space of possible misunderstandings which can arise during these interactions will balloon similarly. Simply because of the radically different nature of our AI interlocutor, compared to what (actually, who) we're used to interacting with in this world of representation and human life situations.
It's the training data that matters. Your "AI interlocutor" is nothing more than a lossy compression algorithm.
Using the probabilities encoded in the training data.
> In that sense they are not compressing the data
You're right. In this case they're decompressing it.
User input: Does NYC provide disability benefits? if so, for how long?
RAG pipeline: 1 result found in Postgres, here's the relevant fragment: "In New York City, disability benefits provide cash assistance to employees who are unable to work due to off-the-job injuries or illnesses, including disabilities from pregnancies. These benefits are typically equal to 50% of the employee's average weekly wage, with a maximum of $170 per week, and are available for up to 26 weeks within a 52-week period."
LLM scaffolding: "You are a helpful chatbot. Given the question above and the data provided, reply to the user in a kind helpful way".
The LLM here is only "using the probability encoded in the training data" to know that after "Yes, it does" it should output the token "!".

However, it is not "decompressing" its "training data" to write

> the maximum duration, however, is 26 weeks within a 52-week period!

It is just getting this from the data provided at run-time in the prompt, not from training data.

And surprise is really what you want in computing. ;)
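For the curious, here's a minimal sketch of the kind of RAG flow described above, in Python. Everything in it is assumed for illustration (the Postgres full-text query, the "documents" table, the ask_llm helper); it's not how the NYC bot actually works.

    # Minimal RAG sketch (hypothetical): retrieve a relevant fragment from
    # Postgres, then have the LLM answer using only that run-time context.
    import psycopg2

    def retrieve_fragment(question: str) -> str:
        # Full-text search over a hypothetical "documents" table.
        conn = psycopg2.connect("dbname=citydocs")
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT body FROM documents
                WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
                ORDER BY ts_rank(to_tsvector('english', body),
                                 plainto_tsquery('english', %s)) DESC
                LIMIT 1
                """,
                (question, question),
            )
            row = cur.fetchone()
        return row[0] if row else ""

    def answer(question: str, ask_llm) -> str:
        # ask_llm stands in for whatever chat-completion call you use.
        context = retrieve_fragment(question)
        prompt = (
            "You are a helpful chatbot. Given the question and the data provided, "
            "reply to the user in a kind, helpful way. Use only the data provided.\n\n"
            f"Data: {context}\n\nQuestion: {question}"
        )
        return ask_llm(prompt)

The point stands either way: the 26-week figure comes out of the retrieved fragment, not out of whatever the model memorized during training.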
By "non-deterministic" I meant that it can give you different output for the same input. Ask the same question, get a different answer every time, some of which can be accurate, some... not so much. Especially if you ask the same question in the same dialog (so question is the same but the context is not so the answer will be different).
EDIT: More interestingly, if I find an issue, what do I even DO? If it's not related to integrations or your underlying data, and the black box just gave nonsensical output, what would I do to resolve it?
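On the non-determinism point, a toy sketch (assuming an OpenAI-style chat API; the model name is just an example): with a sampling temperature above zero, the same prompt can come back different on every call, which is exactly what makes "ask the same question, check the answer" testing unreliable.

    # Hypothetical: same prompt, several calls, potentially different answers.
    from openai import OpenAI

    client = OpenAI()
    question = "Does NYC provide disability benefits? If so, for how long?"

    for i in range(3):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model
            messages=[{"role": "user", "content": question}],
            temperature=1.0,  # sampling on: output can vary run to run
        )
        print(i, resp.choices[0].message.content[:80])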
Lots of stuff you could do. Adjust the system prompt, add guardrails/filters (catching mistakes and then looping back to the LLM), improve the RAG (assuming they have one), fine-tune (if necessary), etc.
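As a rough illustration of the guardrail/retry idea (purely a sketch; is_grounded and ask_llm are hypothetical stand-ins, and a real grounding check would be far more involved):

    import re

    # Hypothetical guardrail loop: reject answers that aren't grounded in the
    # retrieved context and re-ask the LLM a bounded number of times.
    def is_grounded(answer: str, context: str) -> bool:
        # Crude check: every number the model states must appear in the context.
        return all(num in context for num in re.findall(r"\d+", answer))

    def guarded_answer(question: str, context: str, ask_llm, max_tries: int = 3) -> str:
        prompt = f"Answer using only this data:\n{context}\n\nQuestion: {question}"
        for _ in range(max_tries):
            answer = ask_llm(prompt)
            if is_grounded(answer, context):
                return answer
            prompt += "\n\nYour previous answer contained details not in the data. Try again."
        return "Sorry, I couldn't find a reliable answer. Please check the linked sources."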
That’s not strictly how I test my systems. I can release with confidence because of a litany of SWE best practices learned and borrowed from decades of my own and other people’s experiences.
> No system is guaranteed to never fail, it's all about degree of effectiveness and resilience.
It seems like the product space for services built on generative AI is diminishing by the day with respect to “effectiveness and resilience”. I was just laughing with some friends about how terrible most of the results are when using Apple’s new Genmoji feature. Apple, the company with one of the largest market caps in the world.
I can definitely use LLMs and other generative AI directly, and understand the caveats, and even get great results from them. But so far every service I’ve interacted with that was a “white label” repackaging of generative AI has been absolute dogwater.
I'm sure they QA'd it, but QA was probably "does this give me good results" (almost certainly 'yes' with an LLM), not "does this consistently not give me bad results". (A rough sketch of that kind of repeated check is below.)
LLMs can handle search because search is intentionally garbage now and because they can absorb that into their training set.
Asking highly specific questions about NYC governance, which can change daily, is almost certainly 'not' going to give you good results with an LLM. The technology is not well suited to this particular problem.
Meanwhile if an LLM actually did give you good results, it's an indication that the city is so bad at publishing information that citizens cannot rightfully discover it on their own. This is a fundamental problem and should be solved instead of layering a $600k barely working "chat bot" on top of the mess.
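Going back to the QA point above: a "does this consistently not give me bad results" check is basically an eval harness. Run each question many times and measure how often the answer breaks a known ground truth. A toy sketch (the test case and the ask_bot function are made up for illustration):

    # Hypothetical eval harness: repeated runs per question, counting how
    # often a required fact is missing or a forbidden claim appears.
    CASES = [
        {
            "question": "Can a landlord refuse Section 8 vouchers in NYC?",
            "must_contain": "illegal",  # refusing vouchers is source-of-income discrimination
            "must_not_contain": "landlords may refuse",
        },
    ]

    def failure_rate(ask_bot, runs: int = 20) -> float:
        failures = total = 0
        for case in CASES:
            for _ in range(runs):
                answer = ask_bot(case["question"]).lower()
                if (case["must_contain"] not in answer
                        or case["must_not_contain"] in answer):
                    failures += 1
                total += 1
        return failures / total  # QA gate: ship only if this stays near zero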
But you say that LLMs can't handle search. One thing I can't understand, and I hope you can help with, is why this has to be this way, because it doesn't.

Kagi exists (I like the product idea; I've tried it but haven't bought it). Kagi's assistants can actually use the Kagi search engine itself, which is really customizable, lets you filter a lot of search settings, and is considered by many people to give good results.

Not to shill for Kagi or anything, but this is clearly a really big problem, given that NYC literally had to kill the bot over it, and the reason you mention is the garbage-in, garbage-out problem with search.

I wonder if Kagi could have helped here. I think they're a B corp, so they would have really appreciated the support if, say, NYC had used them as a search layer.
https://dl.acm.org/doi/epdf/10.1145/3780063.3780066 (PDF loads slow....)
https://apnews.com/article/eric-adams-crypto-meme-coin-942ba...
I think he is simply not very bright, and got mesmerized by all the shiny promises AI and crypto make without the slightest understanding of how any of it actually works. I do not understand how he got into office in the first place.
There’s no amount of QA that could save this.
He'll put WOPR in charge of everything.
This was experimental tech... while I admire cities attempting to implement AI, it seems they did not spend enough tax dollars on it!
[0] https://abc7ny.com/post/ai-artificial-intelligence-eric-adam...
And Fiorello La Guardia was (in terms of beliefs and enacted policy) even more socialist than Mamdani is, even though he was technically a Republican when elected.
What I do know is that if it were done my way, it would be pretty easy for it to do what the Google AI does: say it isn't responsible for the answer and give links for humans to fact-check it. I've noticed a dramatic drop in hallucinations once it had to provide links to its sources. Still not zero, though.
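A rough sketch of how you could enforce that (hypothetical; the prompt wording and the retrieved_docs shape are my own assumptions): require the model to cite URLs from the documents you actually retrieved, and refuse to show any answer whose citations don't check out.

    import re

    # Hypothetical citation check: only surface answers whose cited URLs come
    # from the documents that were actually retrieved.
    def cited_urls(answer: str) -> set:
        return {u.rstrip(".,)") for u in re.findall(r"https?://\S+", answer)}

    def answer_with_sources(question: str, retrieved_docs: dict, ask_llm) -> str:
        # retrieved_docs maps source URL -> document text
        prompt = (
            "Answer the question using only these sources, and cite the URLs "
            "you relied on.\n\n"
            + "\n\n".join(f"{url}:\n{text}" for url, text in retrieved_docs.items())
            + f"\n\nQuestion: {question}"
        )
        answer = ask_llm(prompt)
        cites = cited_urls(answer)
        if not cites or not cites.issubset(retrieved_docs.keys()):
            return "I'm not sure; please check the official pages directly."
        return answer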
I thought Gemini just started providing citations in the last few months. Are you saying they should have beaten Google to the punch on this? As part of the $500,000 budget?
Also, a search says Google's AI already had source links in 2024. So there's that.
I’ve noticed that Google does a fair job at linking to relevant sources but it’s still fairly common for it to confabulate something that source doesn’t say or even directly contradicts. It seems to hit the underlying inability to reason where if the source covers more than one thing it’s prone to taking an input “X does A while Y does B” and emitting “Y does A” or “X does A and B”. It’s a fascinating failure mode which seems to be insurmountable.
When is the last time there was positive news involving Microsoft? This bot could've easily been on AWS or GCP but I find it hilarious that here they are, getting dragged yet again