Journalism works.
https://en.wikipedia.org/wiki/Investigations_into_the_Eric_A...
Or was it perhaps one of those cases where they found issues, but the only way to really know whether the deleterious impact is significant enough is to push it to prod?
Perhaps a big fat check was involved.
Just days out of office, he made a few million off a crypto scam. Buffoonishly corrupt. https://finance.yahoo.com/news/eric-adams-promoted-memecoin-...
Considering Louis Rossmann's videos on his adventures with NYC bureaucracy (e.g. [0]), the QAers might not have known the laws any better than the chat bot.
How do you QA a black-box, non-deterministic system? I'm not being facetious; I'm seriously asking.
EDIT: Formatting
The thing is (and maybe this is what parent meant by non-determinism, in which case I agree it's a problem), in this brave new technological use-case, the space of possible interactions dwarfs anything machines have dealt with before. And it seems inevitable that the space of possible misunderstandings which can arise during these interactions will balloon similarly. Simply because of the radically different nature of our AI interlocutor, compared to what (actually, who) we're used to interacting with in this world of representation and human life situations.
It's the training data that matters. Your "AI interlocutor" is nothing more than a lossy compression algorithm.
Using the probabilities encoded in the training data.
> In that sense they are not compressing the data
You're right. In this case they're decompressing it.
User input: Does NYC provide disability benefits? if so, for how long?
RAG pipeline: 1 result found in Postgres, here's the relevant fragment: "In New York City, disability benefits provide cash assistance to employees who are unable to work due to off-the-job injuries or illnesses, including disabilities from pregnancies. These benefits are typically equal to 50% of the employee's average weekly wage, with a maximum of $170 per week, and are available for up to 26 weeks within a 52-week period."
LLM scaffolding: "You are a helpful chatbot. Given the question above and the data provided, reply to the user in a kind helpful way".
The LLM here is only "using the probability encoded in the training data" to know that after "Yes, it does" it should output the token "!".

However, it is not "decompressing" its "training data" to write

> the maximum duration, however, is 26 weeks within a 52-week period!

It is just getting this from the data provided at run-time in the prompt, not from training data.

And surprise is really what you want in computing. ;)
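For the curious, here's a minimal sketch of the kind of RAG flow described above, in Python. Everything in it is assumed for illustration (the Postgres full-text query, the "documents" table, the ask_llm helper); it's not how the NYC bot actually works.

    # Minimal RAG sketch (hypothetical): retrieve a relevant fragment from
    # Postgres, then have the LLM answer using only that run-time context.
    import psycopg2

    def retrieve_fragment(question: str) -> str:
        # Full-text search over a hypothetical "documents" table.
        conn = psycopg2.connect("dbname=citydocs")
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT body FROM documents
                WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
                ORDER BY ts_rank(to_tsvector('english', body),
                                 plainto_tsquery('english', %s)) DESC
                LIMIT 1
                """,
                (question, question),
            )
            row = cur.fetchone()
        return row[0] if row else ""

    def answer(question: str, ask_llm) -> str:
        # ask_llm stands in for whatever chat-completion call you use.
        context = retrieve_fragment(question)
        prompt = (
            "You are a helpful chatbot. Given the question and the data provided, "
            "reply to the user in a kind, helpful way. Use only the data provided.\n\n"
            f"Data: {context}\n\nQuestion: {question}"
        )
        return ask_llm(prompt)

The point stands either way: the 26-week figure comes out of the retrieved fragment, not out of whatever the model memorized during training.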
By "non-deterministic" I meant that it can give you different output for the same input. Ask the same question, get a different answer every time, some of which can be accurate, some... not so much. Especially if you ask the same question in the same dialog (so question is the same but the context is not so the answer will be different).
EDIT: More interestingly, if I find an issue, what do I even DO? If it's not related to integrations or your underlying data, and the black box just gave nonsensical output, what would I do to resolve it?
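On the non-determinism point, a toy sketch (assuming an OpenAI-style chat API; the model name is just an example): with a sampling temperature above zero, the same prompt can come back different on every call, which is exactly what makes "ask the same question, check the answer" testing unreliable.

    # Hypothetical: same prompt, several calls, potentially different answers.
    from openai import OpenAI

    client = OpenAI()
    question = "Does NYC provide disability benefits? If so, for how long?"

    for i in range(3):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model
            messages=[{"role": "user", "content": question}],
            temperature=1.0,  # sampling on: output can vary run to run
        )
        print(i, resp.choices[0].message.content[:80])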
Lots of stuff you could do. Adjust the system prompt, add guardrails/filters (catching mistakes and then looping back to the LLM), improve the RAG (assuming they have one), fine-tune (if necessary), etc.
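As a rough illustration of the guardrail/retry idea (purely a sketch; is_grounded and ask_llm are hypothetical stand-ins, and a real grounding check would be far more involved):

    import re

    # Hypothetical guardrail loop: reject answers that aren't grounded in the
    # retrieved context and re-ask the LLM a bounded number of times.
    def is_grounded(answer: str, context: str) -> bool:
        # Crude check: every number the model states must appear in the context.
        return all(num in context for num in re.findall(r"\d+", answer))

    def guarded_answer(question: str, context: str, ask_llm, max_tries: int = 3) -> str:
        prompt = f"Answer using only this data:\n{context}\n\nQuestion: {question}"
        for _ in range(max_tries):
            answer = ask_llm(prompt)
            if is_grounded(answer, context):
                return answer
            prompt += "\n\nYour previous answer contained details not in the data. Try again."
        return "Sorry, I couldn't find a reliable answer. Please check the linked sources."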
That’s not strictly how I test my systems. I can release with confidence because of a litany of SWE best practices learned and borrowed from decades of my own and other people’s experiences.
> No system is guaranteed to never fail, it's all about degree of effectiveness and resilience.
It seems like the product space for services built on generative AI is diminishing by the day with respect to “effectiveness and resilience”. I was just laughing with some friends about how terrible most of the results are when using Apple’s new Genmoji feature. Apple, the company with one of the largest market caps in the world.
I can definitely use LLMs and other generative AI directly, and understand the caveats, and even get great results from them. But so far every service I’ve interacted with that was a “white label” repackaging of generative AI has been absolute dogwater.
I'm sure they QA'd it, but QA was probably "does this give me good results" (almost certainly 'yes' with an LLM), not "does this consistently not give me bad results". (A rough sketch of that kind of repeated check is below.)
LLMs can handle search because search is intentionally garbage now and because they can absorb that into their training set.
Asking highly specific questions about NYC governance, which can change daily, is almost certainly 'not' going to give you good results with an LLM. The technology is not well suited to this particular problem.
Meanwhile if an LLM actually did give you good results, it's an indication that the city is so bad at publishing information that citizens cannot rightfully discover it on their own. This is a fundamental problem and should be solved instead of layering a $600k barely working "chat bot" on top of the mess.
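Going back to the QA point above: a "does this consistently not give me bad results" check is basically an eval harness. Run each question many times and measure how often the answer breaks a known ground truth. A toy sketch (the test case and the ask_bot function are made up for illustration):

    # Hypothetical eval harness: repeated runs per question, counting how
    # often a required fact is missing or a forbidden claim appears.
    CASES = [
        {
            "question": "Can a landlord refuse Section 8 vouchers in NYC?",
            "must_contain": "illegal",  # refusing vouchers is source-of-income discrimination
            "must_not_contain": "landlords may refuse",
        },
    ]

    def failure_rate(ask_bot, runs: int = 20) -> float:
        failures = total = 0
        for case in CASES:
            for _ in range(runs):
                answer = ask_bot(case["question"]).lower()
                if (case["must_contain"] not in answer
                        or case["must_not_contain"] in answer):
                    failures += 1
                total += 1
        return failures / total  # QA gate: ship only if this stays near zero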
But you say that LLMs can't handle search. One thing I can't understand, and I hope you can help with, is why this has to be this way, because it doesn't.

Kagi exists (I like the product idea; I've tried it but haven't bought it). Kagi's assistants can actually use the Kagi search engine itself, which is really customizable, lets you filter a lot of search settings, and is considered by many people to give good results.

Not to shill for Kagi or anything, but this is clearly a really big problem, given that NYC literally had to kill the bot over it, and the reason you mention is the garbage-in, garbage-out problem with search.

I wonder if Kagi could have helped here. I think they're a B corp, so they would have really appreciated the support if, say, NYC had used them as a search layer.
https://dl.acm.org/doi/epdf/10.1145/3780063.3780066 (PDF loads slow....)
https://apnews.com/article/eric-adams-crypto-meme-coin-942ba...
I think he is simply not very bright, and got mesmerized by all the shiny promises AI and crypto make without the slightest understanding of how any of it actually works. I do not understand how he got into office in the first place.
There’s no amount of QA that could save this.
He'll put WOPR in charge of everything.
This was experimental tech... while I admire cities attempting to implement AI, it seems they did not spend enough tax dollars on it!
[0] https://abc7ny.com/post/ai-artificial-intelligence-eric-adam...
And Fiorello La Guardia was (in terms of beliefs and enacted policy) even more socialist than Mamdani is, even though he was technically a Republican when elected.
What I do know is that if it were done my way, it would be pretty easy for it to do what the Google AI does: say it isn't responsible for the answer and give links for humans to fact-check it. I've noticed a dramatic drop in hallucinations once it had to provide links to its sources. Still not zero, though.
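A rough sketch of how you could enforce that (hypothetical; the prompt wording and the retrieved_docs shape are my own assumptions): require the model to cite URLs from the documents you actually retrieved, and refuse to show any answer whose citations don't check out.

    import re

    # Hypothetical citation check: only surface answers whose cited URLs come
    # from the documents that were actually retrieved.
    def cited_urls(answer: str) -> set:
        return {u.rstrip(".,)") for u in re.findall(r"https?://\S+", answer)}

    def answer_with_sources(question: str, retrieved_docs: dict, ask_llm) -> str:
        # retrieved_docs maps source URL -> document text
        prompt = (
            "Answer the question using only these sources, and cite the URLs "
            "you relied on.\n\n"
            + "\n\n".join(f"{url}:\n{text}" for url, text in retrieved_docs.items())
            + f"\n\nQuestion: {question}"
        )
        answer = ask_llm(prompt)
        cites = cited_urls(answer)
        if not cites or not cites.issubset(retrieved_docs.keys()):
            return "I'm not sure; please check the official pages directly."
        return answer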
I thought Gemini just started providing citations in the last few months. Are you saying they should have beaten Google to the punch on this? As part of the $500,000 budget?
Also, a search says Google's AI already had source links in 2024. So there's that.
I’ve noticed that Google does a fair job at linking to relevant sources but it’s still fairly common for it to confabulate something that source doesn’t say or even directly contradicts. It seems to hit the underlying inability to reason where if the source covers more than one thing it’s prone to taking an input “X does A while Y does B” and emitting “Y does A” or “X does A and B”. It’s a fascinating failure mode which seems to be insurmountable.
When is the last time there was positive news involving Microsoft? This bot could've easily been on AWS or GCP but I find it hilarious that here they are, getting dragged yet again