frontpage.

Sid Meier's System for Real-Time Music Composition and Synthesis

https://patents.google.com/patent/US5496962A/en
1•GaryBluto•1m ago•1 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
1•keepamovin•2m ago•0 comments

Show HN: Empusa – Visual debugger to catch and resume AI agent retry loops

https://github.com/justin55afdfdsf5ds45f4ds5f45ds4/EmpusaAI
1•justinlord•4m ago•0 comments

Show HN: Bitcoin wallet on NXP SE050 secure element, Tor-only open source

https://github.com/0xdeadbeefnetwork/sigil-web
2•sickthecat•6m ago•0 comments

White House Explores Opening Antitrust Probe on Homebuilders

https://www.bloomberg.com/news/articles/2026-02-06/white-house-explores-opening-antitrust-probe-i...
1•petethomas•7m ago•0 comments

Show HN: MindDraft – AI task app with smart actions and auto expense tracking

https://minddraft.ai
2•imthepk•12m ago•0 comments

How do you estimate AI app development costs accurately?

1•insights123•13m ago•0 comments

Going Through Snowden Documents, Part 5

https://libroot.org/posts/going-through-snowden-documents-part-5/
1•goto1•13m ago•0 comments

Show HN: MCP Server for TradeStation

https://github.com/theelderwand/tradestation-mcp
1•theelderwand•16m ago•0 comments

Canada unveils auto industry plan in latest pivot away from US

https://www.bbc.com/news/articles/cvgd2j80klmo
2•breve•17m ago•1 comments

The essential Reinhold Niebuhr: selected essays and addresses

https://archive.org/details/essentialreinhol0000nieb
1•baxtr•20m ago•0 comments

Rentahuman.ai Turns Humans into On-Demand Labor for AI Agents

https://www.forbes.com/sites/ronschmelzer/2026/02/05/when-ai-agents-start-hiring-humans-rentahuma...
1•tempodox•21m ago•0 comments

StovexGlobal – Compliance Gaps to Note

1•ReviewShield•24m ago•1 comments

Show HN: Afelyon – Turns Jira tickets into production-ready PRs (multi-repo)

https://afelyon.com/
1•AbduNebu•25m ago•0 comments

Trump says America should move on from Epstein – it may not be that easy

https://www.bbc.com/news/articles/cy4gj71z0m0o
5•tempodox•26m ago•2 comments

Tiny Clippy – A native Office Assistant built in Rust and egui

https://github.com/salva-imm/tiny-clippy
1•salvadorda656•30m ago•0 comments

LegalArgumentException: From Courtrooms to Clojure – Sen [video]

https://www.youtube.com/watch?v=cmMQbsOTX-o
1•adityaathalye•33m ago•0 comments

US moves to deport 5-year-old detained in Minnesota

https://www.reuters.com/legal/government/us-moves-deport-5-year-old-detained-minnesota-2026-02-06/
6•petethomas•36m ago•2 comments

If you lose your passport in Austria, head for McDonald's Golden Arches

https://www.cbsnews.com/news/us-embassy-mcdonalds-restaurants-austria-hotline-americans-consular-...
1•thunderbong•41m ago•0 comments

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

https://github.com/chenyanchen/mermaid-formatter
1•astm•57m ago•0 comments

RFCs vs. READMEs: The Evolution of Protocols

https://h3manth.com/scribe/rfcs-vs-readmes/
3•init0•1h ago•1 comments

Kanchipuram Saris and Thinking Machines

https://altermag.com/articles/kanchipuram-saris-and-thinking-machines
1•trojanalert•1h ago•0 comments

Chinese chemical supplier causes global baby formula recall

https://www.reuters.com/business/healthcare-pharmaceuticals/nestle-widens-french-infant-formula-r...
2•fkdk•1h ago•0 comments

I've used AI to write 100% of my code for a year as an engineer

https://old.reddit.com/r/ClaudeCode/comments/1qxvobt/ive_used_ai_to_write_100_of_my_code_for_1_ye...
2•ukuina•1h ago•1 comments

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•1h ago•1 comments

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•1h ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
3•endorphine•1h ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•1h ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•1h ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
2•computer23•1h ago•0 comments

AI is going great for the blind (2023)

https://robertkingett.com/posts/6230/
99•ljlolel•5mo ago

Comments

paulsutter•5mo ago
> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

Great to read that blind folks get so much benefit from LLMs. But this one quote seemed odd. The most amazing OCR and document attribution products are becoming available due to LLMs.

rafram•5mo ago
LLM/VLM-based OCR is highly prone to hallucination - the model does not know when it can’t read a text, it can’t estimate its own confidence, and it deals with fuzzy/unclear texts by simply making things up. I would be very nervous using it for anything critical.
paulsutter•5mo ago
There are really amazing products coming
rafram•5mo ago
I’ll believe it when I see it.
jibal•5mo ago
The article is nearly 2 years old ... people don't have perfect foresight.
ljlolel•5mo ago
This was 2023 so I can only assume it’s gotten even better!
simonw•5mo ago
The headline is clearly meant to be sarcastic but the actual body of the text seems to indicate that AI back in 2023 was going pretty great for the blind - it mostly reports on others who are enthusiastic adopters of it, despite the author's own misgivings.
Wowfunhappy•5mo ago
I did not interpret the headline as sarcastic.
simonw•5mo ago
The actual headline is:

  AI is going great for the blind.
That . (not present in the Hacker News posting) made me think it was sarcastic, combined with the author's clear dislike of generative AI.
ljlolel•5mo ago
I posted it with the period
lxgr•5mo ago
It also pattern matches to "Web3 is going just great", a popular crypto-skeptic blog – not sure if that's intentional.

There seems to be a sizable crowd of cryptocurrency hype critics that have pivoted to criticizing the AI hype (claiming that the hype itself is also largely caused by the same actors, and that accordingly neither crypto nor AI have much object-level merit to them) – ironically and sadly in a quite group-think-heavy way, considering how many valid points of criticism there are to be made of both.

PhantomHour•5mo ago
There's a big difference in precisely how the technology is applied.

Transformer models making screen readers better is cool. Companies deciding to fire their human voice actors and replacing all audiobooks with slop is decidedly not cool.

You can really see this happening in translation right now. Companies left and right are firing human translators and replacing their work with slop, and it's a huge step down because AI simply cannot match the previous level of quality. (Mr Chad Gippity isn't going to maintain puns or add notes for references that the new audience won't catch.)

And that's in a market where there is commercial pressure to have quality work. Sloppy AI translations are already hurting sales.

In accessibility, it's a legal checkbox. Companies broadly do not care. It's already nearly impossible to get people to do things like use proper aria metadata. "We're a startup, we gotta go fast, ain't got no time for that".

AI is already being used to provide a legally-sufficient but practically garbage level of accessibility. This is bad.

cpfohl•5mo ago
Boy, my experience with small chunks of translation between languages I know well is not the same at all. When prompted properly, the translation quality is unbelievable; it can absolutely catch nuances and puns, and add footnotes.

That said, I use it with pretty direct prompting, and I strongly prefer the "AI Partners with a Human" model.

PhantomHour•5mo ago
I'm sorry, but I simply do not believe that it handles things like puns and passages requiring footnotes; my experience is that LLMs are miserable at this even when directly "instructed" to.

But as far as my previous comment is concerned: it doesn't really matter what the "state of the art" AI is, because companies simply do not use it. They just pipe everything through the easiest and cheapest models, with human review (which does not actually get the time to do meaningful review) optional.

conradev•5mo ago
Firing voice actors is not great. Replacing human-narrated audio with AI narrated audio is not great.

But the coverage of audiobooks is… also not great? Of the books I've purchased recently, maybe 30% or less have audiobooks? What if I want to listen to an obscure book? Should I be paying a human narrator to narrate my personal library?

The copyright holders are incentivized to make money. It does not make financial sense to pay humans to narrate their entire catalog. As long as they're the only ones allowed to distribute derivative works, we're kind of in a pickle.

PhantomHour•5mo ago
> What if I want to listen to an obscure book? Should I be paying a human narrator to narrate my personal library?

You weren't doing that before AI either, were you?

The practical answer has already been "you pipe an ebook through a narrator/speech synthesizer program".

> The copyright holders are incentivized to make money.

Regulations exist. It'd be rather trivial to pass a law mandating that every ebook sold be usable with screen readers. There are already laws for websites, albeit a bit poorly enforced.

gostsamo•5mo ago
That's one very angry take on things. As a blind person myself, I find AI a net benefit, and it has potential, but I also agree that there are lots of people who think that if AI solutions are good enough, there is no need to invest in actually accessible GUIs. That last one is an extremely wrong take, because AI will always be the slowest solution, and one that might be badly prompted or just hallucinate. Just today someone was complaining that Gemini's code editor is not fully accessible and was looking for advice, so I'd give the author points for mentioning that the AI interface itself might be inaccessible. Not to mention that chat web interfaces often lack proper aria descriptions for some basic operations.
simne•5mo ago
Could you estimate the number of blind people using AI?

If their number is significant, they could themselves be the foundation for an AI business, even if all other consumers turn away from AI.

As for web accessibility, I must say it is terrible even for sighted people, but AI could change that for the better too.

What I mean is: you may have heard of PalmPilot organizers. They were very limited in hardware, but there was a private company that provided a proxy browser, which took ordinary web sites and rendered a version optimized for the PalmPilot's small display, plus an offline reading mode. With today's AI, one could do much better.

999900000999•5mo ago
Even before the LLMs, simple voice assistants have been great for those with limited sight.

I recall speaking to a girl who thanked these voice assistants for helping her order food and cook.

Right now I'm using AI while traveling; it gets things about 85% right, which is enough for lunch.

ccgreg•5mo ago
The IETF AI-Preference standard group is currently discussing whether or not to include an example of bypassing AI preferences to support assistive technologies. Oddly enough, many publishers oppose that.
1gn15•5mo ago
What does bypassing AI preferences mean? Just ignoring them?
ianbicking•5mo ago
Probably ignoring things like robots.txt, I'm guessing? But I'd be curious what exactly the list of things is, and if it's growing. Would it go as far as ChatGPT filling in CAPTCHAs?

autocomplete="off" is an instance of something that user agents willfully ignore based on their own heuristics, and I'm assuming accessibility tools have always ignored a lot of similar things.

ccgreg•5mo ago
A lot of publishers do not care about blind people, and would prefer that they be unable to use AI to read.
rgoulter•5mo ago
> With an LLM, it will never get annoyed, aggravated, think less of the person, or similar.

Between people, it's very commonly considered impolite to ask for too much help. So having an info-retrieval / interactive chat that will patiently answer questions is a boon for everyone.

I guess you can try and frame all 'helping' as "you're +1 if you're being helpful", but don't be surprised if not everyone sees things that way all the time.

adhamsalama•5mo ago
As long as you don't get rate-limited!
y-curious•5mo ago
I guarantee you that rate limits are a thing when you ask non-impaired people for help constantly, too. I'd be taking my chances with AI rate limits for things like "describe this character in detail" and "repeat the minute-long dialog you just delivered".
krisoft•5mo ago
> The blind and visually impaired people advocating for this have been conditioned to believe that technology will solve all accessibility problems because, simply put, humans won’t do it.

Technology is not just sprouting out of the ground on its own. It is humans who are making it. Therefore, if technology is helpful, it was humans who helped.

> Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

> I fully predict that blind people will be advocating to make actual LLM platforms accessible

Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

> I also predict web accessibility will actually get worse, not better, as coding models will spit out inaccessible code that developers won’t check or won’t even care to check.

Who knows. Either that, or some pages will become more accessible because the effort of making them accessible will be lower for the devs. It probably will be a mixed bag, with a little bit of column A and a little bit of column B.

> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

These are all AI. They are all improving leaps and bounds.

> An LLM will always be there, well, until the servers go down

Of course. That is a concern. This is why models you can run yourself are so important. Local models are good for latency and reliability. But even if the model runs on a remote server, as long as you control the server you can decide when it gets shut down.

NoahZuniga•5mo ago
Gemini 2.5 has the best vision understanding of any model I've worked with. Leagues beyond gpt5/o4
IanCal•5mo ago
It's hard to overstate this. They perform segmentation and masking and provide information from that to the model and it helps enormously.

Image understanding is still drastically worse than text performance, with glaring mistakes that are hard to understand, but the Gemini 2.5 models are far and away the best of what I've tried.

devinprater•5mo ago
There's a whole tool based on having Gemini 2.5 describe Youtube videos, OmniDescriber.

https://audioses.com/en/yazilimlar.php

pineaux•5mo ago
Yeah, I made a small app to sell my father's books. I scanned all the books by taking pictures of the bookshelves + books (a collection of 15k books, almost all non-fiction), then fed them to different AIs. Combining Mistral OCR and Gemini worked very well. I ran them all past both AIs and compared the output per book, then saved all the output into a SQL database for later reference. I did some other stuff with it, then made a document out of the output and sent it to a large group of book buyers, asking them to bid on individual books and on the whole collection.
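A rough sketch of that kind of two-model pipeline (not the commenter's actual code): run each shelf photo through two vision/OCR models, compare the titles they return per photo, and keep the disagreements for manual review. The ocrWithMistral and listBooksWithGemini helpers below are hypothetical placeholders for real API clients, and the output goes to a JSON file rather than a SQL database.

  import { readdir, readFile, writeFile } from "node:fs/promises";
  import path from "node:path";

  // Hypothetical stand-ins: replace these with real calls to whatever OCR /
  // vision APIs you use (the commenter used Mistral OCR and Gemini). Each
  // should return the book titles it can read out of one shelf photo.
  async function ocrWithMistral(image: Buffer): Promise<string[]> {
    throw new Error("wire up your Mistral OCR client here");
  }
  async function listBooksWithGemini(image: Buffer): Promise<string[]> {
    throw new Error("wire up your Gemini client here");
  }

  interface ShelfResult {
    photo: string;
    mistral: string[];
    gemini: string[];
    agreed: string[];   // titles both models produced: the high-confidence set
    disputed: string[]; // titles only one model produced: worth a manual check
  }

  const norm = (t: string) => t.trim().toLowerCase();

  async function processShelves(dir: string): Promise<ShelfResult[]> {
    const results: ShelfResult[] = [];
    for (const name of await readdir(dir)) {
      if (!/\.(jpe?g|png)$/i.test(name)) continue;
      const image = await readFile(path.join(dir, name));
      // Run both models on the same photo and compare their outputs.
      const [mistral, gemini] = await Promise.all([
        ocrWithMistral(image),
        listBooksWithGemini(image),
      ]);
      const mistralSet = new Set(mistral.map(norm));
      const geminiSet = new Set(gemini.map(norm));
      results.push({
        photo: name,
        mistral,
        gemini,
        agreed: mistral.filter((t) => geminiSet.has(norm(t))),
        disputed: [
          ...mistral.filter((t) => !geminiSet.has(norm(t))),
          ...gemini.filter((t) => !mistralSet.has(norm(t))),
        ],
      });
    }
    return results;
  }

  // The commenter stored results in a SQL database; a JSON dump keeps this
  // sketch dependency-free.
  processShelves("./shelf-photos").then((r) =>
    writeFile("catalog.json", JSON.stringify(r, null, 2))
  );
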
johnfn•5mo ago
Interesting -- what sort of things do you use it for?
devinprater•5mo ago
Having Youtube videos described to me, basically. Since Google won't do it.
lxgr•5mo ago
> > Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

TFA is from 2023, when multimodal LLMs were just picking up. I do agree that that prediction (flat capability increase) has aged poorly.

> I doubt OCR and even self-driving cars will get any significant advancements.

This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".

stinkbeetle•5mo ago
> > I fully predict that blind people will be advocating to make actual LLM platforms accessible

> Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

AIs I have used have fairly basic interfaces - input some text or an image and get back some text or an image - is that not something that accessibility tools can already do? Or do they mean something else by "actual LLM platform"? This isn't a rhetorical question, I don't know much about interfaces for the blind.

simonw•5mo ago
I've been having trouble figuring out how best to implement a streaming text display interface in a way that's certain to work well with screenreaders.
devinprater•5mo ago
If it's command-line based, maybe stream based on lines, or even better, sentences rather than received tokens.
miki123211•5mo ago
This really depends on the language.

In some languages, pronunciation(a+b) == pronunciation(a) + pronunciation(b). Polish mostly belongs to this category, for example. For these, it's enough to go token-by-token.

For English, it is not that simple, as e.g. the "uni" in "university" sounds completely different to the "uni" in "uninteresting."

In English, even going word-by-word isn't enough, as words like "read" or "live" have multiple pronunciations, and speech synthesizers rely on the surrounding context to choose which one to use. This means you probably need to go by sentence.

Then you have the problem of what to do with code, tables, headings, etc. While screen readers can announce roles as you navigate text, they cannot do so when announcing the contents of a live region, so if that's something you want, you'd need to build a micro screen reader of sorts.
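
A minimal sketch of the sentence-buffered streaming that devinprater and miki123211 describe, assuming a web UI with a visually hidden aria-live region; the element id, the sentence regex, and the completion message are illustrative only, not taken from any real app:

  // Assumes an element like <div id="sr-stream" aria-live="polite"></div>
  // exists in the page; the id and messages here are made up for this sketch.
  const liveRegion = document.getElementById("sr-stream")!;

  let buffer = "";

  // Very rough sentence boundary: ., !, or ? followed by whitespace.
  // Real pronunciation rules are language-dependent, as noted above.
  const sentenceEnd = /([.!?])\s+/;

  function onToken(token: string): void {
    buffer += token;
    let match: RegExpExecArray | null;
    // Flush complete sentences to the live region; keep the remainder buffered
    // so the screen reader never announces a half-word or half-sentence.
    while ((match = sentenceEnd.exec(buffer)) !== null) {
      const end = match.index + match[1].length;
      announce(buffer.slice(0, end).trim());
      buffer = buffer.slice(end).trimStart();
    }
  }

  function onStreamEnd(): void {
    if (buffer.trim()) announce(buffer.trim());
    buffer = "";
    announce("Response complete."); // explicit completion cue, per the comments above
  }

  function announce(text: string): void {
    // Appending a new child node is what triggers the live-region announcement.
    const p = document.createElement("p");
    p.textContent = text;
    liveRegion.appendChild(p);
  }

Buffering by sentence trades a little latency for announcements that a speech synthesizer can pronounce with full context, per the pronunciation caveats above.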

devinprater•5mo ago
Oh no, cause screen readers are dumb things. If you don't send them an announcement, through live regions or accessibility announcements on Android or iOS, they will not know that a response has been received. So, the user will just sit there and have to tap and tap to see when a response comes in. This is especially frustrating with streaming responses where you're not sure when streaming has completed. Gemini for Android is awful at this when typing to it while using TalkBack. No announcements. Claude on web and Android also do nothing, and on iOS it at least places focus, accidentally I suspect, at the beginning of the response. chatGPT on iOS and web are great; it tells me when a response is being generated and reads it out when it's done. On iOS, it sends each line to VoiceOver as it's being generated. AI companies, and companies in general, need to understand that not all blind people talk to their devices.
simonw•5mo ago
Sounds like I should reverse engineer the ChatGPT web app and see what they're doing.
agos•5mo ago
dang, I was hoping that with the impossibly simple interface chatGPT has and the basically unlimited budget they have, they would have done a bit better for accessibility. shameful
giancarlostoro•5mo ago
> Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

Not sure, but the Grok avatars or characters, whatever they're called - I've experimented with them, though I hate the defaults that xAI made, because they're not a generic, simple AI robot or whatever. After you tell them to stop flirting and calling you babe (seriously, what the heck lol), they can really hold a conversation. I talked to one about a musician I liked, a very niche genre of music, and it was able to suggest an insanely accurate, relatable song from a different artist I did not know, all in real time.

I think it was last year or the year before? They did a demo where they had two phones, one that could see and one that could not, and the two ChatGPT instances were talking to each other, one describing the room to the other. I think we are probably at the point now where it can describe a room.

jibal•5mo ago
> > Let’s not mention the fact the ==> particular <== large language model, LLM called ==> Chat GPT <== they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

stinkbeetle•5mo ago
> While the stuff LLMs is giving us is incorrect information, it’s still information that the sighted world won’t or refuses to give us.

I don't understand what's going on here. He's angry at us horrible sighteds for refusing to give them incorrect information? Or because we refuse to tell them when their LLMs give them incorrect information? Or he thinks that we're refusing to give them correct information which makes it okay that the LLM is giving them incorrect information?

ants_everywhere•5mo ago
It's nearly impossible to think clearly when you're angry.
fwip•5mo ago
I believe they're saying that sometimes-wrong information from a machine is preferable to no information (without machine). At least to some people.
jibal•5mo ago
Obviously.
fwip•5mo ago
I was trying to be gentle. :P
jibal•5mo ago
> He's angry at us horrible sighteds

It's not about you so no need to be personally offended.

Have a bit of empathy and do a bit of research and it's not hard to understand that accessibility is limited.

stinkbeetle•5mo ago
I'm not offended at all, just trying to understand what was written. What exactly the gripe is and what he wants.

My empathy is not the problem here. Having a disability doesn't give you a free pass to be a bitter asshole.

stinkbeetle•5mo ago
The response to this comment, pathetic as it is, makes my point.
jibal•5mo ago
The response to this comment, grossly dishonest as it is, makes my point.
josefresco•5mo ago
I've been volunteering for Be My Eyes* for a few years now. It's been very rewarding. I get maybe a half dozen calls each year.

I help people with very mundane and human tasks: cooking, gardening, label identification.

*https://www.bemyeyes.com/download-app/

miki123211•5mo ago
Would you be willing to share what kind of calls you usually get? No personal or sensitive details of course.

Did the volume of calls change meaningfully with the introduction of AI into Be my Eyes?

josefresco•5mo ago
I touched on it in my original comment. My last call was a woman who was cooking and thought maybe she had dropped some meat on the stove burner. Another call was someone stocking a refrigerator with soda, and they needed help sorting the brands. Another was a woman who wanted to know if her plant was still alive. My first call was helping a woman wrap a gift, she needed to know which side of the gift book was "up".
Robdel12•5mo ago
My mom is 100% blind, and I make sure she can use everything I build. I remediated Visa Checkout to save a tier 1 bank contract and got them to WCAG 2.0 compatibility. With my "credentials" out of the way…

Ugh, this is the sort of thing that bothers me about the accessibility community. Something about it always comes off as preachy, like a moral argument. This is the worst way to get folks to actually care. You're just making them feel bad.

Look, the fact is everyone needs to use technology to live these days, and we devs suck ass at making those things accessible. Even in the slightest. It won't be until we all age into needing it that it finally becomes a real issue that gets tackled. Until then, tools like LLMs are going to be amazingly helpful. Posts like this are missing the forest for the trees.

My mom has been using ChatGPT for a ton of things, and it's been a huge help. It's a massive net positive. The LLM alt tags Facebook added a long time ago: massively helpful. Perfect? Hell no. But we gotta stop acting like these improvements aren't helpful and aren't progress. It comes across as whiny. I say this as someone who is in this community.

devinprater•5mo ago
I use AI, as a blind person. A week or so ago I posted a video in which I use TalkBack's image description feature to play a video game that has no accessibility at all. Of course, that was on Android, which isn't the most blind-friendly of OSes, and iOS doesn't have LLM image descriptions built in, nor good PS2 emulators.

Other blind people are all in on the AI hype, describing themselves as partially sighted because of AI with their Meta Ray-Ban glasses. Side note: Ray-Ban glasses report that I died last year. I somehow missed my funeral; sorry to all those who were there without me. I do like brains, though...

Meanwhile, many LLM sites are not blind-friendly, like Claude and Perplexity, and there are sites that try but fail so exasperatingly hard that I lose any motivation for filing reports, because I can't even begin to explain what's breaking so badly. It's evident that OpenWebUI has not tested their accessibility with a screen reader. Anyway, blindness organizations (NFB mainly) have standardized on "just use ChatGPT," and everything else is the wild west where they absolutely do not care. Gemini could be more accessible, on web and especially Android, but all reports have been ignored, so I'm not going to bother with them anymore. It's sad, since their AI describes images well.

Thank goodness for the API, and tools like [PiccyBot](https://www.piccybot.com/) on iOS/Android and [viewpoint](https://viewpoint.nibblenerds.com/) and [OmniDescriber](https://audioses.com/en/yazilimlar.php) on Windows. I'm still waiting for iOS to catch up to Android in LLM image descriptions built into the screen reader. Meanwhile, at least we have [this shortcut](https://shortcutjar.com/describe/documentation/). It uses GPT-4o, but at least it's something. Apple could easily integrate with their own Apple Intelligence to call out to ChatGPT or whatever, but I guess competition has to happen. Or something. Maybe next year lol. In the meantime I'll either use my own cents to get descriptions, or share to Seeing AI or something like a caveman.

bsoles•5mo ago
When my employer jumped on the bandwagon and built its own internal wrapper around ChatGPT, I tested it with a screen reader using keyboard navigation. And it was terrible. As long as humans don't really care about the disabled (which they really don't), I doubt AI will solve the problems of visually impaired people.