frontpage.

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
230•theblazehen•2d ago•66 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
694•klaussilveira•15h ago•206 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
962•xnx•20h ago•553 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
5•AlexeyBrin•58m ago•0 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
130•matheusalmeida•2d ago•35 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
66•videotopia•4d ago•6 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
53•jesperordrup•5h ago•24 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
36•kaonwarb•3d ago•27 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
10•matt_d•3d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
236•isitcontent•15h ago•26 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
233•dmpetrov•16h ago•124 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
32•speckx•3d ago•21 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
335•vecti•17h ago•147 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
502•todsacerdoti•23h ago•244 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
385•ostacke•21h ago•97 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
300•eljojo•18h ago•186 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
361•aktau•22h ago•185 comments

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
8•__natty__•3h ago•0 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
422•lstoll•21h ago•282 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
68•kmm•5d ago•10 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
96•quibono•4d ago•22 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
21•bikenaga•3d ago•11 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
19•1vuio0pswjnm7•1h ago•5 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
264•i5heu•18h ago•215 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
33•romes•4d ago•3 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
63•gfortaine•13h ago•28 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1076•cdrnsf•1d ago•460 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
39•gmays•10h ago•13 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
298•surprisetalk•3d ago•44 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
154•vmatsiiako•20h ago•72 comments

FastVLM: Efficient vision encoding for vision language models

https://github.com/apple/ml-fastvlm
367•nhod•9mo ago

Comments

BryanLegend•9mo ago
Seems like the main thing holding these new minds back is being able to see well. Breakthroughs like this will fix that.
efnx•9mo ago
That and the ability to hold on to knowledge.
static_void•9mo ago
... or say they don't know.
kamranjon•9mo ago
Apple out here playing 5d chess, installing neural cores in their hardware and writing crazy efficient vision models to run on em. Cool stuff.
wmf•9mo ago
I thought they turned sycophancy off...
kamranjon•9mo ago
Aw yes, I admit, I think the new Apple hardware is real cool.
vFunct•9mo ago
Can it fill a wine glass to the rim?
mkl•9mo ago
It's for interpreting images, not generating them.
turnsout•9mo ago
Apple has gotten off to a slow start in the LLM world, but they have the only long-term strategy that makes sense. They're going to dominate the 2030s.
boroboro4•9mo ago
What exactly is the strategy?
generalizations•9mo ago
They can run locally on-device: a win for cost, latency and privacy (privacy is pragmatic: it means you can use all the user's data as context without qualms). There's a reason Microsoft tried so hard to push for the neural processors a year or two ago. Avoiding the cost of the datacenter while offering good-enough inference (emphasis on good) is a massive win.
turnsout•9mo ago
Yes, thank you; this is the strategy I was referring to. It will take some time for the models and chips to get there, but on-device inference will have massive advantages for privacy, speed and cost. Plus it will drive demand for hardware—at first, iPhones, but soon AirPods and glasses.
xnx•9mo ago
Google already has some of the best on device models (Gemma) and chips (Tensor).
AceJohnny2•9mo ago
> and chips (Tensor)

Is there actually any hard data out there comparing the NPU on the Google Tensor G4 vs the Apple A18? I wasn't able to quickly find anything concrete.

I mean, Apple has been shipping mobile NPUs for longer than Google (Apple since the A11 in 2017, Google since 2021), and its chips are (ostensibly) built on a smaller silicon node than Google's (G4: Samsung SF4P vs A18: TSMC N3E). However, the G4 appears to have more RAM bandwidth (68.26 GB/s vs 60 GB/s on the A18).

lern_too_spel•9mo ago
Google has been shipping custom NPUs since the Pixel 4 in 2019. Prior to that, Google phones just used off-the-shelf SoCs from Qualcomm, with 2018's Pixel 3 using the NPU in the Snapdragon 845. Android first shipped NNAPI in Android 8.1 in 2017, with acceleration on various mobile GPUs and DSPs, including the Pixel Visual Core on the Pixel 2. Google has shipped more on-device models so far, but neither company has a moat for on-device inference.

https://blog.google/products/pixel/pixel-visual-core-image-p...

turnsout•8mo ago
Unfortunately for them, Google doesn't make devices that people want to buy
lern_too_spel•8mo ago
Other Android phone vendors do, and they have the same strategy, sitting on top of Qualcomm NPUs.
weikju•9mo ago
They are running data centers and offloading some things to ChatGPT though, not just running on-device.

In fact, there's no clear indication of when Apple Intelligence is running on-device versus in their Private Cloud Compute.

jfarina•9mo ago
What strategy is that?
ryanmcgarvey•9mo ago
I presume they mean that distribution is king and they make all the devices.
karn97•8mo ago
Average delusional hner who really doesn't know what they are talking about
turnsout•8mo ago
Enlighten us, wise one
insane_dreamer•9mo ago
As the father of a young child whose optic nerves are highly deteriorated (compression) and is expected to lose his sight (when exactly is unknown; based on original projections he should be blind by now, but an experimental treatment run in a trial at the NIH (KEEP FUNDING SCIENCE) has stabilized his sight), I'm overjoyed with the advances being made in VLMs. I can now envision a future where even if he loses his sight he'll be able to interact with the world around him, go to college, have a fulfilling career (he loves science and engineering, and is talented for his young age), etc.
lynx97•9mo ago
I grew up in the 80s as a 100% blind child. Technology was by far not as advanced as it is today. Computers were just coming up when I was around 12. I learnt to type on an old-school typewriter, and I also learnt to write braille with a pretty heavy full-metal embossing device. OCR was still quite bad. When I switched to what you would call high school, I used a laptop with an integrated braille display to follow classes. Used good old DOS as the OS and Word 5.5 as my "notepad". Except for PC Lingua for Latin, I basically had no tools specialized for learning. An electronic notepad and my brain were all I had to follow school. And I still made it. I have a great job I love, my own apartment, a sweet girlfriend, and I am basically completely independent. To the point where I had to forcefully send away my mother, since her continued attempts to "help" me were basically detrimental to my own development.

I cannot emphasize enough how important it is how you deal with this as a parent. Since parents are indeed the biggest hindrance to development, we have a saying around here amongst disabled people: "additional disability due to parental overprotection" (Zusatzbehinderung Eltern). Please take a moment to understand what this means, without feeling personally attacked. It's important. Your child can leave home around 18, just like every other kid. I did. Don't slow that process down artificially. The more this is prolonged, the harder it gets for the individual to actually obtain independence.

I am telling you this because I read between the lines that you believe current technology is a reason for you to be hopeful. Sure, it should be. But never forget: your child can do much more than you as a sighted person will ever be able to understand. Don't let them drown in your own misery. Let them discover what they can do. You will be surprised what they come up with. And don't fall for Gear Acquisition Syndrome. Sure, tools are nice, and they do get better, which is also nice. I LOVE vision models, to stay on topic somehow. However, I still leave my house with only a cane and my phone in my pocket. I do occasionally ask Siri "Where am I?" to get an address if I happen to have forgotten exactly where I am. But at the end of the day, my cane is what shows me the way. Most tech is hype; plain old hearing and your sense of touch get you much farther than you might think.

Wish you all the best for your own journey, and the development of your child.

wiz21c•9mo ago
I should read a comment like yours every morning.
topato•9mo ago
Wow, this really adds an amazing perspective to the entire (frequently touted) concept of vision language models somehow "saving" blind people from their old life. In the past, a blind person desperately needed caretakers, otherwise they would bumble around their home, end up mistaking the sink for the toilet, accidentally turn on the stove thinking it was the thermostat, until they died after mistaking bleach for milk and cat litter for cereal....

BUT NOW... THE FUTURE IS HERE.... an all-knowing god-like cell phone can tell these poor miserable individuals what the objects in their own homes are! No more tragic Mr. Magoo-ian accidents!

But thank you for posting this; It certainly enlightened me! I'll admit, all these AI solutions

exe34•9mo ago
> I'll admit, all these AI solutions

They got to him.

lynx97•8mo ago
:-)
insane_dreamer•9mo ago
Thanks. I appreciate your insight.
liamwire•9mo ago
It feels like this is the level of speed-up in time-to-first-token needed to make continuous vision useful for on-device applications like an assistant that can see and take action on your screen, à la the original Apple Intelligence demos. It's very impressive seeing the app in the repo, and I'm excited to build it tonight and play around.
nine_k•9mo ago
With that, a really helpful aid for blind people can be made, running just on their phone, fed from a camera in their eyeglasses. Somebody who could not move around without an assistant could become autonomous in daily life.
jdiff•8mo ago
It might be useful for telling Cream of Chicken from Cream of Mushroom, but for locomotion I can't see this adding anything over existing strategies people use to get around sans sight.

"There's a tree. There's a tree. There's a tree. There's a number of pedestrians. There's a tree. There's a sign." does not strike me as useful feedback for getting around.

nine_k•8mo ago
Consider a city. It's full of signs and inscriptions, traffic lights, and other key interaction elements. Consider a store. It has shelves with stuff, again with inscriptions, price tags, etc.

"Pavement. Row of stores to the left. Joe's Grocery Store. Doors. Door handle. A shelf with bakery. A shelf with canned goods. A shelf with bottles. Coke bottle. Large Pepsi bottle. Apple juice bottle. Passageway. Checkout. Payment terminal. Door. Door handle. Pavement. ..."

jdiff•8mo ago
None of that gives me any useful spatial sense of where things are. "Payment terminal." Okay. Where is it? Left? Left where? How much left? How far?

The only truly useful bit I see in your stream of text is, again, "Cream of Mushroom" vs "Cream of Chicken." I am actively holding something, so I know where it is, but need to differentiate it from printed detail.

adamsiem•9mo ago
Anyone using vision to parse screenshots? QVQ was too slow. Will give this a shot.
abrichr•9mo ago
You might be interested in https://github.com/OpenAdaptAI/OpenAdapt
logankeenan•9mo ago
I used Molmo to parse screenshots in order to detect locations of UI elements. See the repo below. I think OmniParser from Microsoft would also work well.

https://github.com/logankeenan/george

https://github.com/microsoft/OmniParser
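
A rough sketch of the screenshot-parsing approach discussed above, using the OpenAI-compatible chat API shape that many hosted and local servers (e.g. LM Studio, Ollama) expose. The model name, prompt, and JSON schema here are illustrative assumptions, not Molmo's or OmniParser's actual interface; pixel-accurate coordinates from generic VLMs are often unreliable, which is exactly the gap OmniParser targets.

    # Sketch: ask a vision-capable model to locate UI elements in a screenshot.
    # Assumes the `openai` Python package and a configured API key/endpoint.
    import base64
    from openai import OpenAI

    client = OpenAI()

    with open("screenshot.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any vision-capable model served via this API
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the clickable UI elements in this screenshot as JSON: "
                         '[{"label": "...", "x": 0, "y": 0}]'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)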

Aeroi•9mo ago
I built (and am still building) a realtime voice+vision app called Sen; it's currently live in beta and streams frames over WebRTC. It's fast and smart, but I'm super curious to see how these models do as we get closer to the metal. I can see these running on-device in the future with super fast TTFB.
keyle•9mo ago
Do you have a write up of the tech stack and setup? Or willing to give the gist here?

I'd like to make a private Qwen or similar for my kids to prompt with a button and voice control. It doesn't need vision... Although eventually that'd be very cool.

Siri just sucks.

We might not be there yet...

Aeroi•9mo ago
Yeah, I made a post on here, but the algo sent it to the gulag abyss.

https://news.ycombinator.com/item?id=43926673

keyle•9mo ago
That's a good product site, but it doesn't help me in any way...
Aeroi•9mo ago
I also ran across an interesting robot toy demo today that had voice built in. It was whimsical and seemed like it was aimed at primary education and kids. Someone here might know the name.
stavros•9mo ago
You can use Ollama or LM Studio, both in API mode, to return the responses. I believe they offer audio support, but I'm not entirely sure.

However, if you're looking for instruction following (like an agent), I've tried to implement my own agent and have lost faith. Even GPT-4.1 will regularly gaslight me that no, it definitely ran the tool call to add the event to my calendar, when it just didn't. I can't get any more adherence out of it.
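
A minimal sketch of the local-API route mentioned above, assuming an Ollama server is running on its default port with a small model already pulled (the model name is just an example). Vision models such as llava additionally accept a base64-encoded "images" list on the message.

    # Sketch: chat with a locally served model through Ollama's HTTP API.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen2.5:3b",  # example model; pull whatever fits your device
            "messages": [{"role": "user", "content": "Tell me a short bedtime story."}],
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["message"]["content"])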

cloudking•9mo ago
Check out https://livekit.io/
keyle•8mo ago
All this leads to is paying to use APIs, and more paying. That's not what I was asking for.
tomp•9mo ago
We're definitely there; there just aren't any "ready-made" apps yet. But the technology is there: go to e.g. vapi.ai to test it.
nikolayasdf123•9mo ago
2 GB for the smallest 0.5B model. It doesn't make sense for each app to download this; Apple must have plans to pre-load these models at the OS level and expose an SDK for all apps to call them locally. Exciting times!

Opened an issue for them to confirm this: https://github.com/apple/ml-fastvlm/issues/7

cube2222•9mo ago
That's what they suggested about LLMs at last year's WWDC, IIRC. The core models are provided by the OS, while apps bring LoRAs to fine-tune them / bring custom heads for them.
babl-yc•9mo ago
You could probably get away with f16 or even quantize to int8 and have a much smaller model, but your point stands. Users won't be thrilled to download a 500MB model for an app either.
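
For a rough sense of the numbers (weights only; this ignores the vision encoder split, tokenizer, and file-format overhead), model size scales linearly with bytes per parameter, so the 2 GB download is consistent with fp32 weights for a 0.5B-parameter model, and int8 lands right around the 500 MB mark:

    # Back-of-the-envelope weight sizes for a 0.5B-parameter model.
    params = 0.5e9
    for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{name:>9}: {params * bytes_per_param / 1e9:.2f} GB")
    # fp32 ≈ 2.00 GB, fp16 ≈ 1.00 GB, int8 ≈ 0.50 GB, int4 ≈ 0.25 GB
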
nikolayasdf123•8mo ago
Haha, the latest Uber build for iOS 18 is 500 MB... without any LLM models <face-palm/>
ukuina•8mo ago
What are they doing in there? Is it mostly visual assets?
bastawhiz•8mo ago
If I had to guess, I'd say there's a ton of third-party code for things like payment method SDKs. Every local payment method around the world is going to have its own package that you need to import, and you can't just load in new executable code on the fly after the app is installed.
victorbjorklund•8mo ago
You can actually do over-the-air updates of apps (how easy it is depends on what you wrote your app in), and adding a feature like an additional payment provider would not necessarily require an update through the App Store.
bastawhiz•8mo ago
You can't download only part of your app and lazy load functionality that your user probably won't use.
victorbjorklund•8mo ago
You could, but it would probably either require a server-driven solution like LiveView Native (you only get the code when you use it) or a pretty bad UX where you have to wait for something to download first.
nikolayasdf123•8mo ago
Wouldn't you want to create a payment gateway and abstract the logic away, so that the client is agnostic of payment processing and the backend maps the internal payment flow onto the external provider-specific ones? (In the worst case, redirect to other apps with universal links or a webview.)
bastawhiz•8mo ago
You're not doing it in a browser window. You're integrating against device APIs like NFC, you've got custom UI (which you probably want to be native), you've got stuff like camera access to read QR codes and OCR a credit card. Want to pay by topping up a wallet at 7-Eleven? Now you need a map to show where the nearest one is.
philipkglass•8mo ago
A lot of geography-specific scenarios are compiled into the app, including regional payment SDKs. There's a great comment from a former Uber engineer explaining it here:

https://news.ycombinator.com/item?id=25376346

nikolayasdf123•8mo ago
I think they're using vector graphics and vector animations (say, Rive). Rive files are on the order of tens of KBs; Lottie is much larger, up to hundreds of KBs. Even then you would need around 5,000 animations to reach 500 MB, which is unlikely!

Raster graphics and videos are likely not included in the build.

Probably some unused code (libraries) got into it; that can grow quite large.

Or maybe some ML models?
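
As a quick sanity check on that estimate (the per-animation sizes are assumptions in line with the figures above):

    # Rough count of animations needed to fill 500 MB.
    target_kb = 500 * 1000
    print(target_kb // 100)  # ~5,000 animations at ~100 KB each (Lottie-ish)
    print(target_kb // 30)   # ~16,000 animations at ~30 KB each (Rive-ish)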

HanClinto•8mo ago
I think that there is fantastic potential in having open-weight, OS-standard foundation models.

Especially if the API lets app developers load their custom LoRA fine-tunings onto OS-standard foundation models at runtime, you can (ideally) have the best of both worlds -- fine-tuned, app-specific models with reasonable app sizes.
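
A minimal sketch of that "shared base model plus per-app LoRA adapter" idea using the Hugging Face transformers and peft libraries; the base model ID and adapter path are stand-ins, and in Apple's case the base model would ship with the OS rather than with each app.

    # Sketch: attach a small app-specific LoRA adapter to a shared base model.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "Qwen/Qwen2-0.5B-Instruct"   # stand-in for an OS-provided base model
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)

    # Each app would ship only its adapter weights (typically a few MB).
    model = PeftModel.from_pretrained(base, "path/to/app-specific-lora")  # hypothetical adapter path

    inputs = tokenizer("Summarize my unread messages.", return_tensors="pt")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))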

HappMacDonald•8mo ago
I haven't seen much done with LoRAs for LLMs though, only for diffusion image-gen models. From what I've heard, it sounds like a difference in benefit due to architecture.
gessha•8mo ago
My guess is that they won’t confirm it unless it’s a big presentation. WWDC maybe?
nikolayasdf123•9mo ago
Google and the cloud LLM providers must be gritting their teeth now, haha!
nikolayasdf123•9mo ago
Distributing this heavy compute and moving it close to the device, where (1) the data originates and (2) the decision and output from the analysis are used, is the way to go: super low latency, no network traffic, privacy, and less overhead in the cloud. This is amazing.
porphyra•9mo ago
It seems that the future of robotics is VLA models. Even Tesla FSD is an end-to-end VLA model. Efficient vision encoding will be a huge part of making robots safe and responsive.
lynx97•9mo ago
I wonder, can I convert/run this with llama.cpp? The fact that it's LLaVA-based seems promising.
vessenes•9mo ago
Um wow. The on-device realtime videos are worth a watch, and compelling. Looking forward to this being deployed and widely adopted. Getting much faster time to first token opens up a ton of features and usability benefits.
buyucu•9mo ago
where is my gguf?
simianparrot•9mo ago
I have a feeling that feeding Tesseract the image every second would be significantly faster and take far less space and processing power. I haven't tested it yet, but given how fast Tesseract is on large images, it wouldn't surprise me.
regularfry•9mo ago
If all you want is OCR, possibly.
coredog64•8mo ago
If all you want is OCR of typewritten text.

Tesseract is awful for handwriting.
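
A rough sketch of the polling-OCR idea from a few comments up, assuming pytesseract and Pillow are installed and the tesseract binary is on PATH; as noted, this only recovers printed text, not general scene or UI understanding.

    # Sketch: OCR a screen grab once per second with Tesseract.
    import time
    from PIL import ImageGrab
    import pytesseract

    while True:
        frame = ImageGrab.grab()                   # full-screen screenshot (macOS/Windows)
        text = pytesseract.image_to_string(frame)  # plain-text OCR of the frame
        print(text[:200])                          # hand off to whatever consumes the text
        time.sleep(1)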

d3k•9mo ago
Very nice! I wish they were more keen to contribute to the AI/ML community and publish the weights and model definition on Hugging Face. Funny enough, I have just seen today a similar demo that is using a freely available VLM: https://github.com/ngxson/smolvlm-realtime-webcam
tough•9mo ago
SmolVLM is from the Hugging Face team.

Cool to see people doing stuff with smaller models.

https://huggingface.co/blog/smolvlm

https://arxiv.org/abs/2504.05299

labadal•8mo ago
I'm absolutely thrilled that there is an effort to make models smaller and run with less resources instead of blindly throwing more resources at the problem and expecting it to get solved.