NeuralOS: An operating system powered by neural networks

80•yuntian•5h ago

Comments

yuntian•5h ago

A generative operating system that directly predicts screen images based on mouse and keyboard inputs, powered by an RNN for state modeling and a diffusion model for image generation.

See my tweet for more details: https://x.com/yuntiandeng/status/1944802154314916331

5-•4h ago

i like how most of your demo video is clicking through various firefox and google popups.

arm32•2h ago

Pretty realistic, actually.

munchler•5h ago

I tried to use this but the lag made it impossible to even click on an icon. On top of that, a message that other people were waiting popped up intermittently, pushing the emulation down the page, away from the mouse pointer. I'm not sure what sort of experience you're aiming for, but this probably isn't it.

1dom•4h ago

Tried to use this, also found lag made it basically impossible, and felt uncomfortable being reminded that other people might be waiting for me to get on with it.

However, I was able to click on a folder, it opened and looked fairly convincing. Only indicator that something was off - other than lag - was the at the bottom of the file browser, it mentioned how much diskspace was available: the first digit was clearly 6, the second was flickering and blurring between different numbers.

Pretty interesting idea though. What framerates should it run at? I felt I was getting <5fps.

Sharlin•3h ago

I managed to open the terminal, unsurprisingly trying to type something just resulted in hallucinations. And even though menus open and look plausible, clicking the items either didn't do anything or hallucinated some garbage.

mpascale00•4h ago

This brings personal nostalgia to when I was very young and made an "OS" in PowerPoint using links between slides, animations, and the embedded internet explorer object. Similarly, I'm not sure I see any practical use in this. Still it's a really fascinating conceptual demonstration of networks understanding intent in the complex state-machine that is a graphical user interface.

arm32•2h ago

I'm glad I wasn't the only one who did this, except for me, I used Microsoft Frontpage.

spogbiper•4h ago

although i wasn't able to really use it due to lag

yuntian•2h ago

Actually NeuralOS works very differently from Gemini OS. NeuralOS directly generates each screen at the pixel level entirely from neural networks, while Gemini OS generates code that's then rendered into a traditional UI. This difference is why NeuralOS is much slower and currently runs at a lower frame rate.

yuntian•3h ago

Thanks everyone for trying out NeuralOS, and apologies for the frustrating user experience!

I coded up the demo myself and didn't anticipate how disruptive the intermittent warning messages about waiting users would become. The demo is quite resource-intensive: each session currently requires its own H100 GPU, and I'm already using a dispatcher-worker setup with 8 parallel workers. Unfortunately, demand exceeded my setup, causing significant lag and I had to limit sessions to 60 more seconds when others are waiting. Additionally, the underlying diffusion model itself is slow to run, resulting in a frame rate typically below 2 fps, further compounded by network bottlenecks.

As for model capabilities, NeuralOS is indeed quite limited at this point (as acknowledged in my paper abstract). That's why the demo interactions shown in my tweet were minimal (opening Firefox, typing a URL).

Overall, this is meant as a proof-of-concept demonstrating the potential of generative, neural-network-powered GUIs. It's fully open-source, and I hope others can help improve it going forward!

Thanks again for the honest feedback.

yunyu•3h ago

Maybe put the warning below the UI, so it doesn't cause the layout to change?

yuntian•3h ago

Good idea. I'll update when no one is using it, don't want to cause further interruptions...

cupantae•2h ago

Nǐ hăo, xìe xìe Yuntian! I read the readme and paper but haven’t played around much yet. I find this fascinating and I don’t care much about poor “experience” because intuitively I feel this idea couldn’t produce something as reliable and flexible as a real OS anyway. I see you talked about inability to install new software and my reaction was “well obviously”, because surely it will be at least as limited as the training data, while a real OS provides lots of software of great complexity which is seldom used.

Could you talk about your hopes for the future on this project? What are your thoughts on having a more simplified interface which could combine inputs in a more abstract way, or are you only interested in simulating a traditional OS?

Thanks again.

PS the waiting time while firefox “loads” made me laugh. I presume this is also simulated.

yuntian•35m ago

Thanks for your comment! I completely agree that currently NeuralOS is far from being as reliable as a real OS. The Firefox loading time is indeed a funny artifact of the neural model simulating delay in real OS.

However, my real dream behind this project is to blur the boundaries across applications, not just simulate traditional OS interactions. For example, imagine converting a movie we're watching directly into an interactive video game, or instantly changing the interface of an app (like Signal) to something we prefer (like Facebook Messenger) on the fly.

Of course, the current training data severely limits what's achievable today. But looking forward, I envision combining techniques from controllable text generation (such as Zhiting Hu's "Toward Controlled Generation of Text" paper) or synthesizing new interaction data to achieve greater and customization. I believe this is a promising path toward creating truly generative and personalized interfaces.

Thanks again for your interest!

UncleOxidant•3h ago

Who thought this was a good idea?

odyssey7•3h ago

It seems inevitable.

273kgracia•3h ago

You can visit NeuralOS inside NeuralOS!

jjaksic•2h ago

This reminds me of a recent conversation we had at work where someone suggested that at some point all backend APIs are going to get replaced by a single LLM that'll just do anything (if you ask it nicely enough).

pmxi•2h ago

This is a cool proof-of-concept! It reminds me of https://oasis-model.github.io/. which friends and I had a lot of fun with

godelski•2h ago

I didn't get to do much. Had a hard time clicking on Firefox and then getting to the nav bar and type in "Hackernews". Boy was that wild watching it type. Those definitely weren't letters. Then it tried to translate the page for me into Finish and weirdly the "I'm not a robot" box would appear, disappear, and then I'd see the title of some paper. I never actually made it to the Google results...

It's an interesting project. I'll totally accept "for fun" or "because" but I'm interested in the why. Even if just a very narrow thing, is there any benefits we would get from using a ML based OS? I mean it is definitely cool and that has merit in its own right, but people talk about Neural OSs and I just don't "get it"

Scene_Cast2•1h ago

This is a really cool idea! I wonder if a more integrated adaptation would help render real OS UIs better / faster / prettier.

nither•1h ago

It's really cool concept. Just keep going, I wonder how it will look like in 1-2 years.

e1ghtSpace•1h ago

I've been waiting for this. Wish the resolution and framerate were higher though.

theGnuMe•7m ago

[delayed]

A new open-source social network

Simulating Time with Square-Root Space

California's beer tax compares to other states

Micronutrient deficiency linked to chronic pain in new study

Stefanie Stantcheva: To understand America today, study the zero-sum mindset

Museum of Failure

Double Pendulums are Chaoticn't [video]

Tesla -shaped Robotaxi expansion illustrates how unserious the business is

Doge Denizen Marko Elez Leaked API Key for XAI

AWS Lambda Silent Crash – A Platform Failure, Not an Application Bug [pdf]

Research shows path toward protocells on Saturn's moon Titan

For Algorithms, Memory Is a Far More Powerful Resource Than Time

Spicy – Generating Robust Parsers for Protocols and File Formats

Mass timber for hospitals: engineered wood resists microbes more than plastic

Pdoc vs. Pdoc3 Controversy

Serena and her kingdom invading humanity

The woman who changed our view of chimps – and human beings

Trump administration says it won't publish climate change report on NASA website

Digital Nomads Are Getting Caught in the War on Tourism

Toward Science Fiction Education

Tour de force: Vivaldi and Renault team up for the best on-road experience

About 1,500 tarantulas found hidden in cake boxes at German airport

The Matthew Effect

My Problem with Superman

'Tremendous uncertainty' for cancer research as US gov target mRNA vaccines

Old Tricks, New Tech: How Legacy Media Capture Fuels Digital Authoritarianism

AI 'Nudify' Websites Are Raking in Millions of Dollars

Using a USB Foot Pedal for Vibe Coding

Kruxel – Know When Ad Spend Pays Back

LLM Daydreaming