
Qwen3-Omni: Native Omni AI model for text, image and video

https://github.com/QwenLM/Qwen3-Omni
145•meetpateltech•3h ago•34 comments

Fine-grained HTTP filtering for Claude Code

https://ammar.io/blog/httpjail
29•ammario•1h ago•1 comment

Choose Your Own Adventure

https://www.filfre.net/2025/09/choose-your-own-adventure/
56•naves•2h ago•29 comments

A board member's perspective of the RubyGems controversy

https://apiguy.substack.com/p/a-board-members-perspective-of-the
40•Qwuke•1d ago•66 comments

OpenAI and Nvidia announce partnership to deploy 10GW of Nvidia systems

https://openai.com/index/openai-nvidia-systems-partnership/
315•meetpateltech•4h ago•425 comments

Cap'n Web: a new RPC system for browsers and web servers

https://blog.cloudflare.com/capnweb-javascript-rpc-library/
244•jgrahamc•7h ago•111 comments

Categorical Foundations for Cute Layouts

https://research.colfax-intl.com/categorical-foundations-for-cute-layouts/
13•charles_irl•15h ago•3 comments

Why haven't local-first apps become popular?

https://marcobambini.substack.com/p/why-local-first-apps-havent-become
169•marcobambini•7h ago•219 comments

SWE-Bench Pro

https://github.com/scaleapi/SWE-bench_Pro-os
70•tosh•4h ago•14 comments

Diffusion Beats Autoregressive in Data-Constrained Settings

https://blog.ml.cmu.edu/2025/09/22/diffusion-beats-autoregressive-in-data-constrained-settings/
23•djoldman•2h ago•2 comments

PlanetScale for Postgres is now GA

https://planetscale.com/blog/planetscale-for-postgres-is-generally-available
226•munns•5h ago•129 comments

Is a movie prop the ultimate laptop bag?

https://blog.jgc.org/2025/09/is-movie-prop-ultimate-laptop-bag.html
88•jgrahamc•8h ago•86 comments

Testing is better than data structures and algorithms

https://nedbatchelder.com/blog/202509/testing_is_better_than_dsa.html
52•rsyring•4h ago•35 comments

Mentra (YC W25) is hiring to build smart glasses

1•caydenpiercehax•3h ago

AI-generated “workslop” is destroying productivity?

https://hbr.org/2025/09/ai-generated-workslop-is-destroying-productivity
109•McScrooge•2h ago•48 comments

Transforming recursion into iteration for LLVM loop optimizations

https://dspace.mit.edu/handle/1721.1/162684
10•matt_d•1d ago•1 comment

Unweaving warp specialization on modern tensor core GPUs

https://rohany.github.io/blog/warp-specialization/
13•rohany•58m ago•1 comment

I'm spoiled by Apple Silicon but still love Framework

https://simonhartcher.com/posts/2025-09-22-why-im-spoiled-by-apple-silicon-but-still-love-framework/
77•deevus•7h ago•125 comments

Cloudflare is sponsoring Ladybird and Omarchy

https://blog.cloudflare.com/supporting-the-future-of-the-open-web/
511•jgrahamc•7h ago•333 comments

What happens when coding agents stop feeling like dialup?

https://martinalderson.com/posts/what-happens-when-coding-agents-stop-feeling-like-dialup/
55•martinald•1d ago•61 comments

I Was a Weird Kid: Jailhouse Confessions of a Teen Hacker

https://www.bloomberg.com/news/features/2025-09-19/multimillion-dollar-hacking-spree-scattered-sp...
18•wslh•3d ago•1 comment

Easy Forth (2015)

https://skilldrick.github.io/easyforth/
162•pkilgore•8h ago•94 comments

The Beginner's Textbook for Fully Homomorphic Encryption

https://arxiv.org/abs/2503.05136
143•Qision•1d ago•26 comments

CompileBench: Can AI Compile 22-year-old Code?

https://quesma.com/blog/introducing-compilebench/
109•jakozaur•7h ago•43 comments

Beyond the Front Page: A Personal Guide to Hacker News

https://hsu.cy/2025/09/how-to-read-hn/
178•firexcy•11h ago•75 comments

What is algebraic about algebraic effects?

https://interjectedfuture.com/what-is-algebraic-about-algebraic-effects/
65•iamwil•6h ago•27 comments

Human-Oriented Markup Language

https://huml.io/
44•vishnukvmd•5h ago•58 comments

A simple way to measure knots has come unraveled

https://www.quantamagazine.org/a-simple-way-to-measure-knots-has-come-unraveled-20250922/
92•baruchel•6h ago•45 comments

The Collapse of the Tjörn Bridge, Sweden, 1980

https://www.legalscandal.info/ls_eng/tjorn_bridge_disaster.html
6•ZeljkoS•3d ago•6 comments

SGI demos from long ago in the browser via WASM

https://github.com/sgi-demos
215•yankcrime•12h ago•56 comments

Qwen3-Omni: Native Omni AI model for text, image and video

https://github.com/QwenLM/Qwen3-Omni
143•meetpateltech•3h ago

Comments

chisleu•2h ago
Here's the demo video. The segment with sound input -> sound output, translating speech from the video into another language, was the most impressive display I've seen yet.

https://www.youtube.com/watch?v=_zdOrPju4_g

edude03•1h ago
The Qwen thinker/talker architecture is really fascinating, and more in line with how I imagine human multimodality works - i.e., a picture of an apple, the text "a p p l e", and the sound all map to the same concept without going through text first.
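
As a toy sketch of that idea (not Qwen's actual architecture, just the concept): separate per-modality encoders project everything into one shared space that a single transformer then attends over.

    import torch
    import torch.nn as nn

    D = 512  # shared embedding dimension (arbitrary for illustration)

    # Toy per-modality encoders; real models use transformer/conv stacks.
    text_encoder  = nn.Embedding(32000, D)      # token ids -> D
    audio_encoder = nn.Linear(80, D)            # mel-spectrogram frames -> D
    image_encoder = nn.Linear(3 * 16 * 16, D)   # flattened image patches -> D

    tokens  = torch.randint(0, 32000, (1, 8))   # "a p p l e" as token ids
    frames  = torch.randn(1, 50, 80)            # 50 audio frames
    patches = torch.randn(1, 196, 3 * 16 * 16)  # 196 image patches

    # All three land in the same D-dim space; one transformer can attend over
    # the concatenated sequence without ever round-tripping through text.
    sequence = torch.cat(
        [text_encoder(tokens), audio_encoder(frames), image_encoder(patches)],
        dim=1,
    )
    print(sequence.shape)  # torch.Size([1, 254, 512])
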
adastra22•1h ago
Isn’t that how all LLMs work?
simonw•1h ago
The existing vision LLMs all work like this, and that covers most of the major models these days.

Multi-modal audio models are a lot less common. GPT-4o was meant to be able to do this natively from the start but they ended up shipping separate custom models based on it for their audio features. As far as I can tell GPT-5 doesn't have audio input/output at all - the OpenAI features for that still use GPT-4o-audio.

I don't know if Gemini 2.5 (which is multi-modal for vision and audio) shares the same embedding space for all three, but I expect it probably does.

adastra22•1h ago
What I mean is that all processing in an LLM occurs in state space. The next-token prediction is the very last step.
uniqueuid•1h ago
There are many more weird and complex architectures in models for video understanding.

For example, beyond video -> text -> LLM and video -> embedding-in-LLM, you can also have an LLM controlling/guiding a separate video extractor.

See this paper for a pretty thorough overview.

Tang, Y., Bi, J., Xu, S., Song, L., Liang, S., Wang, T., Zhang, D., An, J., Lin, J., Zhu, R., Vosoughi, A., Huang, C., Zhang, Z., Liu, P., Feng, M., Zheng, F., Zhang, J., Luo, P., Luo, J., & Xu, C. (2025). Video Understanding with Large Language Models: A Survey (No. arXiv:2312.17432). arXiv. https://doi.org/10.48550/arXiv.2312.17432

adastra22•1h ago
Sure but all of these find some way of mapping inputs (any medium) to state space concepts. That's the core of the transformer architecture.
ludwigschubert•46m ago
The user you originally replied to specifically mentioned "without going to text first".
adastra22•45m ago
Yeah, and that's my understanding. Nothing goes video -> text, or audio -> text, or even text -> text without first going through state space. That's where the core of the transformer architecture is.
neilmovva•1h ago
The multilingual example in the launch graphic has Qwen3 producing the text:

> "Bonjour, pourriez-vous me dire comment se rendreà la place Tian'anmen?"

translation: "Hello, could you tell me how to get to Tiananmen Square?"

a bold choice!

OJFord•1h ago
Not really - it's a significant place, which is why the protest (and hence the massacre) happened there. So especially for Chinese people (I expect), merely referencing it doesn't immediately evoke the massacre; it has plenty of other connotations for them.

e.g. if something similar had happened in Trafalgar Square, I expect it would still primarily be a major square in London to me, not "oh my god, they must be referring to that awful event". (In fact I think it was targeted in the 7/7 bombings, for example.)

Or, a better example to go with your translation: you can refer to the Bastille without 'boldly' invoking the history of its storming in the French Revolution.

No doubt the US media has referred to the Capitol without boldness many times since 6 Jan '21.

em500•1h ago
Not to mention, Tiananmen Square is one of the major tourist destinations in Beijing (similar to the National Mall in Washington DC), for both domestic and foreign visitors.
ripped_britches•42m ago
Westerners only know it from the massacre, but it's actually just like Times Square for them.
whimsicalism•10m ago
It's only really a reference when paired with the date, or at least '89.
simonw•1h ago
The model weights are 70GB (Hugging Face recently added a file size indicator - see https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct/tree... ) so this one is reasonably accessible to run locally.

I wonder if we'll see a macOS port soon - currently it very much needs an NVIDIA GPU as far as I can tell.
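
You can pull that total yourself from the hub API; a quick sketch (repo id from the link above):

    from huggingface_hub import HfApi

    # Sum the per-file sizes the hub reports (same data as the web UI).
    info = HfApi().model_info("Qwen/Qwen3-Omni-30B-A3B-Instruct",
                              files_metadata=True)
    total = sum(f.size or 0 for f in info.siblings)
    print(f"~{total / 1e9:.0f} GB across {len(info.siblings)} files")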

growthwtf•1h ago
A fun project for somebody with more time than me would be to see if they can get it working with the new Mojo stuff for Apple from yesterday. I don't know if the functionality is fully baked enough yet to actually pull off the port, but it would be an interesting try.
a_e_k•1h ago
That's at BF16, so it should fit fairly well on 24GB GPUs after quantization to Q4, I'd think. (Much like the other 30B-A3B models in the family.)

I'm pretty happy about that - I was worried it'd be another 200B+.
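
Rough napkin math (assuming ~4.5 bits/weight effective for a Q4 quant; the 70GB on disk also includes the audio/vision components, so this is ballpark):

    params = 30e9                   # ~30B total parameters (A3B = ~3B active)
    bf16_gb = params * 2 / 1e9      # 2 bytes/weight at BF16
    q4_gb = params * 4.5 / 8 / 1e9  # ~4.5 bits/weight effective for Q4 quants

    print(f"BF16 ~{bf16_gb:.0f} GB, Q4 ~{q4_gb:.0f} GB")  # BF16 ~60 GB, Q4 ~17 GB

~17 GB of weights leaves a few GB for KV cache and activations on a 24GB card.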

dcreater•19m ago
Is there an inference engine for this on macOS?
vunderba•1h ago
Neat. I threw a couple of simple audio clips at it and it was able to at least recognize the instrumentation (piano, drums, etc.). I haven't seen a lot of multimodal LLM focus on recognizing audio outside of speech, so I'd love to see a deep dive into what the SOTA is.
wills_forward•1h ago
https://x.com/whowillrickwill/status/1920723985311903767
nmitchko•54m ago
Next steps for AI in general:

  - Additional modalities
  - Faster FPS (inferences per second)
  - Reaction-time tuning (latency vs. quality tradeoff) for visual and audio inputs/outputs
  - Built-in planning modules in the architecture (think premotor frontal lobe)
  - Time awareness during inference (towards an always-inferring / always-learning architecture)
simonw•53m ago
You can try it out on https://chat.qwen.ai/ - sign in with Google or GitHub (signed out users can't use the voice mode) and then click on the voice icon.

It has an entertaining selection of different voices, including:

*Dylan* - A teenager who grew up in Beijing's hutongs

*Peter* - Tianjin crosstalk, a professional supporting performer

*Cherry* - A sunny, positive, friendly, and natural young lady

*Ethan* - A sunny, warm, energetic, and vigorous boy

*Eric* - A Sichuan Chengdu man who stands out from the crowd

*Jada* - The fiery older sister from Shanghai

indigodaddy•38m ago
I only see Omni Flash, is that the one?
flockonus•19m ago
The voices are really fun, thanks for the laughs :)
hadlock•49m ago
Speech input + speech output is a big deal. In theory you can talk to it using voice, and it can respond in your language, or translate for someone else, without intermediary technologies. Right now you need a wakeword model, speech-to-text, and then text-to-speech, in addition to your core LLM. A couple of models can take speech in or produce speech out, but not both. It looks like they have at least 3 variants in the ~32b range.

Depending on the architecture this is something you could feasibly have in your house in a couple of years or in an expensive "ai toaster"
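
Roughly, today's stack looks like the sketch below - every function is a stub standing in for a separate component, and a speech-to-speech model collapses the middle three hops into one call:

    # Every function here is a stub standing in for a real component.
    def detect_wakeword(audio: bytes) -> bool:  # e.g. an openWakeWord-style detector
        return True

    def speech_to_text(audio: bytes) -> str:    # e.g. a Whisper-style STT model
        return "how much flour for the roux?"

    def llm_complete(prompt: str) -> str:       # the core LLM, text in / text out
        return "About 50 grams per 500 ml of liquid."

    def text_to_speech(text: str) -> bytes:     # a separate TTS model
        return text.encode()

    audio_in = b"..."  # microphone capture
    if detect_wakeword(audio_in):
        reply_audio = text_to_speech(llm_complete(speech_to_text(audio_in)))
        # A speech-to-speech model collapses the three hops into one (hypothetical):
        # reply_audio = omni_model(audio_in)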

CamperBob2•35m ago
Seems like a big win for language learning, if nothing else. Also seems possible to run locally, especially once the unsloth guys get their hands on it.
data-ottawa•5m ago
The opportunity to plug this into your home automation through tool calls is huge.

Ever since ChatGPT added this feature I've been waiting for anyone else to catch up.

There are tons of hands-free situations, like cooking, where this would be amazing ("read the next step please, my hands are covered in raw pork", "how much flour for the roux", "crap, I don't have any lemons, what can I substitute").
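
A sketch of what that could look like in the common JSON-schema function-calling format (the tool name and fields are made up):

    # Hypothetical tool declaration in the JSON-schema style most chat APIs use;
    # the model picks the tool and arguments, your code executes them.
    read_recipe_step = {
        "type": "function",
        "function": {
            "name": "read_recipe_step",  # made-up tool name
            "description": "Read a step of the currently open recipe aloud.",
            "parameters": {
                "type": "object",
                "properties": {
                    "step": {
                        "type": "integer",
                        "description": "1-based step number; omit for the next step.",
                    },
                },
                "required": [],
            },
        },
    }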

state_less•27m ago
Interesting - the pacing seemed very slow when conversing in English, but when I spoke to it in Spanish, it sounded much faster. It's really impressive that these models are going to be able to do real-time translation and much more.

The Chinese are going to end up owning the AI market if the American labs don't start competing on open weights. Americans may end up in a situation where they have some $1000-2000 device at home with an open Chinese model running on it, if they care about privacy or owning their data. What a turn of events!

tedivm•15m ago
This is exactly what I do. I have two 3090s at home with Qwen3 on them. They're tied into my Home Assistant install, and I use ESP32 devices as voice satellites. It works shockingly well.
tomatoman•8m ago
Seems like an interesting setup - do you have it documented anywhere? Thinking of building one!
servercobra•3m ago
Ooo, interesting - I'd love to hear more about the ESP32s as voice satellites!
bilbo0s•15m ago
> Americans may end up in a situation where they have some $1000-2000 device at home with an open Chinese model running on it

Wouldn't worry about that, I'm pretty sure the government is going to ban running Chinese tech in this space sooner or later. And we won't even be able to download it.

Not saying any of the bans will make any kind of sense, but I'm pretty sure they're gonna say this is a "strategic" space. And everything else will follow from there.

Download Chinese models while you can.

Sanzig•11m ago
When DeepSeek first hit the news, an American senator proposed adding it to ITAR so they could send people to prison for using it. Didn't pass, thankfully.
whimsicalism•10m ago
government hardly has the capacity to ban foreign weights