
OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
500•klaussilveira•8h ago•139 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
841•xnx•13h ago•503 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
54•matheusalmeida•1d ago•10 comments

A century of hair samples proves leaded gas ban worked

https://arstechnica.com/science/2026/02/a-century-of-hair-samples-proves-leaded-gas-ban-worked/
112•jnord•4d ago•18 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
164•dmpetrov•9h ago•76 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
166•isitcontent•8h ago•18 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
280•vecti•10h ago•127 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
60•quibono•4d ago•10 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
340•aktau•15h ago•164 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
225•eljojo•11h ago•139 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
332•ostacke•14h ago•89 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
421•todsacerdoti•16h ago•221 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
34•kmm•4d ago•2 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
11•denuoweb•1d ago•0 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
360•lstoll•14h ago•251 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
76•SerCe•4h ago•60 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
15•gmays•3h ago•2 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
59•phreda4•8h ago•9 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
9•romes•4d ago•1 comment

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
210•i5heu•11h ago•157 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
33•gfortaine•6h ago•8 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
123•vmatsiiako•13h ago•51 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
159•limoce•3d ago•80 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
257•surprisetalk•3d ago•33 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1017•cdrnsf•18h ago•422 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
51•rescrv•16h ago•17 comments

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
93•ray__•5h ago•46 comments

Evaluating and mitigating the growing risk of LLM-discovered 0-days

https://red.anthropic.com/2026/zero-days/
44•lebovic•1d ago•12 comments

WebView performance significantly slower than PWA

https://issues.chromium.org/issues/40817676
10•denysonique•5h ago•0 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
81•antves•1d ago•59 comments

Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos

https://arxiv.org/abs/2511.19936
124•50kIters•2mo ago

Comments

onesandofgrain•2mo ago
Can someone smarter than me explain what this is about?
Kalabint•2mo ago
> Can someone smarter than me explain what this is about?

I think you can find the answer under point 3:

> In this work, our primary goal is to show that pretrained text-to-image diffusion models can be repurposed as object trackers without task-specific finetuning.

Meaning that you can track objects in videos without using specialised ML models for video object tracking.

echelon•2mo ago
All of these emergent properties of image and video models lead me to believe that the evolution of animal intelligence around motility and visual understanding of the physical environment might be "easy" relative to other "hard steps".

The more complex an eye gets, the more the brain evolves not just the physics and chemistry of optics, but also rich feature sets: predator/prey labels, tracking, movement, self-localization, distance, etc.

These might not be separate things. These things might just come "for free".

fxtentacle•2mo ago
I wouldn't call these properties "emergent".

If you train a system to memorize A-B pairs and then normally use it to find B when given A, it's not surprising that finding A when given B also works: you trained it in an almost symmetrical fashion on A-B pairs, which are, obviously, also B-A pairs.

jacquesm•2mo ago
There is a massive amount of pre-processing already done in the retina itself and in the LGN:

https://en.wikipedia.org/wiki/Lateral_geniculate_nucleus

So the brain does not necessarily receive 'raw' images to process to begin with; a lot of high-level data, such as optical flow for detecting moving objects, has already been extracted at that point.

Mkengin•2mo ago
Interesting. So similar to the vision encoder + projector in VLMs?
DrierCycle•2mo ago
And the occipital lobe is built around extraordinary levels of image separation: the input is broken down into tiny areas, scattered and woven together for details of motion, gradient, contrast, etc.
magicalhippo•2mo ago
Skimming the paper, here's my take.

Someone previously found that the cross-attention layers in text-to-image diffusion models capture correlations between the input text tokens and the corresponding image regions, so one can use this to segment the image: the pixels containing "cat", for example. However, this segmentation was rather coarse. The authors of this paper found that also using the self-attention layers leads to a much more detailed segmentation.

They then extend this to video by using the self-attention between two consecutive frames to determine how the segmentation changes from one frame to the next.
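
As a toy sketch of that mechanism (my own illustration; the shapes, token index and thresholds are made up, not taken from the paper):

    import torch

    H = W = 32            # latent spatial resolution
    N = H * W             # number of spatial positions
    T = 4                 # number of text tokens
    cat_token = 1         # hypothetical index of the "cat" token

    # Stand-ins for attention maps harvested from the diffusion UNet:
    cross_attn = torch.rand(N, T).softmax(dim=-1)  # pixels -> text tokens
    self_attn = torch.rand(N, N).softmax(dim=-1)   # frame t -> frame t-1 positions

    # Coarse mask: positions that attend strongly to the "cat" token.
    scores = cross_attn[:, cat_token]
    mask_prev = (scores > scores.mean()).float()

    # Propagation: each position in the new frame inherits the mask mass
    # of the previous-frame positions it attends to.
    mask_next = (self_attn @ mask_prev > 0.5).float().reshape(H, W)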

Now, text-to-image diffusion models require a text input to generate the image in the first place. From what I can gather, they limit themselves to semi-supervised video segmentation, where the first frame has already been segmented by, say, a human or some other process.

They then run an "inversion" procedure which tries to generate text that causes the text-to-image diffusion model to segment the first frame as closely as possible to the provided segmentation.

With the text in hand, they can then run the earlier segmentation propagation steps to track the segmented object throughout the video.
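
A toy sketch of that inversion loop (again my own illustration; the features and loss are stand-ins, not the paper's implementation):

    import torch

    N, D = 32 * 32, 768                  # spatial positions, embedding dim
    attn_features = torch.randn(N, D)    # stand-in for frozen diffusion features
    given_mask = (torch.rand(N) > 0.5).float()  # provided first-frame mask

    prompt = torch.randn(D, requires_grad=True)  # learnable pseudo-token
    opt = torch.optim.Adam([prompt], lr=1e-2)

    for step in range(200):
        # Pseudo cross-attention scores between pixels and the pseudo-token.
        logits = attn_features @ prompt / D ** 0.5
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, given_mask)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # `prompt` now plays the role of the text that makes the model segment
    # frame 1 like the provided mask; it is then reused for later frames.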

The key here is that the text-to-image diffusion model is pretrained, and not fine-tuned for this task.

That said, I'm no expert.

jacquesm•2mo ago
For a 'not an expert' explanation you did a better job than the original paper.
nicolailolansen•2mo ago
Bravo!
ttul•2mo ago
This is a cool result. Deep learning image models are trained on enormous amounts of data and the information recorded in their weights continues to astonish me. Over in the Stable Diffusion space, hobbyists (as opposed to professional researchers) are continuing to find new ways to squeeze intelligence out of models that were trained in 2022 and are considerably out of date compared with the latest “flow matching” models like Qwen Image and Flux.

Makes you wonder what intelligence is lurking in a 10T parameter model like Gemini 3 that we may not discover for some years yet…

smerrill25•2mo ago
Hey, how did you find out about this? I would be super curious to keep track of current ad-hoc ways of pushing older models to do cooler things. LMK
ttul•2mo ago
1) Reading papers. 2) Reading "Deep Learning: Foundations and Concepts". 3) Taking Jeremy Howard's Fast.ai course
cheald•2mo ago
Stable Diffusion 1.5 is a great model for hacking on. It's powerful enough that it encodes some really rich semantics, but small and light enough that iterative hacking on it is quick enough that it can be done by hobbyists.

I've got a new potential LoRA implementation that I've been testing locally (using a transformed S matrix with frozen U and V weights from an SVD decomposition of the base matrix) that seems to work really well, and I've been playing with both changes to the forward-noising schedule and the loss functions, which seem to yield empirically superior results to the standard way of doing things. Epsilon prediction may be old and busted (and working on it makes me really appreciate flow matching!) but there's some really cool stuff happening in its training dynamics that is a lot of fun to explore.
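
In rough sketch form, the idea looks something like this (an illustrative reconstruction of the scheme described, not the actual module):

    import torch
    import torch.nn as nn

    class SVDLoRALinear(nn.Module):
        """Freeze U and V from an SVD of the base weight; train only an
        r-by-r matrix sitting in place of diag(S)."""
        def __init__(self, base: nn.Linear):
            super().__init__()
            U, S, Vh = torch.linalg.svd(base.weight.detach(), full_matrices=False)
            self.register_buffer("U", U)          # frozen left singular vectors
            self.register_buffer("Vh", Vh)        # frozen right singular vectors
            self.M = nn.Parameter(torch.diag(S))  # trainable "transformed S"
            self.register_buffer("bias", base.bias.detach() if base.bias is not None else None)

        def forward(self, x):
            W = self.U @ self.M @ self.Vh
            return nn.functional.linear(x, W, self.bias)

    layer = SVDLoRALinear(nn.Linear(768, 768))
    out = layer(torch.randn(2, 768))  # only `M` receives gradients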

It's just a lot of fun. Great playground for both learning how these things work and for trying out new ideas.

throwaway314155•2mo ago
I’d love to follow your work. Got a GitHub?
cheald•2mo ago
I do (same username), but I haven't published any of this (and in fact my Github has sadly languished lately); I keep working on it with the intent to publish eventually. The big problem with models like this is that the training dynamics have so many degrees of freedom that every time I get close to something I want to publish I end up chasing down another set of rabbit holes.

https://gist.github.com/cheald/7d9a436b3f23f27b8d543d805b77f... - here's a quick dump of my SVDLora module, though. I wrote it for use in OneTrainer, though it should be adaptable to other frameworks easily enough. If you want to try it out, I'd love to hear what you find.

ttul•2mo ago
This is super cool work. I’ve built some new sampling techniques for flow matching models that encourage the model to take a “second look” by rewinding sampling to a midpoint and then running the clock forward again. This worked really well with diffusion models (pre-DiT models like SDXL) and I was curious whether it would work with flow matching models like Qwen Image. Yes, it does, but the design is different because flow matching models aren’t de-noising pixels so much as they are simply following a vector field at each step like a ship being pushed by the wind.
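
In sketch form, the rewind looks roughly like this (a toy stand-in for the trained vector field, assuming a linear interpolant; not the production code):

    import torch

    def velocity(x, t):                # stand-in for the trained vector field
        return -x * (1.0 - t)

    def euler(x, t_from, t_to, steps):
        dt = (t_to - t_from) / steps
        t = t_from
        for _ in range(steps):
            x = x + velocity(x, t) * dt
            t += dt
        return x

    x = torch.randn(4, 8)              # noise at t=0
    x = euler(x, 0.0, 1.0, steps=50)   # first pass all the way to t=1 (data)

    # "Second look": blend fresh noise back in so x matches the t=0.5
    # marginal of the linear interpolant, then run the last stretch again.
    t_mid = 0.5
    x = t_mid * x + (1.0 - t_mid) * torch.randn_like(x)
    x = euler(x, t_mid, 1.0, steps=25)
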
cheald•2mo ago
Neat! Is that published anywhere?

It seems conceptually related to ddpm/ancestral sampling, no? Except they're just adding noise to the intermediate latent to simulate a "trajectory jump". How does your method compare?

ethmarks•2mo ago
Gemini 3 is a 10 trillion parameter model?
ttul•2mo ago
I read that the pre-training model behind Gemini 3 has 10T parameters. That does not mean that the model they’re serving each day has 10T parameters. The online model is likely distilled from 10T down to something smaller, but I have not had either fact confirmed by Google. These are anecdotes.
tpoacher•2mo ago
If the authors are reading: I notice you used a "soft IoU" for validation.

A large part of my 2017 PhD thesis [0] is dedicated to exploring the formulation and utility of soft validation operators, including this soft IoU, and the extent to which they are "better" / "more reliable" than thresholding (whether this occurs in isolation, or even when marginalised out, as with the AUC). Long story short, soft operators are at least an order of magnitude more reliable than their thresholding counterparts [1], despite the fact that thresholding still seems to be the industry/academia standard. This is the case for any set-operation-based operator, such as the Dice coefficient (a.k.a. F1-score), not just for the IoU. Recently, influential groups have proposed the Matthews correlation coefficient as a "better" operator, but still treat it in binary/thresholding terms, which means it is still unreliable by an order of magnitude. I suspect this insight goes beyond images (e.g. the F1-score is often used in ML problems more generally, in situations where probabilistic outputs are thresholded to compare against binary ground truth labels), but I haven't tested that hypothesis explicitly beyond the image domain (yet).

In this work you effectively used the "Gödel" (i.e. min/max) fuzzy operator to define fuzzy intersection and union, for the purposes of using it in an IoU operator. There are other fuzzy norms with interesting properties that you could also explore; other classical ones include product and Łukasiewicz. I show in [0] and [1] that these have "best-case-scenario sub-pixel overlap", "average-case" and "worst-case-scenario" underlying semantics. (In other words, min/max should not be a random choice of T-norm, but a conscious choice which matches your problem and what the operator is intended to validate.) In my own work, I then proceeded to show that if you take gradient direction at the boundary into account, you can come up with a fuzzy intersection/union pair which has directional semantics, and is an even more reliable operator when used to define a soft IoU.

Having said that, in your case you're comparing against a binary ground truth. This collapses all the different T-norms to the same value. I wonder if this is the reason you chose a binary ground truth. If so, you might want to consider my work and use the original 'soft' ground truths instead, for higher reliability, as well as the ability to define intersection semantics.
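
To make this concrete, here's a tiny sketch of a soft IoU under these T-norms (illustrative values only):

    import numpy as np

    def soft_iou(a, b, norm="goedel"):
        if norm == "goedel":             # min/max pair
            inter, union = np.minimum(a, b), np.maximum(a, b)
        elif norm == "product":
            inter, union = a * b, a + b - a * b
        elif norm == "lukasiewicz":
            inter = np.maximum(a + b - 1.0, 0.0)
            union = np.minimum(a + b, 1.0)
        return inter.sum() / union.sum()

    pred = np.array([0.9, 0.6, 0.2, 0.1])  # soft prediction
    gt = np.array([1.0, 1.0, 0.0, 0.0])    # binary ground truth
    # With a binary ground truth, all three T-norms collapse to the same
    # value, as noted above:
    print([soft_iou(pred, gt, n) for n in ("goedel", "product", "lukasiewicz")])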

I hope the above is of interest / use to you :) (and, if you were to decide to cite my work, it wouldn't be the eeeeeend of the world, I gueeeeesss xD )

[0] https://ora.ox.ac.uk/objects/uuid:dc352697-c804-4257-8aec-08...

[1] https://repository.essex.ac.uk/24856/1/Papastylianou.etal201...

N_Lens•2mo ago
According to the paper, the image models can 'recognize' and track objects in videos. There are a lot of emergent properties in both diffusion models and LLMs that don't align with simplistic descriptions such as 'next token predictor'. It's not surprising to me that 'diffusing' massive amounts of image data leads to semantic developments and the emergence of recognition.