So... given that dumb-guy (or, more charitably to myself, humanities-guy-who-happens-to-work-in-tech) understanding of these phenomena, my ears perk up when they say they've trained a model on random numbers but still get it to do something semi-useful. Is this as big a deal as it seems? Have we now worked out a way to make the gigawatts' worth of video cards "smart" without human language?
Can anyone help shed light on why the MPS backend for PyTorch produces different numbers than the CUDA and CPU devices do? I don't mean unsupported ops and CPU fallback; I mean fast, garbage numbers coming out of MPS. This PR references numerous other PyTorch issues related to MPS inaccuracy: https://github.com/Stability-AI/stable-audio-tools/pull/225
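For anyone who wants to poke at this, here's a minimal sketch of the kind of cross-device comparison I mean, assuming PyTorch 2.x on an Apple Silicon Mac. The matmul is just an illustrative op, not our actual model:

```python
import torch

torch.manual_seed(0)
x = torch.randn(256, 256)  # create on CPU so both devices see identical inputs
w = torch.randn(256, 256)

cpu_out = x @ w  # CPU reference result

if torch.backends.mps.is_available():
    mps_out = (x.to("mps") @ w.to("mps")).cpu()
    # Small float32 differences across backends are expected;
    # large ones are the "garbage numbers" I'm asking about.
    print("max abs diff:", (cpu_out - mps_out).abs().max().item())
    print("allclose:", torch.allclose(cpu_out, mps_out, atol=1e-4))
```

On a healthy backend you'd expect the max absolute difference to sit within a few float32 ulps of the CPU result; what we see from MPS in the full model is far outside that.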
The story:
This tutorial post was a lesson I wrote for my undergrad "Deep Learning & AI Ethics" class (https://github.com/drscotthawley/DLAIE).
The plan for the semester was to abandon the standard lesson+assignment format (since LLMs make coding assignments moot) in favor of a project-based learning approach: We would, as a class, build a text-conditioned latent flow matching generative model from scratch, because in so doing we'd cover essentially all the key topics of a "normal" course.
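For context, the training objective we were aiming at is simple to state. Here's a toy-scale sketch of the (unconditional) flow-matching loss with a linear interpolation path; the tiny network and 2-D "data" are illustrative only, not the course code:

```python
import torch
import torch.nn as nn

# Toy velocity network: input is (x_t, t), output is a predicted velocity.
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

def flow_matching_loss(x1: torch.Tensor) -> torch.Tensor:
    """Conditional flow matching with a straight-line path from noise to data."""
    x0 = torch.randn_like(x1)                   # noise sample
    t = torch.rand(x1.shape[0], 1)              # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                  # point on the linear path
    target_v = x1 - x0                          # constant target velocity
    pred_v = model(torch.cat([xt, t], dim=-1))  # predicted velocity at (xt, t)
    return ((pred_v - target_v) ** 2).mean()

loss = flow_matching_loss(torch.randn(32, 2))   # toy 2-D "data" batch
loss.backward()
```

Text conditioning and the latent autoencoder layer on top of this, but the core loss really is that short, which is why it made a nice semester-long target.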
For logistical reasons we pivoted to adding guidance to pretrained models, specifically Stable Audio Open Small, but we hit a snag regarding our MPS outputs, and I wonder if any readers here can help.
(Students are overwhelmingly Mac users; my small college doesn't provide GPUs; CPU execution is too slow; Colab takes too long to set up and then kicks us off. We're waiting on an NSF NAIRR Pilot education allocation for some remote GPU access.)
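Since "guidance" is carrying a lot of weight above: a minimal classifier-free-guidance sketch, assuming that's roughly the flavor involved. Everything here is a toy stand-in; none of these names come from Stable Audio Open's actual API:

```python
import torch

def toy_model(xt, t, emb):
    """Illustrative conditional predictor (velocity or noise estimate)."""
    return 0.1 * xt + t * emb.mean()

def guided_prediction(xt, t, cond_emb, uncond_emb, scale=4.0):
    """Standard CFG: push the conditional prediction away from the unconditional one."""
    pred_cond = toy_model(xt, t, cond_emb)      # text-conditioned pass
    pred_uncond = toy_model(xt, t, uncond_emb)  # empty-prompt pass
    return pred_uncond + scale * (pred_cond - pred_uncond)

xt = torch.randn(1, 8)
print(guided_prediction(xt, 0.5, torch.ones(4), torch.zeros(4)))
```

The appeal for a class is that this lives entirely at sampling time, so students can experiment with a frozen pretrained model instead of training one.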