Show HN: Text-to-video model from scratch (2 brothers, 2 years, 2B params)

https://huggingface.co/collections/Linum-AI/linum-v2-2b-text-to-video
12•schopra909•4h ago
Writeup (includes good/bad sample generations): https://www.linum.ai/field-notes/launch-linum-v2

We're Sahil and Manu, two brothers who spent the last 2 years training text-to-video models from scratch. Today we're releasing them under Apache 2.0.

These are 2B param models capable of generating 2-5 seconds of footage at either 360p or 720p. In terms of model size, the closest comparison is Alibaba's Wan 2.1 1.3B. From our testing, we get significantly better motion capture and aesthetics.

We're not claiming to have reached the frontier. For us, this is a stepping stone towards SOTA - proof we can train these models end-to-end ourselves.

Why train a model from scratch?

We shipped our first model in January 2024 (pre-Sora) as a 180p, 1-second GIF bot, bootstrapped off Stable Diffusion XL. Image VAEs don't understand temporal coherence, and without the original training data, you can't smoothly transition between image and video distributions. At some point you're better off starting over.

For v2, we use T5 for text encoding, Wan 2.1 VAE for compression, and a DiT-variant backbone trained with flow matching. We built our own temporal VAE but Wan's was smaller with equivalent performance, so we used it to save on embedding costs. (We'll open-source our VAE shortly.)
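For context, here is a toy sketch of what one flow-matching training step on those VAE latents can look like. This is a generic illustration, not Linum's training code; dit, latents, and text_emb are placeholder names:

    import torch
    import torch.nn.functional as F

    def flow_matching_step(dit, latents, text_emb):
        # latents: clean video latents from the VAE encoder, shape (B, C, T, H, W)
        b = latents.shape[0]
        t = torch.rand(b, device=latents.device)        # sample a time in [0, 1]
        noise = torch.randn_like(latents)
        t_ = t.view(b, 1, 1, 1, 1)
        x_t = (1.0 - t_) * latents + t_ * noise         # point on the straight path from data to noise
        target_v = noise - latents                      # velocity of that path
        pred_v = dit(x_t, t, text_emb)                  # the DiT backbone predicts the velocity
        return F.mse_loss(pred_v, target_v)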

The bulk of development time went into building curation pipelines that actually work (e.g., hand-labeling aesthetic properties and fine-tuning VLMs to filter at scale).
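As a rough sketch of that second stage, assume a VLM fine-tuned on the hand-labeled clips returns an aesthetic score in [0, 1]; the function name and threshold below are hypothetical:

    def filter_clips(clips, aesthetic_scorer, threshold=0.7):
        # Keep only clips the fine-tuned VLM scores above the aesthetic threshold.
        kept = []
        for clip in clips:
            if aesthetic_scorer(clip) >= threshold:
                kept.append(clip)
        return kept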

What works: Cartoon/animated styles, food and nature scenes, simple character motion. What doesn't: Complex physics, fast motion (e.g., gymnastics, dancing), consistent text.

Why build this when Veo/Sora exist? Products are extensions of the underlying model's capabilities. If users want a feature the model doesn't support (character consistency, camera controls, editing, style mapping, etc.), you're stuck. To build the product we want, we need to update the model itself. That means owning the development process. It's a bet that will take time (and a lot of GPU compute) to pay off, but we think it's the right one.

What’s next?
- Post-training for physics/deformations
- Distillation for speed
- Audio capabilities
- Model scaling

We kept a “lab notebook” of all our experiments in Notion. Happy to answer questions about building a model from 0 → 1. Comments and feedback welcome!

Comments

streamer45•3h ago
Rad! huggingface link gives 404 on my side though.
schopra909•3h ago
Oh damn! Thanks for catching that -- going to ping the HF folks to see what they can do to fix the collection link.

In the meantime, here are the individual links to the models:

https://huggingface.co/Linum-AI/linum-v2-720p
https://huggingface.co/Linum-AI/linum-v2-360p

schopra909•3h ago
Should be fixed now! Thanks again for the heads up
streamer45•3h ago
All good, cheers!
streamer45•3h ago
Looks like 20GB VRAM isn't enough for the 360p demo :( need to bump my specs :sweat_smile:
schopra909•3h ago
Per the RAM comment, you may be able to get it running locally with two tweaks:

https://github.com/Linum-AI/linum-v2/blob/298b1bb9186b5b9ff6...

1) Free up the T5 as soon as the text is encoded, so you reclaim GPU RAM

2) Manual layer offloading: move layers off the GPU once they're done being used, to free up space for the remaining layers + activations
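A minimal sketch of those two tweaks in PyTorch; text_encoder, prompt_ids, and dit_blocks are placeholder names, not the repo's actual API:

    import torch

    def encode_prompt_and_free(text_encoder, prompt_ids):
        # Tweak 1: run T5 once, then release it so the DiT can use that VRAM.
        # (The caller must not keep another reference to text_encoder.)
        with torch.no_grad():
            emb = text_encoder(prompt_ids.to("cuda"))
        del text_encoder
        torch.cuda.empty_cache()                 # return the freed memory to the allocator
        return emb

    def offloaded_forward(dit_blocks, x):
        # Tweak 2: manual layer offloading -- keep blocks on the CPU and move each
        # one onto the GPU only for its forward pass.
        for block in dit_blocks:                 # assumed iterable of nn.Module blocks
            block.to("cuda")
            x = block(x)
            block.to("cpu")                      # frees VRAM for remaining layers + activations
        return x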

Show HN: isometric.nyc – giant isometric pixel art map of NYC

https://cannoneyed.com/isometric-nyc/
297•cannoneyed•3h ago•95 comments

Show HN: BrowserOS – "Claude Cowork" in the browser

https://github.com/browseros-ai/BrowserOS
25•felarof•4h ago•12 comments

Show HN: CLI for working with Apple Core ML models

https://github.com/schappim/coreml-cli
3•schappim•33m ago•0 comments

Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete

https://huggingface.co/sweepai/sweep-next-edit-1.5B
500•williamzeng0•21h ago•111 comments

Show HN: I'm writing an alternative to Lutris

https://github.com/navid-m/styx
2•death_eternal•15m ago•0 comments

Show HN: Synesthesia, make noise music with a colorpicker

https://visualnoise.ca
9•tevans3•14h ago•3 comments

Show HN: ProblemHunt – A place to collect real problems before building startups

https://problemhunt.pro
3•gostroverhov•22m ago•2 comments

Show HN: Elden Ring–style "Git Pushed" screen when you Git push in VS Code

https://github.com/iiviie/CODE_PUSHED_darkSouls
2•iiviie•25m ago•0 comments

Show HN: Interactive physics simulations I built while teaching my daughter

https://www.projectlumen.app/
60•anticlickwise•3d ago•14 comments

Show HN: Figr – AI that thinks through product problems before designing

https://figr.design/
2•Mokshgarg003•33m ago•0 comments

Show HN: macOS CLI tool for managing Calendar events and Reminders via EventKit

https://github.com/schappim/ekctl
2•schappim•38m ago•0 comments

Show HN: Infrastructure for multi-agent AI memory

https://nexuswaitlist.framer.website/
2•sillygoose_189•59m ago•0 comments

Show HN: ChartGPU – WebGPU-powered charting library (1M points at 60fps)

https://github.com/ChartGPU/ChartGPU
651•huntergemmer•1d ago•203 comments

Show HN: AI Search Index – Track which AI bots crawl your website

https://www.aisearchindex.com
2•ihmissuti•1h ago•0 comments

Show HN: I've been using AI to analyze every supplement on the market

https://pillser.com/
9•lilouartz•6h ago•6 comments

Show HN: LaReview, local open-source CodeRabbit alternative

https://github.com/puemos/lareview
2•deofoo•1h ago•0 comments

Show HN: A Node Based Editor for Three.js Shading Language (TSL)

https://www.tsl-graph.xyz/
3•bhushanwtf•3h ago•1 comment

Show HN: An AI-powered web video editor built with Next.js and Fabric.js

https://pablituuu.space/video-editor
2•pablituuu•2h ago•0 comments

Show HN: Rails UI

https://railsui.com/
196•justalever•1d ago•107 comments

Show HN: Bible translated using LLMs from source Greek and Hebrew

https://biblexica.com
29•epsteingpt•4h ago•36 comments

Show HN: RatatuiRuby wraps Rust Ratatui as a RubyGem – TUIs with the joy of Ruby

https://www.ratatui-ruby.dev/
149•Kerrick•5d ago•31 comments

Show HN: I'm tired of my LLM bullshitting. So I fixed it

3•BobbyLLM•3h ago•6 comments

Show HN: yolo-cage – AI coding agents that can't exfiltrate secrets

https://github.com/borenstein/yolo-cage
59•borenstein•1d ago•72 comments

Show HN: Differentiable Quantum Chemistry

https://github.com/lowdanie/hartree-fock-solver
49•lowdanie•5d ago•14 comments

Show HN: Mastra 1.0, open-source JavaScript agent framework from the Gatsby devs

https://github.com/mastra-ai/mastra
212•calcsam•2d ago•69 comments

Show HN: High speed graphics rendering research with tinygrad/tinyJIT

https://github.com/quantbagel/gtinygrad
28•quantbagel•17h ago•9 comments

Show HN: Retain – A unified knowledge base for all your AI coding conversations

https://github.com/BayramAnnakov/retain
43•Bayram•1d ago•14 comments

Show HN: Open-source certificate from GitHub activity

https://certificate.brendonmatos.com
41•brendonmatos•4d ago•9 comments

Show HN: I built a JSON viewer that decodes Base64 media inline

https://viewjson.net
3•dassh•6h ago•0 comments