Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

https://github.com/mattmireles/gemma-tuner-multimodal
95•MediaSquirrel•3h ago
About six months ago, I started working on a project to fine-tune Whisper locally on my M2 Ultra Mac Studio with a limited compute budget. I got into it. The problem at the time was that I had 15,000 hours of audio data in Google Cloud Storage and no way to fit all of it on my local machine, so I built a system to stream data from GCS to my machine during training.
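
A minimal sketch of the streaming idea described above. It assumes nothing about how the repo actually does it. A background thread fetches items ahead of the training loop into a bounded queue, so only `buffer_size` items are ever held locally. The `stream_prefetch` function and the commented Google Cloud Storage wiring (bucket name, prefix) are hypothetical illustrations, not the project's code.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel: the producer has no more items


class _Failure:
    """Wraps an exception raised in the producer thread."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def stream_prefetch(keys: Iterable[str], fetch: Callable[[str], T],
                    buffer_size: int = 8) -> Iterator[T]:
    """Yield fetch(key) for each key in order, fetching ahead in a thread.

    At most buffer_size fetched items wait in memory, so the full dataset
    never has to be copied to the local machine.
    """
    buf: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for key in keys:
                buf.put(fetch(key))  # blocks while the buffer is full
        except BaseException as exc:  # hand the error to the consumer
            buf.put(_Failure(exc))
        finally:
            buf.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item  # type: ignore[misc]


# Hypothetical wiring to Google Cloud Storage (requires google-cloud-storage;
# the bucket name and prefix are placeholders):
# from google.cloud import storage
# bucket = storage.Client().bucket("my-audio-bucket")
# keys = (blob.name for blob in bucket.list_blobs(prefix="clips/"))
# for audio in stream_prefetch(keys, lambda k: bucket.blob(k).download_as_bytes()):
#     ...  # decode, extract features, run a training step
```

The bounded queue is the design choice that matters here: `put` blocks when the buffer is full, so downloads never run more than `buffer_size` items ahead of training.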

Gemma 3n came out, so I added that. Kinda went nuts, tbh.

Then I put it on the shelf.

When Gemma 4 came out a few days ago, I dusted it off, cleaned it up, split the Gemma code out from the Whisper fine-tuning, and added support for Gemma 4.

I'm presenting it for you here today to play with, fork and improve upon.

One thing I've learned so far: it's very easy to OOM when you fine-tune on longer sequences! My Mac Studio has 64GB of RAM, and I run out of memory constantly.

Anywho, the interest in Gemma 4 and, frankly, the fact that you can't really do audio fine-tuning with MLX are the real reasons this exists (along with my personal interest). I would have preferred to use MLX and not have had to make this, but here we are. Welcome to my little side quest.

And so I made this. I hope you have as much fun using it as I had making it.

-Matt

Comments

dsabanin•3h ago
Thanks for doing this. Looks interesting, I'm going to check it out soon.
MediaSquirrel•2h ago
you are welcome! It was a fun side quest
craze3•3h ago
Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too
LuxBennu•2h ago
I run whisper large-v3 on an m2 max 96gb and even with just inference the memory gets tight on longer audio, can only imagine what fine-tuning looks like. Does the 64gb vs 96gb make a meaningful difference for gemma 4 fine-tuning or does it just push the oom wall back a bit? Been wanting to try local fine-tuning on apple silicon but the tooling gap has kept me on inference only so far.
MediaSquirrel•2h ago
Memory usage increases quadratically with sequence length. Therefore, using shorter sequences during fine-tuning can prevent memory explosions. On my 64GB RAM machine, I'm limited to input sequences of about 2,000 tokens, considering my average output for the fine-tuning task is around 1,000 tokens (~3k tokens total).
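
To put the reply's numbers in context for the 64GB vs 96GB question, here is a back-of-the-envelope projection. It assumes memory = fixed overhead + c · tokens^p, calibrated to the ~3k-token limit on 64GB quoted above. The quadratic term (p = 2) comes from attention kernels that materialize the full n × n score matrix. Memory-efficient attention kernels avoid that, in which case activations grow closer to linearly (p = 1). The function name and the 24GB fixed overhead in the example are hypothetical illustrations, not measurements.

```python
def project_max_tokens(ref_tokens: int, ref_budget_gb: float,
                       new_budget_gb: float, fixed_gb: float = 0.0,
                       exponent: float = 2.0) -> int:
    """Project the longest sequence that fits in new_budget_gb.

    Model: memory = fixed_gb + c * tokens**exponent, with c calibrated so
    that ref_tokens exactly fills ref_budget_gb. fixed_gb stands in for
    weights, optimizer state, and other length-independent overhead.
    """
    if not 0.0 <= fixed_gb < ref_budget_gb:
        raise ValueError("fixed_gb must be in [0, ref_budget_gb)")
    if new_budget_gb <= fixed_gb:
        return 0  # nothing left over for sequence-dependent memory
    headroom_ratio = (new_budget_gb - fixed_gb) / (ref_budget_gb - fixed_gb)
    return int(ref_tokens * headroom_ratio ** (1.0 / exponent))


# ~3k tokens total on 64GB, per the reply above; fixed_gb values are guesses.
print(project_max_tokens(3000, 64, 96))               # pure quadratic -> 3674
print(project_max_tokens(3000, 64, 96, fixed_gb=24))  # quadratic + overhead -> 4024
print(project_max_tokens(3000, 64, 96, exponent=1))   # linear -> 4500
```

Under pure quadratic scaling, 50% more memory buys only about 22% more sequence length. Within this toy model, a 4-5k range becomes plausible only if a large share of memory is fixed overhead or if memory grows closer to linearly.
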
LuxBennu•26m ago
Ah that makes sense, quadratic scaling is brutal. So with 96gb i'd probably get somewhere around 4-5k total sequence length before hitting the wall, which is still pretty limiting for anything multimodal. Do you do any gradient checkpointing or is that not worth the speed tradeoff at these sizes?
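
On the gradient checkpointing question: checkpointing keeps only some layer inputs during the forward pass and recomputes the rest during backward. This trades roughly one extra forward pass for lower peak activation memory. The toy model below assumes that every layer's activations cost one unit and that checkpoints are placed every k layers. The function names and the 48-layer depth are hypothetical. It shows the classic result that peak memory is about L/k + k, which is minimized near k ≈ √L.

```python
import math
from typing import Optional


def peak_activation_units(num_layers: int, segment: Optional[int]) -> int:
    """Peak number of activation units alive during backward (toy model).

    Toy assumptions: each layer's activations cost 1 unit, and a saved
    checkpoint (a layer input) also costs 1 unit.
    segment=None: no checkpointing, all layers' activations are kept.
    segment=k: only every k-th layer input is kept (ceil(L / k) units), and
    backward recomputes one k-layer segment at a time (up to k more units).
    """
    if segment is None:
        return num_layers
    return math.ceil(num_layers / segment) + min(segment, num_layers)


def best_segment(num_layers: int) -> int:
    """Segment length minimizing peak units in this toy model (near sqrt(L))."""
    return min(range(1, num_layers + 1),
               key=lambda k: peak_activation_units(num_layers, k))


# In this toy model segment=1 is no better than no checkpointing. Real
# per-layer checkpointing helps because a layer's internal activations
# (e.g. attention scores) are much larger than its input.
layers = 48  # hypothetical depth, not Gemma 4's actual layer count
k = best_segment(layers)
print(k, peak_activation_units(layers, None), peak_activation_units(layers, k))
# -> 6 48 14
```

Checkpointing reduces how many layers hold activations at once. It does not remove the quadratic attention term inside the layer being recomputed, so it pushes the OOM wall back rather than removing it.
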
yousifa•2h ago
This is super cool, will definitely try it out! Nice work
pivoshenko•2h ago
nice!

Show HN: Brutalist Concrete Laptop Stand (2024)

https://sam-burns.com/posts/concrete-laptop-stand/
677•sam-bee•11h ago•211 comments

Show HN: An interactive map of Tolkien's Middle-earth

https://middle-earth-interactive-map.web.app/
31•frasermarlow•2h ago•4 comments

Show HN: Unicode Steganography

https://steganography.patrickvuscan.com
9•PatrickVuscan•10h ago•3 comments

Show HN: A cartographer's attempt to realistically map Tolkien's world

https://www.intofarlands.com/atlasofarda
149•intofarlands•10h ago•26 comments

Show HN: Finalrun – Spec-driven testing using English and vision for mobile apps

https://github.com/final-run/finalrun-agent
22•ashish004•8h ago•8 comments

Show HN: Pion/handoff – Move WebRTC out of browser and into Go

https://github.com/pion/handoff
89•Sean-Der•11h ago•15 comments

Show HN: Open-source GDPR router for LLMs detects PII, forces EU-only inference

https://github.com/mahadillahm4di-cyber/mh-gdpr-ai.eu
3•mahadillah-ai•1h ago•0 comments

Show HN: Mo – checks GitHub PRs against decisions approved in Slack

https://motionode.com/index
2•oscarcaldera•1h ago•0 comments

Show HN: Stop paying for Dropbox/Google Drive, use your own S3 bucket instead

https://locker.dev
223•Zm44•11h ago•184 comments

Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

https://github.com/matthartman/ghost-pepper
448•MattHart88•1d ago•195 comments

Show HN: Anos – a hand-written ~100KiB microkernel for x86-64 and RISC-V

https://github.com/roscopeco/anos
105•noone_youknow•3d ago•31 comments

Show HN: A (marginally) useful x86-64 ELF executable in 298 bytes

https://github.com/meribold/btry
9•meribold•6h ago•0 comments

Show HN: AdaShape-3D modeler for intuitive 3D printing parts / Windows 11

https://adashape.com
29•fsloth•3d ago•27 comments

Show HN: Hippo, biologically inspired memory for AI agents

https://github.com/kitfunso/hippo-memory
119•kitfunso•1d ago•23 comments

Show HN: Tusk for macOS and Gnome

https://shapemachine.xyz/tusk/
113•factorialboy•3d ago•42 comments

Show HN: TTF-DOOM – A raycaster running inside TrueType font hinting

https://github.com/4RH1T3CT0R7/ttf-doom
63•4RH1T3CT0R•1d ago•12 comments

Show HN: GovAuctions lets you browse government auctions at once

https://www.govauctions.app/
310•player_piano•1d ago•88 comments

Show HN: I built a tiny LLM to demystify how language models work

https://github.com/arman-bd/guppylm
891•armanified•1d ago•134 comments

Show HN: Marimo pair – Reactive Python notebooks as environments for agents

https://github.com/marimo-team/marimo-pair
22•manzt•5h ago•1 comments

Show HN: A social feed with no algo where communities decide what gets seen

https://veridonia.com
5•smnkgv•10h ago•5 comments

Show HN: A reasoning hierarchical robotics pipeline you can run in the browser

https://avikde.github.io/vla-pipeline/
4•avikde•5h ago•0 comments

Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

https://github.com/fikrikarim/parlor
284•karimf•2d ago•35 comments

Show HN: The King James Bible deserved a better website

https://officialkingjamesbible.com/
5•L23234•8h ago•3 comments

Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud

https://github.com/kessler/gemma-gem
153•ikessler•1d ago•21 comments

Show HN: Clawcast – A peer-to-peer podcast network for agents

https://www.clawcast.dev/
7•PiersonMarks•6h ago•5 comments

Show HN: C64 Ultimate Toolbox for macOS

https://github.com/amiantos/c64-ultimate-toolbox
2•amiantos•6h ago•0 comments

Show HN: Interactive object storage cost calculator

https://storage.mixpeek.com
2•Beefin•6h ago•0 comments

Show HN: I made a YouTube search form with advanced filters

https://playlists.at/youtube/search/
316•nevernothing•1d ago•201 comments

Show HN: A game where you build a GPU

https://jaso1024.com/mvidia/
953•Jaso1024•3d ago•186 comments