Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon
What it is
Gemma Tuner is a training pipeline that lets you fine-tune vision-language models on Apple Silicon without copying massive datasets to your machine. Picture it as a bridge: your Mac pulls training batches from cloud storage on demand while the model trains locally. It was originally built for audio fine-tuning with Whisper, and now handles Gemma 4's multimodal capabilities.
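The core idea, streaming batches instead of materializing the dataset, can be sketched in a few lines. This is a hedged illustration, not the project's actual implementation: `stream_batches` and the `fetch` callback are hypothetical names, and in the real pipeline `fetch` would be a Google Cloud Storage blob download rather than a local read.

```python
from typing import Callable, Iterable, Iterator, List


def stream_batches(
    uris: Iterable[str],
    batch_size: int,
    fetch: Callable[[str], bytes],
) -> Iterator[List[bytes]]:
    """Yield training batches lazily, fetching each record on demand.

    Only `batch_size` records are ever held in memory at once, so the
    full dataset never has to fit on local disk. In a real run, `fetch`
    would download a blob from cloud storage (hypothetical setup).
    """
    batch: List[bytes] = []
    for uri in uris:
        batch.append(fetch(uri))  # pull one record over the network
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


if __name__ == "__main__":
    # Simulate remote storage with an in-memory dict for demonstration.
    fake_bucket = {f"gs://data/{i}": bytes([i]) for i in range(5)}
    batches = list(stream_batches(fake_bucket.keys(), 2, fake_bucket.__getitem__))
    print(len(batches))  # 5 records in batches of 2 -> 3 batches
```

The same pattern generalizes to a PyTorch `IterableDataset` wrapping a cloud client, which is how a streaming loader would typically plug into a training loop.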
Why it matters
If you're training models on a Mac with limited SSD space, this solves the storage bottleneck. Instead of needing terabytes locally, you stream data during training runs—useful for image-text pairs, audio transcription datasets, or any large training corpus. Turns an M2 Ultra into a practical fine-tuning rig without enterprise infrastructure.
Key details
- Built for M-series Macs (tested on M2 Ultra Mac Studio)
- Streams training data from Google Cloud Storage during runs, so no local copy is needed
- Supports Whisper audio models and Gemma 3n/4 vision-language models
- Open-source on GitHub (mattmireles/gemma-tuner-multimodal)
- Designed for "limited compute budget" scenarios, from hobbyist to small-team scale