Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon
What it is
Gemma Tuner is a training pipeline that lets you fine-tune vision-language models on Apple Silicon without copying massive datasets to your machine. Picture it as a bridge: your Mac pulls training batches from cloud storage on demand while the model trains locally. It was originally built for audio fine-tuning with Whisper, and now handles Gemma 4's multimodal capabilities.
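The core idea, streaming batches instead of materializing the dataset, can be sketched in a few lines. This is a hedged illustration, not the project's actual implementation: `stream_batches` and the `fetch` callback are hypothetical names, and in the real pipeline `fetch` would be a Google Cloud Storage blob download rather than a local read.

```python
from typing import Callable, Iterable, Iterator, List


def stream_batches(
    uris: Iterable[str],
    batch_size: int,
    fetch: Callable[[str], bytes],
) -> Iterator[List[bytes]]:
    """Yield training batches lazily, fetching each record on demand.

    Only `batch_size` records are ever held in memory at once, so the
    full dataset never has to fit on local disk. In a real run, `fetch`
    would download a blob from cloud storage (hypothetical setup).
    """
    batch: List[bytes] = []
    for uri in uris:
        batch.append(fetch(uri))  # pull one record over the network
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


if __name__ == "__main__":
    # Simulate remote storage with an in-memory dict for demonstration.
    fake_bucket = {f"gs://data/{i}": bytes([i]) for i in range(5)}
    batches = list(stream_batches(fake_bucket.keys(), 2, fake_bucket.__getitem__))
    print(len(batches))  # 5 records in batches of 2 -> 3 batches
```

The same pattern generalizes to a PyTorch `IterableDataset` wrapping a cloud client, which is how a streaming loader would typically plug into a training loop.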
Why it matters
If you're training models on a Mac with limited SSD space, this solves the storage bottleneck. Instead of needing terabytes locally, you stream data during training runs—useful for image-text pairs, audio transcription datasets, or any large training corpus. Turns an M2 Ultra into a practical fine-tuning rig without enterprise infrastructure.
Key details
- Built for M-series Macs (tested on M2 Ultra Mac Studio)
- Streams training data from Google Cloud Storage during runs, so no local copy is needed
- Supports Whisper audio models and Gemma 3n/4 vision-language models
- Open-source on GitHub (mattmireles/gemma-tuner-multimodal)
- Designed for "limited compute budget" scenarios, from hobbyist to small-team scale