Gemma Multimodal Fine-Tuner
Fine-tune Gemma on text, images, and audio — on your Mac, on data that doesn't fit on your Mac.
🖼️ Image + text LoRA: captioning and VQA on local CSV.
🎙️ Audio + text LoRA: the only Apple-Silicon-native path that does this.
📝 Text-only LoRA: instruction or completion on CSV.
☁️ Stream from GCS / BigQuery: train on terabytes without filling your SSD.
🍎 Runs on Apple Silicon: MPS-native, no NVIDIA box required.
Source: github.com/mattmireles/gemma-tuner-multimodal (public).
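To make the streaming idea concrete, here is a minimal sketch of reading CSV training rows lazily in fixed-size batches, so that only one batch is held in memory at a time. It is not the project's actual loader: the function name `stream_batches`, the column names `image_path` and `caption`, and the in-memory `io.StringIO` standing in for a remote object are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def stream_batches(source: TextIO, batch_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of CSV rows, pulling at most `batch_size` rows at a time.

    `source` can be any text file-like object, so the same loop works for a
    local file or for a handle that reads from remote storage.
    """
    reader = csv.DictReader(source)
    while True:
        # islice consumes only the next `batch_size` rows from the reader.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for a remote CSV object; the column names are
# illustrative and not taken from the project.
fake_remote = io.StringIO(
    "image_path,caption\n"
    "a.jpg,a cat\n"
    "b.jpg,a dog\n"
    "c.jpg,a bird\n"
)

batches = list(stream_batches(fake_remote, batch_size=2))
batch_sizes = [len(b) for b in batches]
```

Because the generator never materializes the whole file, memory use is bounded by the batch size rather than the dataset size, which is the property that lets a dataset larger than the local disk be used for training.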
LoRA for Gemma 4 & 3n — why not just use…?
| | This project | MLX-LM | Unsloth | axolotl |
|---|---|---|---|---|
| Fine-tune Gemma (text-only CSV) | ✅ | ✅ | ✅ | ✅ |
| Fine-tune Gemma image + text (caption / VQA CSV) | ✅ | ⚠️ varies | ⚠️ varies | ⚠️ varies |
| Fine-tune Gemma audio + text | ✅ | ❌ | ❌ | ⚠️ CUDA only |
| Runs on Apple Silicon (MPS) | ✅ | ✅ | ❌ | ❌ |
| Stream training data from cloud | ✅ | ❌ | ❌ | ⚠️ partial |
| No NVIDIA GPU required | ✅ | ✅ | ❌ | ❌ |