autofit2
Few-shot text classification: a massively multilingual (50+ languages), fully automated pipeline built on setfit and SBERT embeddings.
Key Features
Few-Shot Learning: High precision (95–99%) with a few dozen labeled examples.
Multilingual Support: Pretrained models for 20 languages; evaluation corpora for 50+. Scalable to 100+ via Common Crawl.
Automated Pipeline: End-to-end preprocessing, fine-tuning, evaluation, and deployment from a single JSON config.
Reproducibility & Transparency: JSON-based configuration, model card generation, and CO₂ emission tracking.
Usage
1. Prepare Data: Use dataload or implement a custom loader providing labeled examples.
2. Configure: Create myproject.json specifying dataset paths, model settings, and output directories. Supports multi-language/task blocks.
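As a rough illustration of the Configure step, the sketch below writes a myproject.json with one block per language/task pair. The actual autofit2 schema is not shown in this excerpt, so every key name here (project, output_dir, blocks, train_path, eval_path, model, and so on), every file path, and the write_config helper are hypothetical placeholders, not the library's real format.

```python
import json
from pathlib import Path

# NOTE: all keys and paths below are hypothetical placeholders chosen for
# illustration; consult the autofit2 documentation for the real schema.
config = {
    "project": "myproject",
    "output_dir": "outputs/myproject",
    "blocks": [
        {
            "language": "en",
            "task": "sentiment",
            "train_path": "data/en/train.csv",
            "eval_path": "data/en/eval.csv",
            "model": {"num_epochs": 1, "batch_size": 16},
        },
        {
            "language": "de",
            "task": "sentiment",
            "train_path": "data/de/train.csv",
            "eval_path": "data/de/eval.csv",
            "model": {"num_epochs": 1, "batch_size": 16},
        },
    ],
}


def write_config(cfg: dict, path: str = "myproject.json") -> Path:
    """Serialize the config dict to a UTF-8 JSON file and return its path."""
    target = Path(path)
    # ensure_ascii=False keeps non-English labels and paths readable on disk.
    target.write_text(json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


def load_config(path: str = "myproject.json") -> dict:
    """Read a JSON config back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling write_config(config) produces the file, and load_config() reads it back. Keeping each language/task pair in its own block is one way a single file could drive several runs.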