Show HN: Autofit2 – End-to-end pipeline for multilingual text classification

autofit2

Few-shot text classification with a massively multilingual (50+ languages), fully automated pipeline built on setfit and SBERT embeddings.
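The SetFit paper describes setfit's approach in two stages. First, it fine-tunes a Sentence Transformer (SBERT) contrastively on sentence pairs built from the few labeled examples. Second, it trains a small classification head on the resulting embeddings. Because n labeled examples yield on the order of n² pairs, a few dozen labels can supply many training signals.

The sketch below illustrates only the pair-generation step, in plain Python. It is a simplified assumption, not setfit's or autofit2's actual sampling code, and `make_contrastive_pairs` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def make_contrastive_pairs(
    texts: Sequence[str], labels: Sequence[str], num_iterations: int = 1, seed: int = 0
) -> List[Tuple[str, str, float]]:
    """Build (text_a, text_b, target) pairs: 1.0 for same label, 0.0 for different labels.

    Hypothetical helper illustrating SetFit-style pair generation; not library code.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)  # label -> indices of examples with that label
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    pairs = []
    for _ in range(num_iterations):
        for i, label in enumerate(labels):
            positives = [j for j in by_label[label] if j != i]
            negatives = [j for j in range(len(texts)) if labels[j] != label]
            # One positive and one negative partner per anchor keeps the pairs balanced.
            if positives:
                pairs.append((texts[i], texts[rng.choice(positives)], 1.0))
            if negatives:
                pairs.append((texts[i], texts[rng.choice(negatives)], 0.0))
    return pairs


if __name__ == "__main__":
    texts = ["Great product", "Produit génial", "Terrible support", "Servicio pésimo"]
    labels = ["pos", "pos", "neg", "neg"]
    for a, b, target in make_contrastive_pairs(texts, labels, num_iterations=2):
        print(f"{target:.1f}  {a!r} <-> {b!r}")
```

With four examples and `num_iterations=2`, this yields 16 pairs: 8 positive and 8 negative. During fine-tuning, such pairs would feed a contrastive objective, for example a cosine-similarity loss.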

Key Features

Few-Shot Learning: High precision (95–99%) with only a few dozen labeled examples.

Multilingual Support: Pretrained models for 20 languages and evaluation corpora for 50+; scalable to 100+ languages via Common Crawl.

Automated Pipeline: End-to-end preprocessing, fine-tuning, evaluation, and deployment, driven by a single JSON config.

Reproducibility & Transparency: JSON-based configuration, model card generation, and CO₂ emission tracking.
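The text does not say how the 95–99% precision range is aggregated (per class, micro-averaged, or macro-averaged) or which evaluation sets it covers. As an illustration only, the sketch below computes per-class and macro-averaged precision from gold and predicted labels in plain Python. It is not autofit2's evaluation code, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_precision(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Precision per label: correct predictions of that label / all predictions of it."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    labels = set(y_true) | set(y_pred)
    # A label that is never predicted is scored 0.0 here; other tools may treat this case differently.
    return {
        label: (correct[label] / predicted[label] if predicted[label] else 0.0)
        for label in labels
    }


def macro_precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class precision, so rare classes count as much as common ones."""
    scores = per_class_precision(y_true, y_pred)
    if not scores:
        raise ValueError("no labels to score")
    return sum(scores.values()) / len(scores)
```

Consider `y_true = ["pos", "pos", "neg", "neg"]` and `y_pred = ["pos", "neg", "neg", "neg"]`. Precision is 1.0 for `pos` and 2/3 for `neg`, giving a macro average of about 0.83. A micro average over the same data would instead equal plain accuracy (0.75). This is why the averaging scheme matters when quoting a single precision figure.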

Usage

1. Prepare Data: Use dataload or implement a custom loader providing labeled examples.

2. Configure: Create myproject.json specifying dataset paths, model settings, and output directories. Multiple language/task blocks are supported.
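The two usage steps above can be sketched roughly as follows. This excerpt does not show the real myproject.json schema or the dataload interface, so everything here is an assumption:

- The keys `output_dir`, `tasks`, `language`, `task`, `train_path`, and `model` are hypothetical.
- The helpers `parse_config` and `load_labeled_examples` are also hypothetical.
- The model name is just one example of a multilingual SBERT checkpoint.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import List, TextIO, Tuple

# Hypothetical schema: autofit2's actual myproject.json layout may differ.
REQUIRED_TASK_KEYS = ("language", "task", "train_path", "model")


@dataclass
class TaskBlock:
    language: str
    task: str
    train_path: str
    model: str


@dataclass
class ProjectConfig:
    output_dir: str
    tasks: List[TaskBlock]


def parse_config(raw: str) -> ProjectConfig:
    """Parse and validate a multi-language/task project config (hypothetical schema)."""
    data = json.loads(raw)
    if "output_dir" not in data:
        raise ValueError("missing 'output_dir'")
    tasks = []
    for i, block in enumerate(data.get("tasks", [])):
        missing = [k for k in REQUIRED_TASK_KEYS if k not in block]
        if missing:
            raise ValueError(f"task block {i} is missing {missing}")
        tasks.append(TaskBlock(**{k: block[k] for k in REQUIRED_TASK_KEYS}))
    if not tasks:
        raise ValueError("config defines no task blocks")
    return ProjectConfig(output_dir=data["output_dir"], tasks=tasks)


def load_labeled_examples(stream: TextIO) -> List[Tuple[str, str]]:
    """Hypothetical custom loader: read (text, label) pairs from a CSV with 'text' and 'label' columns."""
    examples = []
    for row in csv.DictReader(stream):
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip()
        if text and label:  # skip rows with a missing text or label
            examples.append((text, label))
    return examples


EXAMPLE_CONFIG = """
{
  "output_dir": "runs/myproject",
  "tasks": [
    {"language": "de", "task": "sentiment", "train_path": "data/de/train.csv",
     "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"},
    {"language": "fr", "task": "sentiment", "train_path": "data/fr/train.csv",
     "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}
  ]
}
"""

if __name__ == "__main__":
    cfg = parse_config(EXAMPLE_CONFIG)
    for task in cfg.tasks:
        print(task.language, task.task, task.train_path)
    # A real run would open each task's train_path; an in-memory CSV stands in here.
    print(load_labeled_examples(io.StringIO("text,label\nSuper Service,pos\nSchrecklich,neg\n")))
```

Validating every task block up front means a typo in one language's entry fails fast, before any fine-tuning starts.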
