
ArXiv says submissions must be in English: are AI translators up for the job?


Artificial-intelligence translators could help researchers to meet the arXiv preprint server’s new mandate that all manuscripts be submitted in English. Credit: Sharaf Maksumov/Alamy

Every month, more than 20,000 scientific manuscripts by authors from around the world are posted on the preprint repository arXiv, the oldest and best-known preprint site. Now researchers uploading their work to the site are facing a new requirement: from 11 February, all submissions must be either written in English or accompanied by a full English translation.

Until now, authors have had to submit only an abstract in English. Staff at arXiv say that the English rule will make life easier for its moderators and keep its readership broad. “We can’t be fair in judging papers if they are not in English,” says Ralph Wijers, the chair of the arXiv editorial advisory council and an astronomer at the University of Amsterdam, whose native language is Dutch. The site, based at Cornell University in Ithaca, New York, does not undertake peer review, but a team of some 300 volunteer moderators verifies that submissions are “appropriate and topical”.


ArXiv hosts nearly 3 million preprints across eight subject areas, although the vast majority of the manuscripts are in computer science, physics and mathematics. Just 1% of submissions are in a language other than English. Nonetheless, the revised language policy has prompted some vocal complaints, including arguments that the burden of the mandate might deter people from making content such as PhD theses and preprints of textbook chapters public. Authors of such texts might decide that it is not worth the effort to translate them, or to find an alternative venue for making them accessible.

“I personally see it as a loss for our community,” says mathematician Angelo Lucia at the Polytechnic of Milan in Italy.

Several French mathematicians posted comments on the arXiv announcement saying that they might take their manuscripts to the French preprint server HAL (Hyper Articles en Ligne) instead. HAL hosts works in several languages, including English, French and Spanish, without requiring translations.

Machine translation

The arXiv policy specifies that automated translations, such as those done by artificial-intelligence chatbots, are acceptable, so long as they are faithful to the original work.

Editors at arXiv have some reservations about these systems’ capabilities, however. “Our advice is: feel free to use an AI or an LLM [large language model] to translate your text, but please check it,” says Wijers. “Our own experience is that AI translation is good but not good enough.”
