
ArXiv's English-Only Mandate: Can AI Translators Bridge the Scientific Language Gap?

The arXiv preprint server's new policy requiring all submissions to be in English has sparked debate within the global research community. While intended to streamline moderation and maintain broad accessibility, the mandate raises questions about inclusivity and the burden on non-native English speakers. This article examines the rationale behind the rule, the community's mixed reactions, and the emerging role of artificial intelligence translation tools as a potential solution. We explore whether AI systems like large language models are sufficiently accurate for translating complex scientific manuscripts and what this shift means for the future of open scientific communication.

The global scientific community is facing a new linguistic frontier. As of February 11th, the venerable arXiv preprint repository, which hosts over 20,000 new manuscripts monthly, requires all submissions to be either written in English or accompanied by a full English translation. The shift from requiring only an English abstract to demanding complete English content is a significant change for the world's oldest and most prominent preprint server, and it raises important questions about accessibility, fairness, and the role of emerging technology in scientific communication.

The arXiv preprint server homepage, where the new English language policy is now in effect.

The Rationale Behind arXiv's English-Only Policy

arXiv's leadership cites practical and philosophical reasons for the language mandate. With nearly three million preprints across eight subject areas—primarily in computer science, physics, and mathematics—the platform relies on approximately 300 volunteer moderators to verify that submissions are "appropriate and topical." According to Ralph Wijers, chair of the arXiv editorial advisory council and an astronomer at the University of Amsterdam, "We can't be fair in judging papers if they are not in English." This perspective emphasizes consistency in moderation processes, ensuring that all submissions receive equivalent evaluation regardless of the moderator's linguistic background.

The policy also aims to maintain arXiv's broad international readership. As the primary destination for rapid dissemination of scientific findings before formal peer review, arXiv serves researchers worldwide who predominantly use English as the lingua franca of science. The platform's leadership believes that requiring English translations will ultimately increase the visibility and impact of research that might otherwise remain inaccessible to the global scientific community.

Community Reactions and Concerns

Although only 1% of arXiv submissions are in languages other than English, the new policy has drawn vocal criticism from segments of the research community. Mathematician Angelo Lucia of the Polytechnic of Milan in Italy said, "I personally see it as a loss for our community," suggesting that valuable contributions might be excluded or discouraged.

Some researchers have raised practical objections about the additional burden the translation requirement creates. Authors of specialized content such as PhD theses or textbook chapters might find the translation effort disproportionate to the benefit of posting on arXiv, potentially leading them to seek alternative venues. Several French mathematicians have publicly suggested they might redirect their manuscripts to the French preprint server HAL (Hyper Articles en Ligne), which hosts works in multiple languages without requiring translations.

The HAL preprint server interface, which accepts multiple languages without requiring English translation.

AI Translation as a Permitted Solution

In a notable concession to technological advancement, arXiv's policy explicitly permits the use of automated translation systems, including those powered by artificial intelligence. The guidelines state that AI-generated translations are acceptable "as long as they are faithful to the original work." This acknowledgment reflects the growing capabilities of large language models (LLMs) in handling complex linguistic tasks.

However, arXiv editors maintain reservations about current AI translation quality. Wijers advises researchers: "Feel free to use an AI or an LLM to translate your text, but please check it. Our own experience is that AI translation is good but not good enough." This caution aligns with findings from a 2025 Nature survey of over 5,000 researchers worldwide, where more than half of respondents believed AI-translated papers should be checked by a native speaker before submission.

Evaluating AI's Scientific Translation Capabilities

The question of whether AI systems can adequately translate complex scientific manuscripts requires careful examination. While LLMs like GPT-4o excel at conversational translation, their performance on specialized academic text merits scrutiny. Research by James Zou and Hannah Kleidermacher at Stanford University investigated this very question through an innovative methodology. They created automated benchmarks by having GPT-4o generate multiple-choice quizzes based on scientific papers, then evaluated the model's performance on translated versions of those same papers.
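The quiz-based evaluation described above can be illustrated with a minimal sketch. It is not the Stanford authors' code: the names `QuizItem`, `quiz_accuracy`, and `keyword_answerer` are hypothetical, and the keyword matcher is a toy stand-in for the language model that would answer the questions in the actual study. The idea it shows is only the scoring step: questions are written from the original paper, answered from a translated version, and the fraction answered correctly serves as a rough proxy for how much content survived translation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class QuizItem:
    """One multiple-choice question derived from the ORIGINAL paper (hypothetical type)."""
    question: str
    options: Sequence[str]
    correct_index: int


def quiz_accuracy(
    document: str,
    quiz: Sequence[QuizItem],
    answer_fn: Callable[[str, QuizItem], int],
) -> float:
    """Fraction of quiz items answered correctly using only `document`."""
    if not quiz:
        raise ValueError("quiz must contain at least one item")
    correct = sum(1 for item in quiz if answer_fn(document, item) == item.correct_index)
    return correct / len(quiz)


def keyword_answerer(document: str, item: QuizItem) -> int:
    """Toy stand-in for a model: choose the first option whose text appears
    in the document (case-insensitive); return -1 if none does."""
    text = document.lower()
    for index, option in enumerate(item.options):
        if option.lower() in text:
            return index
    return -1


# Invented example data, for illustration only.
quiz = [
    QuizItem("Which detector was used?", ["bolometer", "photodiode"], 1),
    QuizItem("What was the sample temperature?", ["4 K", "300 K"], 0),
]
faithful = "A photodiode recorded the signal while the sample was held at 4 K."
lossy = "A sensor recorded the signal while the sample was held at 300 K."

print(quiz_accuracy(faithful, quiz, keyword_answerer))  # 1.0
print(quiz_accuracy(lossy, quiz, keyword_answerer))     # 0.0
```

In practice `answer_fn` would wrap a call to a language model, and the two documents would be an original paper and its machine translation; a drop in accuracy on the translated version would suggest that details were lost or altered.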

This research approach highlights both the potential and limitations of current AI translation systems for scientific content. The technical precision required in scientific writing—with its specialized terminology, mathematical notation, and nuanced argumentation—presents unique challenges that differ significantly from general language translation. As noted in the Nature article covering this development, limited attention has been given to LLMs' prowess specifically at translating scientific papers, suggesting this remains an emerging area of both technological development and scholarly investigation.

An AI translation interface, such as OpenAI's GPT-4o, which researchers may use for manuscript translation.

Implications for Global Scientific Communication

arXiv's language policy represents more than just an administrative change—it reflects broader tensions in global scientific communication. The dominance of English in scientific publishing has long been criticized for creating barriers for researchers from non-English-speaking regions, potentially excluding valuable perspectives and contributions. The new mandate, while intended to streamline operations, may inadvertently reinforce these existing inequities.

Yet by permitting AI translation tools, the policy offers a potential pathway toward greater inclusivity. If these systems continue to improve in accuracy and accessibility, they could reduce the translation burden on individual researchers while maintaining the benefits of a common scientific language. The key question is whether AI translation quality will advance far enough to handle the precision required in scientific discourse without introducing errors or misinterpretations that could compromise research integrity.

Looking Forward: Balancing Accessibility and Quality

The arXiv language mandate arrives at a pivotal moment in the evolution of both scientific communication and language technology. As AI translation systems become increasingly sophisticated, their role in facilitating cross-linguistic scientific exchange will likely expand. However, the scientific community must establish clear guidelines and best practices for using these tools to ensure that translations maintain the accuracy and nuance essential to rigorous research.

The ongoing development of specialized scientific translation models, combined with human oversight mechanisms, may offer the most promising path forward. By leveraging technology while maintaining quality standards, the global research community can work toward a future where language barriers diminish without compromising scientific rigor. As this transition unfolds, platforms like arXiv will serve as important testing grounds for how technology can bridge linguistic divides in the pursuit of shared scientific knowledge.
