#270 AI Translation State of the Art with Tom Kocmi and Alon Lavie

SlatorPod

Nov 21, 2025
Slator

Tom Kocmi, Researcher at Cohere, and Alon Lavie, Distinguished Career Professor at Carnegie Mellon University, join Florian and Slator language AI Research Analyst Maria Stasimioti on SlatorPod to talk about the state of the art in AI translation and what the latest WMT25 results reveal about progress and remaining challenges.

Tom outlines how the WMT conference has become a crucial annual benchmark for assessing AI translation quality, ensuring that systems are tested on fresh, demanding datasets. He notes that systems now face literary text, social-media language, ASR-noisy speech transcripts, and data selected through a difficulty-sampling algorithm. He stresses that these harder inputs expose far more system weaknesses than the test data of previous years did.

He adds that human translators also struggle as they face fatigue, time pressure, and constraints such as not being allowed to post-edit. He emphasizes that human parity claims are unreliable and highlights the need for improved human evaluation design.

Alon underscores that harder test data also challenges evaluators. He explains that segment-level scoring is now more difficult, and that even human evaluators miss different subsets of errors. He highlights that automated metrics built on earlier-era training data, particularly COMET, underperformed because they absorbed their own biases.

He reports that the strongest performers in the evaluation task were reasoning-capable large language models (LLMs), either lightly prompted or submitted with elaborate evaluation-specific prompting. He notes that while these LLM-as-judge setups outperformed traditional neural metrics overall, their segment-level performance varied.

Tom points out that the translation task also revealed notable progress from smaller academic models of around 9B parameters, some of which ranked near trillion-parameter frontier models. He sees this as a sign that competitive research is still widely accessible.

The duo concludes that evaluation methods must be chosen carefully, that models should not be assessed with the same metric used during training, and that LLM-based judging should be adopted for more reliable assessments.
