SlatorPod

#218 How Large Language Models Replace Neural Machine Translation with Unbabel’s João Graça

July 12, 2024 Slator

João Graça, Co-founder and CTO of language operations platform Unbabel, joins SlatorPod to talk about the present and future of large language models (LLMs) and their broad impact across all things translation and localization.

First, the CTO explains how Unbabel was founded to address language barriers for people using services like Airbnb, combining MT with human validation to improve translation quality.

João believes that LLMs are quickly replacing neural MT models, as far more R&D is now going into LLMs than into NMT. He highlights that LLMs can handle more complex tasks, such as automatic post-editing, source correction, and cultural adaptation, that were previously difficult to achieve with traditional models.

He also tells the backstory of the company's decision to develop TowerLLM. João shares how Unbabel's approach involves taking open-source LLMs, fine-tuning them with multilingual data, and applying techniques like retrieval-augmented generation (RAG) to improve translation quality in production settings.

Despite the advancements, João acknowledges that human intervention is still necessary for high-stakes translation tasks.

The podcast concludes with a discussion of the hiring environment for AI talent and future directions for LLM development, with João expressing optimism about the continued progress and potential of these models.

Chapter Markers


Intro
Background and Motivation Behind Unbabel
Research Contributions
NLP and LLM Impact
RAG Approach
Adapting Production Processes
Evaluating Model Usage
Evolution from Neural MT to LLMs
Comparing Price
Why Unbabel Decided to Build TowerLLM
TowerLLM Development Process
Multilingual Model Performance
Model Usage and Commercial Restrictions
Quality Testing Process
TowerLLM Challenges
Future of Translation Technology
Areas of Application for LLMs
Understanding xTOWER
AI Pipelines
Language Coverage
Hiring Environment
Acceleration of LLMs and AI Progress