AI Portfolio Podcast

Kyle Kranen: End Points, Optimizing LLMs, GNNs, Foundation Models - AI Portfolio Podcast

June 27, 2024 Mark Moyou, PhD Season 1 Episode 11

Kyle Kranen is an engineering leader at NVIDIA working at the forefront of deep learning, real-world applications, and production. In this episode, Kyle shares his expertise on optimizing large language models (LLMs) for deployment and explores the complexities of scaling and parallelism.

📲 Kyle Kranen Socials:
LinkedIn: https://www.linkedin.com/in/kyle-kranen/
Twitter: https://x.com/kranenkyle

📲 Mark Moyou, PhD Socials:
LinkedIn: https://www.linkedin.com/in/markmoyou/
Twitter: https://twitter.com/MarkMoyou

📗 Chapters
[00:00] Intro
[01:26] Optimizing LLM for deployment
[10:23] Economy of Scale (Batch Size)
[13:18] Data Parallelism
[14:30] Kernel
[18:48] Hardest part of optimizing
[22:26] Choosing hardware for LLM
[31:33] Storage and Networking - Analyzing Performance
[32:33] Minimum size of model where tensor parallel gives you advantage
[35:20] Director-level folks thinking about deploying LLMs
[37:29] Kyle is working on AI foundation models
[40:38] Deploying Models with endpoints
[42:43] Fine-Tuning, Deploying LoRAs
[45:02] Stare LM
[48:09] KV Cache
[51:43] Advice for people deploying reasonable and large-scale LLMs
[58:08] Graph Neural Network
[01:00:04] GNNs
[01:04:22] Using GPUs to do GNNs
[01:08:25] Starting GNN journey
[01:12:51] Career Optimization Function
[01:14:46] Solving Hard Problems
[01:16:20] Maintaining Technical Skills
[01:20:53] Deep learning expert
[01:26:00] Rapid Round
