Your AI Roadmap

LLMs and Multimodal AI with Stefania Druga of Google

April 29, 2024 | Season 1, Episode 1
Dr. Joan Palmiter Bajorek / Stefania Druga

In this episode, Stefania Druga (she/her), Research Scientist on the Bard (now Gemini) team at Google, shares insights into her work on developing multimodal AI applications. She explains how Bard, an LLM-powered assistant, is trained on a massive amount of internet data, enabling it to understand and generate text, summarize content, and interact with images. They cover advancements in AI and the importance of multimodal capabilities that extend beyond text to images and other forms of data, pushing the boundaries of AI's application in daily life.


Stefania Druga Quotes

📊 Data Set Sizes: "You can think of what does it mean to scrape the entire internet and that's pretty much it...all the data that was ever created digitally and has the right permissions."
⚙️ “These AI assistants are becoming now embedded in not only standalone applications like Bard and ChatGPT, but also in a variety of products...they can become embedded in your calendar to help with smart planning” 
🚀 “I realized in 2016 that AI, voice assistants, machine learning are going to be huge. So I started thinking of how do we teach the next generation? That's how I started working on Cognimates & launched a platform.”


Resources

  • Fast.ai: Making neural nets uncool again
  • LangChain: Build context-aware, reasoning applications with LangChain’s flexible abstractions & AI-first toolkit
  • Stefania’s publications: stefania11.github.io
  • Cognimates: Stefania’s coding education project started @ MIT Media Lab


Stefania Druga is a Research Scientist at Google Gemini AI. She was previously a principal researcher at the Center for Applied AI Research at the University of Chicago. She earned a PhD in Creative AI Literacies from the University of Washington and a Master of Science from MIT. During her PhD, she did several research internships at Google X, Microsoft Research, and Fixie.ai, focusing on LLM applications for developer tools, programming languages, and data science. She loves trail running & drawing with robots. Connect with Stefania: LinkedIn & Twitter


More from Your AI Roadmap
Watch on YouTube! @YourAIRoadmap
LinkedIn: Connect with Joan, and let her know you listened!
Joan has a BOOK with Wiley coming! AI, Careers, and Future-Proofing Your Income: Book Waitlist

Who is Joan? Ranked the #4 Voice AI Influencer, Dr. Joan Palmiter Bajorek is the CEO of Clarity AI, Founder of Women in Voice, & Host of Your AI Roadmap. With a decade in software & AI, she has worked at Nuance, VERSA Agency, & OneReach.ai in data & analysis, product, & digital transformation. She's an investor & technical advisor to startups & enterprises. A CES & VentureBeat speaker & Harvard Business Review–published author, she has a PhD & is based in Seattle.

Clarity AI builds AI that makes businesses run better. Our mission is to help SMBs and enterprises leverage the power of AI. Whether your budget is 5, 6, 7, or 8 figures, we can build effective AI solutions. Book a 15min call

♥️ Love it? Rate, Review, Subscribe. Send it to a friend 😊

Transcript

Hi, my name is Joan Palmiter Bajorek. I'm on a mission to decrease fluffy hype and talk about the people actually building in AI. Anyone can build in AI, including you. Whether you're terrified or excited, there's been no better time than today to dive in. Now is the time to be curious and future-proof your career, and ultimately, your income. This podcast isn't about white dudes patting themselves on the back. This is about you and me, and all the paths into cool projects around the world. So what's next on your AI roadmap? Let's figure it out together. You ready? This is Your AI Roadmap, the podcast.

Hello, welcome. Glad to have you here. Would you mind doing an introduction?

Hi, my name is Stefania Druga. I'm a research scientist on the Bard team at Google.

Cool, awesome. Well, to the extent you can tell us about it, what projects are you working on?

I'm working on new Bard applications that the team has been developing.

Okay, super cool. Well, some of our listeners will be like, Bard, obviously, duh. And some may be really new to that product. Could you tell us more about what Bard is and the goals around it?

Bard is one of the large language models from Google. A large language model is basically a new form of AI technology that enables us to respond to all sorts of questions, generate text, and generate summaries based on text that we provide. More recently, it can also answer questions about images, not only text. So if you upload a picture and ask, what is it that you see in this picture? If the picture is of your living room and you want to ask this AI, give me ideas for how I could decorate my living room in this style, it could help you with that. These large language models are trained on a large quantity of data from the internet or from other sources. Based on that, the model learns to determine what the next probable word in a sentence should be. We did not expect that to work so well, but it turns out that if you train a model on extremely large quantities of data, just by predicting what the next word, the next token, should be, we can actually get very, very accurate responses to questions.

Awesome, great explanation. And when you talk about big data, can you give us a sense? Some people will know exactly what you're talking about; for other people, how big are the datasets usually used for this type of project?

You can think of what it means to scrape the entire internet, and that's pretty much it, right? All the data that was ever created digitally and has the right permissions. Of course, because there is certain information we find on the internet that is shared under certain licenses, so it cannot be included in the training sets for these large language models.

That makes sense. I think just the vastness of datasets this size boggles the mind. It's really hard to wrap my head around, at least.

Well, I could tell you the number of petabytes, or trillions of..., but I don't know if that would necessarily be representative for your listeners.

Yeah, yeah, no, that makes sense. Ginormous, big, big, big. When you think about the Bard product, I think some people may think of it as a corollary to ChatGPT, and you mentioned multimodalities. How might people use Bard differently from other products that are out there?

Yeah, so it's very similar to ChatGPT. You can think of it as a chatbot, like a useful assistant that can be your companion for all sorts of tasks.
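To make the next-token prediction Stefania describes above concrete, here is a minimal sketch using the small open-source GPT-2 model as a stand-in (Bard's underlying models are not public; the example assumes the Hugging Face transformers library and PyTorch are installed):

    # Minimal next-token prediction sketch. GPT-2 is only a small public
    # stand-in for the much larger models discussed in this episode.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of France is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Probability distribution over the vocabulary for the *next* token.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(token_id)!r}: {prob:.3f}")

A chat assistant is essentially this loop run repeatedly: predict a distribution over next tokens, pick one, append it to the text, and predict again.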
Like maybe you want to get a new recipe: you take a picture of what you have in your fridge and ask, what are all the things I can cook with these leftovers? Or maybe you have some data that you want to very quickly transform into a visualization for a deck. Or maybe you're an educator and you want to figure out, oh, what are all the ways I could evaluate this lesson? Or, give me ideas for how best to teach this in this duration of time. So people have been using applications like this. Let's call them chatbots for now, because that's the main interaction form: you dialogue, you type a question, you get an answer, and you can follow up. In all aspects of life, like drafting emails or doing code generation. So if you don't know how to code and you want to get started, a lot of people have used Copilot in GitHub, and there are lots of other variants of that. That's another application using large language models, for code.

So these AI assistants are becoming embedded not only in standalone applications like Bard and ChatGPT, but also in a variety of products. They can become embedded in your calendar to help with smart planning, meetings, or time allocation; in your email; in whatever you use for drafting text if you're a content creator. If you're a designer, even applications like Figma, or in slides, you can now generate images with generative AI right in your slide deck or design tool. So across the board, we're seeing an integration of generative AI and large language models in the whole suite of products that people are using.

Wow, that's a pretty big purview, all the different products. I also just wanna make sure to unpack: you mentioned the word multimodal, and I think you gave an implicit explanation, but for those people who haven't heard the term multimodal, how might you explain that to them?

Yeah, yeah. So when the research started on language models and large language models, which primarily use the architecture called transformers, most of the data these models were trained on was text. So it was natural language processing: text in, text out. Multimodal expands that capability to include forms of data that are not text. Being able to work with images is the first step; then being able to work with audio, to synthesize new forms of audio; and video generation is huge these days as well. So multimodal means using the same architecture and underlying technology but operating with many other modalities beyond text: images, audio, video.

Thank you, that was very comprehensive. For Bard and for your work, is there a specific part of the project you're working on, or are you on the research end? Can you tell us any more about what you're specifically working on?

I'm on the research end; it's applied research. We are working on a new suite of products that will be launched at I/O, and I think that's all I can say for now.

That's totally reasonable, I get that. You've been working in this field a little while, I believe. What are some surprises to you as you've been working in this field?

I've been working in this field before ChatGPT existed, before we had GPT-4. I've actually been working in AI since 2016, when I did my master's at MIT. Initially, I built a platform to teach AI to kids and families. So I expanded Scratch, which is the largest platform for coding for kids, to have extensions for AI.
So anyone who's non-technical could use this visual programming language, which is kind of like Lego blocks on the screen, to create programs and be able to play with smart things: sentiment recognition, object recognition, programming their voice assistants. If they had a Google Home or an Alexa or whatever, they were able to teach it things whenever it didn't know how to answer a question, or connect it to IoT. The core functionality at the time for the platform I built, Cognimates, was to enable users to train custom models. So if you wanted to train a model to recognize specific images, let's say rock, paper, scissors so you can program a game where you play against the computer, we showed kids how they could do that: take 10 pictures of your hand in each of the gestures, give them the right names, the model learns how to recognize those gestures, and then you can program a game using your custom model. Same goes for text. I had students who really wanted to train a model for backhanded compliments, all sorts of things.

And this was back, like, I started working on it in 2017 and launched the platform in 2018. It's free, it's open source, it's still online. At the time, we didn't have language models; we had more classic machine learning classification models. And I was using something called transfer learning, which means you already have a model that is very good at classifying images, but you add some examples on top that get ranked higher. For a child to be able to see the results of their training, they're not going to have the patience to add hundreds of images for a custom training. They're only going to add maybe 10 images or even fewer, and that still needs to show a result in the training. So for that, we used transfer learning.

But it's highly empowering to be able to say, I know how I can make this technology more fun or more relevant or more meaningful for my use case. Maybe I'm very interested in something very quirky. I had a student who loved sloths, and she wanted to train a model that can really recognize sloths and understand everything about them: the way they move, their habitat, what they like to eat. That's something she could do based on a passion or an interest. So that's kind of how I got started: by democratizing the early stages of machine learning and AI for everyone who's non-technical.

The idea there was to foster critical understanding of this technology, because I realized at the time that voice assistants were becoming part of the home and we have the first generation of youth growing up with AI. That really changes how we perceive this technology and how we learn with it. And therefore it's quite important, beyond classic literacy, learning how to read and write, to also have AI literacy: learning how to ask questions of these chatbots, robots, and AI models, and also learning how to make critical use of them.

In terms of large language models specifically, for the past two and a half years I've been doing lots of residencies and internships and collaborations with large tech companies and also startups. I got to work on applications of language models for code generation and data science before GPT-4, before ChatGPT.
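Here is a minimal sketch of the transfer-learning recipe Stefania describes above: a pretrained image model is frozen, and only a tiny classification head is trained on a handful of examples per class. Keras/TensorFlow is an assumed stand-in (the episode doesn't say what Cognimates used), and `x_train`/`y_train` are hypothetical placeholders for the kids' ~10 photos per gesture:

    # Transfer learning sketch: frozen pretrained backbone + small trainable head.
    import tensorflow as tf

    NUM_CLASSES = 3  # rock, paper, scissors

    # A model already very good at images (trained on ImageNet); we freeze it.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False,
        weights="imagenet", pooling="avg",
    )
    backbone.trainable = False

    # Only this small head is trained on the child's handful of examples.
    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # model.fit(x_train, y_train, epochs=10)  # ~10 labeled images per class

Because only the final layer learns, a dozen examples per class can already produce visible results, which is exactly what makes the training loop fast enough to hold a child's attention.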
It's been quite exciting to be part of that and realize how quickly things have been moving in terms of the quality of the output from these models, but also to understand that gaining people's trust and figuring out how to integrate these new capabilities into their existing workflows is a very humbling experience. Understanding how we still give people agency so they can do what they do best, while maybe allowing them to automate the parts of their work that are more tedious, so they can focus on the things they're best at and where they can contribute the most.

That's awesome, yeah. Well, I think about a lot of these tools as kind of an augmentation, you know, hopefully clearing up more of this manual clutter that we frequently have to do, but really enhancing our ability to do other things.

And I think there are lots of concerns. I recently spoke at the Creative Summit in Europe, in Sweden. There were lots of creatives in the room, designers, and one of the questions that comes up a lot is: are we going to be replaced by generative AI? What happens with the IP? How can you opt out? If you're an artist and you do not want your data to be included in the training sets, what can you do? Those are all extremely important and valid questions. But I think people in general, in particular professions but also in general, are excited, because they've never seen a technology that is so accessible. You can just chat with it, ask any question, and get pretty meaningful results. Not always; it's not perfect. But at the same time they're like, oh, what does that mean for my career, for my kids, for how we teach, for education, for the future of work? I think this is on everyone's minds these days.

Definitely, yeah. That's one of the reasons for this podcast, Your AI Roadmap: letting people know about different projects, but also kind of actionable steps if they wanna get into the field, since so many people aren't and have not been working in this field for several years. Before we get into career things though, you've done so many cool projects. What are some of the learnings you've had from them, or things where you were like, whoa, I thought it was gonna go this way and actually it went differently?

Let me think. There are so many. One of the things that really surprised me was when I worked on code generation as part of my residency with Google. Part of the features I was working on were launched, so I can talk about them. They are now integrated in Google Colab, which is this tool that a lot of data scientists, a lot of people, use in the browser. Basically, you could write a prompt in natural language, just say, write me a function that counts certain values, or counts the number of letters in this text, whatever it is you want to do, and it generates the code for you. Then you can edit it or run it. I worked on the team that built this functionality for code generation in Colab.

It was very humbling, in the process of building that tool, one, to understand how much the quality of the data matters: how important it is to really understand what goes into the training data, how we ensure that very high-quality code goes into the training data, and that it's not duplicated. There are a lot of people copying code from each other on the web or Stack Overflow, and there's a lot of template code, because we build similar applications. When you build websites or other forms of web apps, there's a lot of template code, so that's going to be duplicated a lot, and it really biases the type of results you're going to get from the model. So, understanding how much the quality of the data matters and how we control for that.

But then also how every single signal you give to the user about the output in the application really influences the use. If we tell them, when we generate the code, oh, this code was accepted by many other users this many times before, that's a trust signal that matters a lot. Or if we tell them this code comes from a validated source, like a very high-quality organization, or we tested this code and it already runs, it doesn't have any errors, you don't need to do that. Or sometimes the code can execute and have an output, and that really helps. For example, if the generated code creates a visualization, it's much easier for humans to quickly evaluate the quality: that's the graph I wanted to get, or not, rather than looking at the code. So being able to show all sorts of outputs of the execution of the code is a very useful signal for people.

The other thing that matters a lot is to understand what part of the programming workflow people want to automate first. Something that surprised me: I did a large quantitative study and a qualitative study, and I got different insights from both, but in the large quantitative study I did not expect that the first thing the majority of potential users wanted to automate was template code. They wanted to say, I want to build a web app that takes in these files, takes in images, and outputs an animation based on those images, and it creates all the template code for your application. Then you can just go in and customize the colors and the sizes, or the types of images, formats, things like that. So one of the things people wanted to automate first was template code. I was surprised, because my expectation was that the first thing they would want to automate would be maybe support for new programming languages or applications they're not so familiar with. That was an interesting finding.

The other interesting finding was that we assume that because we use natural language to talk to these models, that's extremely accessible to people, but that's not always the case. It's not obvious. How would you describe generating a graph in natural language? We are not very used to doing that: to say, the x-axis should have this, and the y-axis should have this, and the plot should be in this style. That's a lot of description that we don't do on a regular basis. Similarly, for any prompt where I want to ask the model to help me build an application, I need to have a mental map of the steps to build that application. So I need some sort of algorithmic thinking, computational thinking, even if I'm describing it in natural language: to know what the steps are, to know what the important features of my app are. It was interesting to see that many users were like, yes, now I don't need to read and learn Python, but I need to learn prompting, this way of giving directions or questions to the model. That's called prompting.
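To make the "describe your graph in natural language" point concrete, here is a rough sketch of the kind of prompt structure that tends to work: the axes, the data, and the acceptance criteria spelled out explicitly. The helper below only assembles the prompt string; `ask_model` is a hypothetical stand-in, since Colab's internal tooling is not a public API and any LLM client could take its place:

    # Sketch of a structured code-generation prompt. The explicit structure
    # is the point; `ask_model` is a hypothetical stand-in for an LLM client.
    def build_plot_prompt(x_desc: str, y_desc: str, data_desc: str, style: str) -> str:
        return (
            "Write a Python function using matplotlib that plots the following.\n"
            f"Data: {data_desc}\n"
            f"X-axis: {x_desc}\n"
            f"Y-axis: {y_desc}\n"
            f"Style: {style}\n"
            "Return only runnable code, with labeled axes and a title."
        )

    prompt = build_plot_prompt(
        x_desc="month of the year, January through December",
        y_desc="monthly revenue in USD",
        data_desc="a list of 12 floats called `revenue`",
        style="bar chart, one bar per month",
    )
    # code = ask_model(prompt)  # hypothetical LLM call; then review, edit, run
    print(prompt)

Notice how much of the "prompting" here is really computational thinking: naming the data, the axes, and the acceptance criteria before the model ever sees the request.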
Prompt engineering is also a skill: knowing what sorts of questions work better than others, or how to format our instructions to the model to get the best quality output. That is a skill people can learn and practice. And of course, the models are getting better. But it is something that maybe we sort of take for granted: humans will always need to be able to break down a task or their workflow into actionable steps, and that requires a certain level of expertise.

Yeah, it's almost like you're saying humans need to learn how to say something so the computer will understand what they even need. And part of me is like, oh, but I shouldn't have to explain myself. And other times, I immediately think of some of my aunts and uncles: if they were to say, make a graph, where would one even begin? It's so ambiguous. I'm sure a graph could be made, but would it be the graph my aunt is thinking about? Probably not. So, having that vocabulary.

Vocabulary, and then also just expectations. Because when you start, it's kind of like starting with a blank page, and you don't know what the technology can or cannot do. Having examples of what other people have tried, and what worked and what didn't, is very helpful. But it makes me think of the early days of search, which Google is. I actually used to work on the search team at Google in 2009, so a long time ago. The early days of search are kind of interesting, because people modified their information-retrieval mental models to match what the search engines at the time could do. They would not necessarily ask a question in the way that was most natural for them. They would ask a question in the way they knew: if I use specific queries, I'm going to get better results, or I need to order the queries this way, or if I use advanced search, these are the filters I can add. We actually have lots of studies showing that our mental model and understanding of how search works influences the way we search, which makes sense, right?

And it's super interesting to look at it from a developmental point of view. If you ask kids to draw how a search engine works, or to draw what's inside a voice assistant, how a voice assistant works, I haven't done it with Bard or ChatGPT, but that would be fascinating too, that's very telling about their mental model: what happens under the hood, what's in the black box. And then seeing how that mental model evolves once they start to use the tool. They form hypotheses, like, maybe if I add a keyword, or a synonym of the keyword, it would give me more results; or maybe if I put a lot of words in the prompt, it would increase the quality of the image. People have all sorts of hypotheses, and they get to test them and then refine their understanding of what the tool can or cannot do.

Absolutely. Yeah, when I'm doing prompt engineering for copy for myself, for text, I always have to change the tone, like mention playful or, you know, joking, because my own tone and personality are frequently not the same as the quote-unquote generic output, whatever that tone may be. So if matching my tone is the goal, I have to ask for it. You also used the phrase natural language. Could you explain a little bit what you mean by natural language? Maybe I'll just start there.

Natural language is basically language: written language, spoken language.
The term was coined more in the research community. We talk a lot about NLP, which stands for natural language processing. It was a more systematic way of referring to written language and spoken language, how people talk on a day-to-day basis, basically.

Totally. I think the phrase sounds so clinical outside our community, "natural language." Some people don't even break apart NLP; it just becomes stagnant.

Yeah, acronyms are interesting. I was doing this residency for six months, and I came back to the company a few months later and there were a ton of new acronyms. So every time I'm like, so what does that mean? Actually, today I got a nice explanation, someone saying, oh, I don't know what these mean, but I just kind of figure them out in context and don't try to understand every one. But yeah, I think there's an interesting form of gatekeeping that happens with acronyms, and there's also a certain type of culture that gets propagated. Because we don't want this knowledge to belong only to the academic community or only to the tech community, it's very important to use words where we have a common understanding. And this understanding keeps evolving, and not only understanding but reference: the way we explain and perceive certain words is culturally defined, and it changes and evolves over time. In particular, in the field of AI it's really, really important to find language that resonates with most people and their experiences.

I've been working on a study, we just submitted it, basically showing why this matters: that if you use anthropomorphized language when describing AI systems and AI products, that can really impact users' trust. Anthropomorphized means you ascribe human-like abilities to a non-human entity. If you say the model thinks, Bard remembers, or ChatGPT remembers, or thinks, or is intelligent, or reasons, all of these verbs assign cognition, assign human-like abilities, to a non-human entity. Because ultimately, these are probabilistic automation systems and devices, and that language comes with a certain set of expectations. We wanted to measure exactly how much that influences how people trust a product. Say we describe the same product, let's say it's a robot lawyer, with this kind of language, where the robot lawyer will remember and think and reason, versus much more down-to-earth, very factual language describing the abilities of the system: how does that play a role in how likely people are to use it, how much they trust it, things like that? So stay tuned for the results. There's just a lot of press coverage where the description of AI technology overemphasizes the abilities of these technologies, and that can come at a cost, because if you set the expectations too high, people are likely to get disappointed. And it's just not an accurate description of what the system is or can do.

That makes a lot of sense. What I'm very much reminded of, especially talking about anthropomorphizing different systems, when we name them or choose how they're named, there are all kinds of questions about gender. I think one of the things that has boggled my mind in the last few years in the voice and AI space is how many people, mostly at family gatherings or happy hours, don't realize that humans are working on these systems. They literally somehow think these devices just arrive, ka-baw, ta-ding.
And I'm like, there are thousands of people who have worked so hard on that exact feature that is driving you crazy. But really, thinking of how hard we sometimes have to work to make such hopefully beautiful user experiences. I agree with you about the expectations of what this can already do, or shocking people, or just how users come to use these devices and different tools. It's been fascinating for me to watch them interact with some of the state-of-the-art stuff, what's public at least. Well, we've heard so much about your work; thank you for sharing your projects. Before we jump into your career and your own path, how do you, and not Google, you, how do you see the future of this part of the field, the shape of the next few years?

Multimodal AI is going to be a game changer in many ways. Even in the past months, look at the discoveries we've seen in Nature, when we are able to parse, to go beyond the written word and work with images and videos and schematics, chemical reactions, physics simulations, physics engines, material structures. There's just such a wide range of possibilities that open up. There was a recent paper from DeepMind in Nature showing that by using multimodal models, they were able to discover so many new types of materials. Materials science is a field that can disrupt so many aspects of our life: maybe we'd have better batteries or photovoltaic cells. There are so many things that come with that. So, being able to use multimodal language models to explore new materials or new drugs. There was also a study about geometry and math: how a model trained in particular on geometric problems was able to solve Olympiad challenges much faster and better than humans.

Because I worked for so long in education, I'm very, very excited about the possibilities in education. Let's talk about math for a second. The largest learning losses during the pandemic in K through 12 were in math. So if all of a sudden you could draw a figure or an equation or whatever problem you're working on, take a picture of it, and have a model analyze or explain it, draw over it and say, what if you draw this line? What happens to this angle? Being able to have a more direct link between the problems we deal with on a day-to-day basis and potential solutions. I also think we should not give people the solution straight away; we should give them hints or questions and help them figure it out on their own. I built a copilot for kids in Scratch too, and when I tested it with kids, what I found was that most of the time they did not want the AI to write the code for them or fix their code. They just wanted it to ask them questions until they could figure it out, because it was their program, their game, their project. So it's more like a debugging duck, but a smart debugging duck, like when you're working on something with a friend or multiple friends and you have this collective intelligence: what if we tried this? What if we tried that? What if we tried this other thing?

And then there are opportunities for discovery in chemistry, medicine, materials science, physics, all of these domains where we have a lot of data that is visual, that we couldn't parse before and now we can.
And where we could run simulations and experiments in parallel, hundreds of thousands of experiments, that's gonna lead to new discoveries just by the sheer scale of how much faster we can make hypotheses and test them. I'm very excited about that. Also in terms of how it's gonna augment and accelerate our learning in research: a lot of our research is conditioned by the research questions we ask, and there are lots of blind spots there. In a field, there are these clusters of schools of thought and approaches, and what if we could actually have AI that could show us, across all of your papers or your whole research career, here are the blind spots? Or, here's an area nobody has looked at in math, so we could write new proofs or new theories. I've been using LLMs, language models, extensively in research whenever I want to keep track of what's happening in a field. I can just write an abstract, or a description of what I'm interested in, and they will write a summary of hundreds of papers written about that topic on Google Scholar or arXiv or Semantic Scholar, and reference the top 10 or 20 or 30 most relevant studies, however many I tell it. I find that alone very, very helpful.

I'm also excited about how we might use it to change people's minds and beliefs. We live in a very polarized society, without going too much into sensitive topics like politics, and this year is a very important year for democracy: we have four billion people around the world who are going to vote in 2024. What that means is that the way we process information and gather insights matters. We've seen lots of studies around polarization online: you could present the same information to two different camps and they're going to have very different interpretations. I'm very excited to think about how we might use AI in the future to change people's minds, or at least open them up to contradictory points of view, or to create safe spaces where you could have debates from different perspectives without that being oversensitive or overly controlled, without being afraid of being judged; being open to learn.

That's so cool. There are so many ideas you're throwing around; that sounds wonderful. Well, as people hear this and say, whoa, this is so cool, I would want to do something similar: can you share your background getting into this work, and then advice you might have for folks? So how did you get into this field? Let's start there.

It's a journey. I'm older, so I don't know if we'll have time to cover all of it, but I can do a high level. My trajectory is not the most conventional, because I'm first generation to go to college. I come from a very small town in Transylvania, in Romania. I am the only person from my family who left to live abroad, and I left when I was 18. I didn't necessarily have people to give me advice or help me financially, or even tell me what I should try or not try. I had to do a lot of figuring that out and trailblazing on my own. There were several pivotal moments, but early on in my undergrad, one of the things that helped me the most was getting involved with the student movement and all sorts of student organizations. I learned a lot of leadership skills and community skills. So not so much related to the technical side or to AI, but those skills of: how do you build a community? How do you practice good leadership?
How do you learn how to learn? A lot of that came from beyond school, from the student movement and student organizations.

Another pivotal point was when I decided to start my own NGO, where I was teaching kids around the world. It's called Hackademia. It was STEM skills, project-based learning, teaching local teams in different parts of the world. I've been to 63 countries teaching it: teaching local teams so they can run these workshops on a regular basis and use the things they have access to, whether that's microcontrollers like Arduino, or sensors, or robots, or old electronics and old things you can repair, older arcade devices, all sorts of things. That was really mind-opening in terms of how helpful it is to start learning something hard based on a passion you have, because that's what carries you through when things don't work, or when you need to bang your head against the wall because you're running into harder problems but you really want to get your project to work. Starting with a passion, if you're into music or sports, whatever it is you're into, starting with that passion, building a project with it, and in that process learning all the skills you need. That was an amazing experience, and very humbling also, to see what it means to have direct impact.

I decided to go back to school after doing the NGO, and I did a full-stack development school in New York. That was very helpful for me to polish my technical skills and be able to go from idea to implementation much faster. So I did this full-stack development class the first summer, and the next summer I was teaching it for Girls Who Code. That's also a very important lesson to take away: the best way of learning is to teach. It's like, I just learned these things, I've been working on them for a year, now I'm going to go and teach them. In order to teach something, you need to have it very, very clear in your head, such that you can explain it to other people.

After that, I got accepted into the MIT Media Lab, and that was how I got started with AI education. By that point, I had worked a lot on technical education for youth: coding, microcontrollers, electronics. I realized in 2016 that AI, voice assistants, machine learning were going to be huge. So I started thinking of how we teach the next generation, and that's how I started working on Cognimates and launched the platform.

I continued that work during my PhD. I took a gap year between the master's at MIT and the PhD at UW in Seattle, during which I was teaching at RISD, which is an industrial design school, and at ITP at NYU. I was teaching a class on how to design meaningful smart toys. It was awesome to work with industrial designers, interaction designers, more creative coders, and some of the things they made could have been products they launched: inflatable Legos, or a box where you could just hum a song and it would recognize the song and let you add different effects. All sorts of cool things came out of that class.

Then, yeah, I did my PhD at UW in Seattle, starting about six months before the pandemic. I only had six months in person on campus, and the rest was all remote. There I had to learn: how do I really change the way I work so that I can work with families and kids remotely and create this community? I had 15 families from 10 different states in the US that I worked with for two and a half years.
That really changed the nature of the work, because I had access to families with very diverse backgrounds, who spoke lots of different languages, different ethnicities, not so exposed to tech or the tech world. I also got to know them very well, and they came to trust me, so I was able to see how they integrate and use these technologies in their homes over a very long period of time. That was great.

The pivotal points in terms of technical knowledge, and getting access to opportunities, definitely came from internships. I applied to a lot of internships. I did two internships with Microsoft Research: one was on MakeCode, and one was on the human-AI interaction team, where I worked on Bing Chat. Then I did a six-month residency with Google, where I was working on the code generation tools and also on some of the image generation, like prompting. The internships were fantastic. I also did an internship with a startup called Fixie; they build agents. So I got to explore. It allowed me, one, to learn a lot of technical skills and see how some of the things I was learning in my PhD get applied in the real world, and how the constraints of research change from academia to industry. But it also allowed me to discover the culture of different companies and organizations of different sizes, prioritizing different things, and then decide what would be the best home for me to work in long-term. That was probably the most important decision. I tried to do as many collaborations and internships as I could with industry, with startups, and with public organizations like Hugging Face or Data & Society. We talk so often about academia versus industry and forget that there's this third option, where you could work in not-for-profit organizations, or collaborate with the government, or write your own grant. There are these NSF Small Business Innovation Research grants that a lot of my friends got; that could be another option.

The only constant is change: being ready to always learn hard things, and being comfortable with chaos and with complexity, especially in today's world. I don't know about your parents, but my parents, you know, had one job all their life, a different kind of work stability and a different vision of their career. When they see me move locations or move jobs, they're kind of surprised. For my generation, continuous learning is the new normal. Getting very good at learning, that's the biggest advice: be very reflective on how you learn and how you can improve.

That's excellent advice. Yeah, one, it's cool to hear about the internships and how fruitful they were for you. You just listed off so many different things. For people asking for more resources, short of going into academia yourself, are there any specific programs or certificates or actionable things, if people were like, oh, I want to sign up today, or resources you might point people to?

Fast.ai is a fantastic resource for people who just want to get started. Jeremy Howard is behind it, together with other folks. Jeremy was also the president of Kaggle, which ran the first AI competitions using different datasets. I found the Fast.ai courses extremely easy to get started with and quite valuable. I also really like the LangChain community. In this space of language models and agents, applications being built with these new language models, LangChain is an open-source library that is very, very popular.
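For readers curious what LangChain code actually looks like, here is a minimal sketch in the LCEL composition style the library documented around early 2024. The API moves quickly, so treat the imports and the model name as assumptions; an OpenAI API key is also assumed:

    # Minimal LangChain sketch: a prompt template piped into a chat model.
    # Assumes `langchain-core` and `langchain-openai` are installed and
    # OPENAI_API_KEY is set; the model name is illustrative.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_template(
        "Summarize what {library} is for, in two sentences, for a beginner."
    )
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    # LCEL: compose prompt -> model -> output parser into one runnable chain.
    chain = prompt | llm | StrOutputParser()
    print(chain.invoke({"library": "LangChain"}))

The pipe operator is the library's way of chaining steps; swapping in a different model or a retrieval step is a one-line change, which is much of its appeal for beginners.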
They have communities in all sorts of different cities in the US; you could go to a local meetup or a local hackathon. And their documentation is fantastic. So even if you've never programmed before, or don't know anything about language models or AI, it's a good place to start: great examples and a great community. Those are the top two things that come to mind for getting started. There's also an AI engineering summit that happened for the first time last year; it's going to happen again this year. The talks are available online on YouTube, and they're fantastic. That's kind of the bleeding edge of people pushing the boundaries in AI engineering and AI research. Then, on the academic side, the big conferences like NeurIPS are always going to give good insight into where the field is going and how it's advancing.

That's awesome. Those are amazing resources, and for the people listening, we will write these up in the show notes. My team now has a lot of homework to find all those links. Yeah, that's fantastic. You've given so much advice. Any last nuggets, any last advice you might give to a listener?

Don't despair. I think there's a lot of imposter syndrome that manifests at so many different levels. When I was working with kids and parents, I could see a lot of the parents being overwhelmed by these new technologies and how fast they're changing, and also having to make hard decisions about how much to expose their kids to them so they're not left behind, while at the same time protecting their kids' privacy and making sure they still have time to be kids. If you feel that way, you're not alone. What's unique in this new field is that you can learn together with your child, and it's totally okay if your child knows more than you do about Snap filters or how to hack a voice assistant. Be okay being a teammate and learning together with your child, while still providing that mentoring and guidance: let's double-check this answer, or maybe let's not delegate this type of task to the application, or maybe it's not a good idea to get this toy right now.

Similarly in the workplace, whatever your background is, when you hear conversations, I see there are two camps: either super progressive people who are like, this is awesome, I'm using it everywhere all the time, or people who are like, this is terrible, it's gonna steal my job, and I'm so afraid. The reality is that there's a spectrum. There are certain aspects of current AI technologies that are worrisome, and there are serious ethical considerations and risks that come with them. At the same time, it's not going away. So figure out how exactly we can use it, when we should use it, and to what extent, and don't be afraid to ask questions. You can ask questions to Bard, to ChatGPT. If you haven't, definitely try to learn a bit more about this, because I don't think it's going away. And I do think it's important to learn by doing and learn by trying. Hopefully that will demystify a little bit of the public discourse around this technology.

Excellent advice. Wow. Well, I know people will find this extremely valuable, and thank you so much for your time. If people wanna follow up with you or follow your work, where should they go?

Yeah, I'm easy to find on social media. I'm on Twitter and also LinkedIn, so both of those are probably great places to follow up.

Awesome. Well, thank you so much. I really appreciate it.
And it was lovely speaking with you. Absolutely.

Oh gosh, was that fun. Did you enjoy that episode as much as I did? Be sure to check out the show notes for this episode; they have tons of links and resources, our guest bio, et cetera. Go check it out. If you're ready to dive in and personalize your AI journey, download the free Your AI Roadmap workbook at yourairoadmap.com/workbook. Or maybe you work at a company and you're like, hey, we want to grow in data and AI, and I'd love to work with you. Please schedule an intro sync with me at Clarity AI at hireclarity.ai; we'd love to talk to you about it. My team builds custom AI solutions, digital twins, optimizations, data, fun stuff for small and medium-sized businesses. Our price points start at five, six, seven, eight figures, depending on your needs, your timescales, et cetera. If you liked the podcast, please support us: rate, review, subscribe, send it to a friend, DM your boss, follow wherever you get your podcasts. I certainly learned something new, and I hope you did too. The next episode drops soon. Can't wait for you to hear another amazing expert building in AI. Talk to you soon!

Chapter Markers

Introduction and Overview
What is Bard? (Google Gemini AI)
Data Set Sizes for Large Language Models
Applications of Bard (now Google Gemini)
Multimodal AI
Stefania's Work on Bard
Transfer Learning and Empowering Users
The Importance of AI Literacy
Ethical Considerations and Trust in AI
Prompt Engineering
What is NLP? "Natural Language" and Acronyms in AI
Forecasting Multimodal AI Applications of the Future
Stefania's Background and Career Path
Pivoting from Academia to Industry
Resources and Advice for Getting Started in AI