VP Land

Imaginario: Using AI to speed up video search and editing

Joey Daoud Season 3 Episode 6

In this episode, we talk with Imaginario AI co-founder and CEO Jose Puga on how AI is going to help us search, transform, and create videos faster and cheaper.

We cover:
‣ How Imaginario uses AI to search through videos based on context instead of tags and keywords
‣ The importance of content repurposing and customization
‣ What roles need to level up to stay ahead of AI
‣ Predictions on AI and the role of creatives in 2024

And a whole lot more!

📧 GET THE VP LAND NEWSLETTER 
Subscribe for free for the latest news and BTS insights on video creation 2-3x a week: 
https://ntm.link/vp_land


Connect with Jose @ Imaginario AI:

Imaginario AI - https://www.imaginario.ai
YouTube - https://www.youtube.com/@imaginario2030
Facebook - https://www.facebook.com/imaginarioai
Instagram - https://www.instagram.com/imaginario.ai
Twitter - https://twitter.com/Imaginario2030
LinkedIn - https://www.linkedin.com/company/imaginario-ai/
Jose @ LinkedIn - https://www.linkedin.com/in/jose-m-puga-a397922b

#############

📺 MORE VIDEOS

Final Pixel: How this virtual production studio lives up to its name
https://youtu.be/t0M0WVPv8w4

HIGH: Virtual Production on an Indie Film Budget
https://youtu.be/DdMlx3YX7h8

Fully Remote: Exploring PostHero's Blackmagic Cloud Editing Workflow
https://youtu.be/L0S9sewH61E

📝 SHOW NOTES

Midjourney
https://www.midjourney.com

Runway ML
https://runwayml.com

Pika
https://pika.art

ElevenLabs
https://ntm.link/elevenlabs

Descript
https://ntm.link/descript

Synthesia
https://www.synthesia.io

Respeecher
https://www.respeecher.com

Zapier
https://zapier.com

Apple Vision Pro
https://www.apple.com/apple-vision-pro

Magic Leap
https://www.magicleap.com

Grand Theft Auto
https://www.rockstargames.com/gta-v

Fortnite
https://www.fortnite.com

Imaginario.ai Brings Artificial Intelligence to Video Editing
https://lift.comcast.com/2022/09/22/imaginario-ai-brings-artificial-intelligence-to-video-editing

How is AI disrupting content marketing and how can this help you
https://www.linkedin.com/pulse/how-ai-disrupting-content-marketing-can-help-you-jose-m-puga


#############

⏱ CHAPTERS

00:00 Intro and Overview of Imaginario
02:48 Multimodal AI and Contextual Understanding
03:45 Data Storage and Hosting Options
04:30 Content Repurposing and Enriching Metadata
07:00 Use Cases for Raw Footage and Completed Media
08:30 Training AI Models for Specific Use Cases
10:01 The Vision for Imaginario and AI-Powered Creativity
13:05 AI Agents in Video Creation
15:13 The Impact of AI Tools on Creatives
29:19 The Future of Metaverse and AR
38:40 The Dominance of Netflix and the Importance of Social Media
40:18 AI Tools for Content Creation
42:54 2024 Projections

Jose Puga:

I'm not sure why people get scared about AI thinking about their jobs, when in actual fact you should be able to do more with less.

Joey Daoud:

Welcome to VP Land, where we explore the latest technology that is changing the way we create media, from virtual production, to AI, to everything in between. I am Joey Daoud, your host. You just heard from Jose Puga, CEO of Imaginario AI. Imaginario is an AI-powered platform that helps creators, content marketers, and studios search, curate, and transform media at scale. In my conversation with Jose, we talk about how Imaginario is using AI to analyze and search through videos based on context and imagery, not just tags and keywords.

Jose Puga:

So we're trying to essentially bring curation and transformation closer to a human level, rather than relying on, you know, thousands of different labels and keywords to sift through all your content.

Joey Daoud:

How we'll be able to generate synthetic media in the future based on our own media libraries.

Jose Puga:

Models are going to get smaller, smarter, and more focused on specific tasks. Text to video generation is getting better and better. And if anything, it's going to become also cheaper to create.

Joey Daoud:

What the role of creatives is going to be in the future.

Jose Puga:

I see creatives as creative directors. At the end of the day, we're talking about storytelling.

Joey Daoud:

And a whole lot more, including what roles should start leveling up their skills to stay ahead of AI automation.

Jose Puga:

Now, if you're an editor, I think you should start retraining and looking for other jobs, because this is the first area where AI, at least in video workflows, will attack. But embracing AI, I think, is also a matter of survival for many creatives, rather than an option. And if they're not doing this, I don't think they're going to do well in the future, personally.

Joey Daoud:

Show notes for everything we talk about are available in the YouTube description or over on our website, vp-land.com. And now enjoy my insightful conversation with Jose. Well, Jose, thanks for joining me. I really appreciate it. So yeah, I wanted to jump into two things. First, let's talk about Imaginario.ai. And then second, let's dive deeper into AI in general. But for the first part, can you just give a high-level overview of what Imaginario is and does?

Jose Puga:

Yes, so Imaginario essentially is run by video and AI experts. In my case, I come from a media and entertainment background. My co-founder is a CTO with experience in robotic perception and autonomous driving. And we're essentially bringing that technology to marketing and video creative workflows. On one hand, we have an API that allows creative and marketing teams to find anything they need in their video libraries at scale, and then repurpose them or transform them into any sort of screen size. Or if they need to enrich the understanding of their videos, they can do that as well. On the other hand, we pair that API, that backend, with custom video AI agents, so people can create their own personal agents. That's still in R&D mode, but we do have an app and an API that are live for different users to test.

Joey Daoud:

And one of the big parts is being able to upload and have your AI index footage, not just based on objects and words, but context as well?

Jose Puga:

Exactly. So what we do is apply what is called multimodal AI. So what is multimodal AI? We don't look just at dialogue, which traditionally could be keywords, right? That's the easiest way to search inside a video. But we also look at visuals, down to one frame per second, as well as sounds. So let's say there's a car engine or an explosion; we can pick that up inside a video. And once we look at those three modalities, so dialogue, visuals, and sounds, we also understand the passing of time. So we can compare shot by shot every x number of seconds, and then understand how the passing of time impacts meaning inside a video. So we're trying to essentially bring curation and transformation closer to a human level, rather than relying on, you know, thousands of different labels and keywords to sift through all your content.
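To make the idea of multimodal search concrete, here is a minimal, purely illustrative sketch (not Imaginario's actual implementation; the field names, weights, and scores are invented) of how per-segment similarity scores from three modalities, dialogue, visuals, and sound, could be combined to find the moment that best matches a query:

```python
# Illustrative sketch: score video segments by combining similarity
# scores from three modalities (dialogue, visual, audio) and return
# the timecodes of the best-matching segment.

def best_segment(segments, weights=(0.4, 0.4, 0.2)):
    """Return the (start, end) of the segment with the highest
    weighted multimodal score.

    `segments` is a list of dicts with per-modality similarity scores
    in [0, 1], e.g. produced by comparing a text query against
    embeddings of each modality.
    """
    w_dialogue, w_visual, w_audio = weights

    def score(seg):
        return (w_dialogue * seg["dialogue"]
                + w_visual * seg["visual"]
                + w_audio * seg["audio"])

    top = max(segments, key=score)
    return top["start"], top["end"]

# Hypothetical scores for a query like "car engine revving":
segments = [
    {"start": 0,  "end": 5,  "dialogue": 0.1, "visual": 0.2, "audio": 0.1},
    {"start": 5,  "end": 10, "dialogue": 0.2, "visual": 0.8, "audio": 0.9},
    {"start": 10, "end": 15, "dialogue": 0.7, "visual": 0.1, "audio": 0.2},
]
print(best_segment(segments))  # (5, 10): strong visual + audio match
```

The point of the sketch is the contrast Jose draws: a keyword search would only see the dialogue column, while a multimodal index can surface a moment whose match is mostly visual and sonic.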

Joey Daoud:

And so, I mean, you mentioned it's an API, so I imagine you can plug it into your own interface or backend app, which I'm curious about. But also, in some of your demos, I saw what I'm assuming was an Imaginario user interface. So from a user perspective, where are you storing your data, or what data options are available? If you just wanted to use Imaginario to index your footage and search it, is that an option, or does it have to plug into another data system where you're storing something, either one you built yourself or wherever you're storing your data?

Jose Puga:

Yeah, so there are two options, right? You can either host the content with us, and we are using AWS for that, or you can just use us to create an index. Now, this index is, as I said before, not driven by labels or time-based metadata. It's what we call mathematical representations, or mathematical objects. And then all you have to do is query our API for us to give you the timecodes. But you can host the content wherever you want. Now, today, of course, we have the full solution that you can access through the website. The sort of custom solution where we just host the index is more for enterprise at the moment. So that's not available to SMBs and creators just yet.

Joey Daoud:

Okay. And then what can you do once you find what you're looking for? Your origin story resonates with me, where you worked as a journalist and saw video editors sifting through hours and hours of interview footage to try and find stuff. I have a background in video editing; I've done a lot of that sifting. So it's always on my radar what tools are out there; text-to-video editing has been a huge game changer in all of the apps. When you're sorting through the footage on Imaginario, are you able to build out string-outs, rough cuts? Once you find what you're looking for, what's the next step?

Jose Puga:

Yeah, so you can create individual clips for different social media platforms. That's one use case, right, which is repurposing long-form content. And not just one piece of content, but multiple pieces of content, into compilations. So, high-end compilations, let's say around a topic if you're, you know, running your own podcast, or the best moments of characters x and y if you are a broadcaster or a streaming service, and then you put together those compilations for YouTube, TikTok, and others, right? In some cases, you will need to resize and also insert key frames; it's kind of like reshooting inside the shot, where you need to follow a character and then the, let's say, emotional reaction of another character. So that's one use case, but another use case in media and entertainment is enriching metadata, right? So you will have thousands of hours of content, and what these media companies want to do is build recommendation algorithms or just improve, you know, their media asset management systems. And for that they need metadata, and trustworthy metadata. So they can use our API, and the output is not necessarily a video, but a description of what's happening every three to five seconds inside that video. And then they can use that for so many different use cases, from contextual advertising, which is pairing the right ad with the right scene, to powering, you know, global storage solutions, to recommendation algorithms and more. Again, it varies depending on whether it's an enterprise, an SMB, a small production company, or a creator, right? And the type of content changes as well. It varies.

Joey Daoud:

Yeah, and on a smaller scale right now, is the use case more that you already have produced media, a library of content that's already completed, and you're looking to find it or repurpose it? Or is it more at the stage of raw footage, like a digital asset manager? Well, I guess that would kind of apply to both, but raw footage, like you've got all of your raw B-roll clips or raw interview clips, and you're still in the editing stage, putting that together and finding your material.

Jose Puga:

It's both. It's both. Because, as you know, in editing, first it's about curation. So you need to re-watch content traditionally, right? You select those key moments. And then you enter proper editing mode, which is adding transitions, subtitles, whatever that is. So we do both. And the use cases, again, vary depending on the company. We've been speaking with around 20 production companies in the last month or so, and they are all about bringing their archives back to life, because they have, you know, terabytes of content where they have potential B-roll they can use in new content. And not just visual scenes, but also sound effects, right? And then with our tool, they can use all of that, not just in their own projects, but also to resell on the stock footage platforms as well. So production companies are also looking for ways to monetize and create further value from these assets. Because at the end of the day, these are assets they're just sitting on and doing absolutely nothing with, which is a shame. There's so much knowledge and so much, you know, artistry inside this content that it's a shame it's not being properly exploited, right?

Joey Daoud:

And then you sort of touched on it before, but also in your explainer video, you mentioned another use case was being able to train your AI models for specific industries or specific use cases. So can you expand on that a little more?

Jose Puga:

Yeah, so, when it comes to fine-tuning our models, which is training for specific use cases, or, let's say, around your own library, there are different options. Where we are today is not where we wanna be tomorrow. Where we are today is that you can send us a specific type of data. We tend to focus more on high-quality data, like textbook-level data, right? So we will give certain parameters about the type of data that we need, which, again, is not complex; we're talking about clip-and-text pairings or image-and-text pairings. When I say text pairings, I mean a textual description of what's happening in an image, right? Let's say you're after, you know, Messi, and you just wanna follow Messi and everything Messi is doing inside your archive. Then we just need photos of Messi and a description of what he's doing in each image. And with a few thousand of those images we can train and fine-tune models. So in that case, we can do this at a tenth of the cost, without sacrificing quality, and, again, without the need for data annotation, complex model training, and the traditional sort of flows that you need in place if you want to build this in-house. On the other hand, all our clients, from Universal Pictures to Warner Brothers, Syniverse, production companies, none of them have asked for fine-tuning. So that tells you that our baseline model is pretty good. They're very happy with it, which is great, right? Now, in the future, talking about where we wanna head, we want self-training. So for people to self-serve, where they just ingest, you know, their entire archive, our AI models will be able to select the best data and then, from there, train the models, even create synthetic data with your own permissions, so then you can further train and fine-tune the models. That's what I mean by custom AI agents.
Custom in the sense that they will be personalized, understand your editing style, your curation style or taste. And we wanna get to the point where editing becomes less and less relevant, at least for these sort of short-form use cases and for understanding and reasoning. For long-form content, like feature films, of course, you're still gonna have editors and very, you know, high-end creatives, right? But we think that for regular short-form content, this is not gonna be needed in the near future.
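The image-and-text pairings Jose describes can be pictured as a very simple data shape. The sketch below is a hypothetical example of preparing such a fine-tuning set; the field names (`media`, `caption`) and the validation rules are assumptions for illustration, not Imaginario's actual data spec:

```python
# Illustrative sketch: filter a list of (media, caption) pairs down to
# well-formed training examples, where each caption is a short textual
# description of what is happening in the image or clip.

def validate_pairs(pairs, min_caption_words=3):
    """Keep only pairs that have both fields and a reasonably
    descriptive caption (at least `min_caption_words` words)."""
    clean = []
    for pair in pairs:
        media = pair.get("media")
        caption = (pair.get("caption") or "").strip()
        if media and len(caption.split()) >= min_caption_words:
            clean.append({"media": media, "caption": caption})
    return clean

raw = [
    {"media": "messi_001.jpg", "caption": "Messi dribbles past two defenders"},
    {"media": "messi_002.jpg", "caption": "goal"},          # too short
    {"media": None, "caption": "Messi celebrates a goal"},  # missing media
]
dataset = validate_pairs(raw)
print(len(dataset))  # 1: only the first pair survives
```

This is the "textbook-level data" idea in miniature: a few thousand clean pairs like the surviving one are what make low-cost fine-tuning feasible.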

Joey Daoud:

Yeah. And you're sort of teasing what I wanted to ask next: what are the next stages for Imaginario? I think it was in a Comcast article that you said you see your company as an AI-powered Canva. I think you kind of laid the groundwork, but what's next beyond sorting through data? What's the next vision that you have?

Jose Puga:

What we are quite focused on today is building AI agents, because, again, we believe the way humans interact with software, and specifically point-and-click graphical user interfaces, is going to change. Software today, as we know it, is dumb in the sense that it requires your input, and therefore it's reactive, right? It's not proactive. It's not going to come and tell you, hey, I need this from you to give you the best recommendations, or you just turn on your computer and immediately your computer knows who you are, what you like, and what your day looks like. So that's what we mean when we say we're moving from reactive user interfaces to proactive user interfaces. And we believe that agents, or multi-agents, are gonna make this possible, agents that you will be able to customize, almost like having a colleague in your team, right? Especially if you're a business owner: they will know your data, your company data, your personal data, and you'll be able to customize those agents. We have built one of those agents in-house, but of course it's still a bit hit and miss depending on the use case. So that's why we haven't released it to the world yet. What we want to do is then not just have a chat-based interface, right, like ChatGPT. We believe that's one part of the user experience. But we also think that you should be able to zoom in or zoom out from a point-and-click user interface. So we believe the way humans will interact with computers will be based both on conversational design experiences, right, and visual interactions. That's what we wanna build today. And then, of course, on the training side, we are pretty much focused on tackling specific use cases and training for those specific use cases, rather than, you know, trying to build a large language model with 1 trillion parameters, which we think is the thing today.
But in the future, even in the near future, it's not going to be about that. We believe in small language models, multimodality, self-training, and open source overall. But, again, that's our belief. I know many people think otherwise. That was a very long answer.

Joey Daoud:

No, that's great. For the agent part, can you explain a little more? Because when I think of agents, I'm thinking of chat agents, like you're just communicating through a chat interface. Can you connect that for me: where does that fit in with the video library, or in media creation, video creation?

Jose Puga:

Yeah. So let me give you an example in video repurposing, right? You have a podcast, or let's say you have 200 hours, 200 one-hour episodes. All of that content is sitting on your Google Drive. So first you need to define the input. Then you pull the content from there. Then you have a few tasks, right? Because let's say the goal is to push this to TikTok, adapt it to TikTok. The first task might be to look at top trending topics in content marketing; let's say you're in the content marketing space. Then, based on those topics, find those specific moments. Then you need to track speakers and resize the content. So there are a few tasks in the middle, right?

Joey Daoud:

When you say find trending content, you mean go to the actual web, or TikTok, and see what topics are trending and what you have in the library to match that?

Jose Puga:

Exactly. So a key part of all of this is not just chaining together, or linking together, different tasks and input and output platforms, right, but also layering in external analysis. You need to understand what's happening inside different, you know, platforms; we will be partnering with third-party analytics companies. And then you have your own insights, right? From your own YouTube or TikTok channels. And then the idea is that the AI learns from those sources and can make the best decisions about how that workflow should look, right? Not just where to pull from and what to do, but also how to make that editorial decision. It's essentially like a digital twin, like a clone of you, just with augmented capabilities, right? And we wanna head towards that future, where essentially you don't need to go to one tool, let's say Premiere, to edit. You need to go to Google Drive to get your content. You need to go to, you know, Riverside, and I love Riverside, to record your content. But you just talk with one or two agents, and they chain all the different apps together, and they know you exactly, right? They know exactly what sort of output you need.
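The task chaining Jose describes, pull content, pick a trending topic, find matching moments, resize for the target platform, can be sketched as a simple pipeline. Everything here is invented for illustration (the step names, data shapes, and hard-coded stand-ins for Google Drive and trend analysis); it only shows the chaining pattern, not any real agent framework:

```python
# Illustrative sketch: an "agent" as an ordered pipeline of tasks that
# threads a shared state dict through each step.

def run_pipeline(state, steps):
    """Apply each step to the state in order and return the result."""
    for step in steps:
        state = step(state)
    return state

def pull_content(state):
    # Stand-in for pulling episode metadata from, e.g., a drive folder.
    state["episodes"] = [{"title": "Ep 1", "topics": ["ai video", "seo"]}]
    return state

def pick_trending_topic(state):
    # Stand-in for external trend analysis; hard-coded here.
    state["topic"] = "ai video"
    return state

def find_moments(state):
    # Keep episodes whose topics match the trending topic.
    state["moments"] = [
        ep["title"] for ep in state["episodes"]
        if state["topic"] in ep["topics"]
    ]
    return state

def resize_for_platform(state):
    # Stand-in for reframing clips to a vertical aspect ratio.
    state["outputs"] = [f"{m} (9:16 vertical)" for m in state["moments"]]
    return state

result = run_pipeline({}, [pull_content, pick_trending_topic,
                           find_moments, resize_for_platform])
print(result["outputs"])  # ['Ep 1 (9:16 vertical)']
```

The editorial judgment Jose emphasizes, which moment is actually a good hook, is exactly the step this sketch hard-codes and a real agent would have to learn from curators.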

Joey Daoud:

Yeah, I think that's a big gap now, too, 'cause, focusing on this specific example of short clips, I mean, yeah, we put short clips up of the podcast; these will be short clips. But there are tons of tools out there where it's like, oh, give us your YouTube link or your podcast episode and we'll create the short clips for you. They take a clip and do a decent job of condensing it. The soundbite they pick, maybe it's good, maybe it's not. But the biggest issue is they don't make the editorial decisions that we do. We do it by hand, where it's like, take this soundbite and move it ahead of the other soundbite because it's a better hook, it sounds more interesting, and we reshuffle things out of order, like editing, what editing is. And that's been a big gap lacking in AI tools. And it sounds like this is that next step in making those small-scale editorial decisions for a short clip, something 30 to 120 seconds. You said you'd define shorts as under two minutes?

Jose Puga:

I would say shorts are under five minutes. But normally it's under two. Yeah, under two minutes. It's just that there's this gap between two minutes and 15 minutes that nobody knows how to define, right? But yeah, say, like, mid-form.

Joey Daoud:

Whenever I hit the limit on, uh, YouTube or TikTok, except TikTok keeps expanding how long you can upload. Everything else, Instagram Reels and YouTube Shorts, is about 60 to 90 seconds, so-

Jose Puga:

And, Joey, based on what we just said, we, from day one, believed in human curation. So a human in the loop. I come from the media industry, and I do have high standards when it comes to curation. So, we don't think that AI can just do it all; it can do a decent job with conversational content. There are a few platforms out there where you just type a link and then you get TikToks. And we're gonna be doing that for some customers, because they don't wanna spend, you know, more than five minutes doing this, and we understand that. However, we are really catering more to production companies, marketing agencies, and marketing and creative teams inside large and mid-size media companies, where the editorial aspect is very important, right? You need human curation. As the AI learns more from you, and of course with all the terms and conditions very clear about privacy and data, then that AI can do a great job for you and just help you do your job better, right?

Joey Daoud:

This is a platform where you're getting data, but you're getting a specific training data set of someone's media that they upload, assuming, you know, they give their permissions. You mentioned synthesizing videos in the future. What does that look like?

Jose Puga:

Yeah, that's a great question. So when you train these models, regardless of whether it's a language model or a visual model, or you're in a multimodal space, there are different sources for this, right? If you look at OpenAI, it's a mix of data that they've commissioned, then open source data sets, then some synthetic data as well. So it's a mix, and you need to curate the data, of course. And we strongly believe in curating data before training. It's not just a matter of pushing in all the data you find online and hoping for the best, right? Because that's when the processing costs go up and you need hundreds of millions of dollars in funding. So, um, anyhow, so,

Joey Daoud:

A lot of NVIDIA chips.

Jose Puga:

So yeah, NVIDIA chips. I believe they have more than 90-something of those, right, the A100s. But anyhow, this is not me being, you know, jealous of other companies; we just think the future is not about having to spend hundreds of millions of dollars on, you know, training models. We believe academia will come back in 2024. We believe that open source will get stronger. But anyhow, going back to your question about synthetic media. Text-to-video generation is getting better and better. And if anything, it's gonna become cheaper to create as well. So if you can mimic or recreate a customer's data to then train your models, at some point that will become cheaper than licensing that data set from others, right? And we wanna head towards that. Now, in some cases, you'll still need to license that data or commission it, depending on the use case. But if we can take advantage of synthetic data, by all means, we will do it. Now, a key challenge here is also bias inside data, right? Any personal data has bias; we are human beings, after all. So it's also about having the safeguards to balance out the different data sets and trying to reduce bias, depending on the use case as well.

Joey Daoud:

And now, are we talking about, say, you did a shoot at a client's office and forgot to get a wide shot, and you'd synthesize that type of shot? Or are we talking about, you need a pickup line from a sit-down interview, someone saying something, and you're gonna generate that pickup line of the sit-down interview? What range of media are we talking about creating?

Jose Puga:

Yeah, so it depends on the type of content. For instance, on one extreme, you have conversational content, right? These are podcasts, interviews, webinars, and content marketing, kind of like B2B content marketing videos, right? And then on the other extreme, you have highly sophisticated content that is more visually sophisticated, like feature films, TV shows, and others. And in the middle you might have some corporate videos or event videos, or documentaries, let's say. So it depends on the type of content. But what's most valuable for us are the inputs. So the start time, right, and then the end time of that clip, for example. This is one type of data we need, and then descriptions of what's happening inside a video. That's ideally what you want. And then, finally, of course, dialogue, or timestamped scripts. So this is the sort of data we need to train our models. Some of that data is sitting with media companies, but we just use users' data in terms of exports, right? So when you use the platform, at the end of the workflow you are downloading that or pushing it to TikTok, and that has already been curated. Somebody already probably adjusted the beginning and the end a bit, added subtitles. And with that, you can then use that for training as well, and actively learn from users. Because that's the tricky part: what's a well-rounded idea in conversational content, and what's a great scene in sophisticated content, right? And you need to learn from curators to do that. Or hooks, for example; you were talking about hooks at the beginning of the video. What's a great hook that can summarize what's happening inside that snippet, but would also generate views on TikTok and other platforms, right?
So that's the sort of data you need to overlay between trends and insights, as well as personal, subjective curation of content, right?

Joey Daoud:

All right, real quick, just want to jump in here. If you are enjoying this content, then you will like the VP Land newsletter. It's not just a podcast; we have a newsletter. It goes out twice a week and covers all sorts of things in the virtual production industry, the latest AI tools that are affecting how we make videos and media, and a whole bunch of stuff in between. So you'll highly enjoy that. You can get it for free over at vp-land.com, or the links are going to be wherever you're listening to or watching this, in the show notes. All right, and now back to my interview with Jose.

Joey Daoud:

Widening out a bit, you know, with not just Imaginario but the broader rise of AI tools and everything that's happening right now, where do you see the role of creatives in the future?

Jose Puga:

I see creatives as creative directors. Essentially, at the end of the day, we're talking about storytelling, right? For me, at the core of everything we do in media, and in content in general, there needs to be storytelling. Ideally, educational value as well, depending on the type of content. For instance, for content marketing, educational content is super important; that sort of educational value in high-end content is very important. Yeah, and I think creatives will be able to orchestrate different workflows and create Hollywood-grade films with much more... okay, not one person, probably, but a small team, a small production company, will be able to create quite sophisticated content in record time. We believe the problem might become distribution, though. So if anything, video production is going to increase exponentially and get more sophisticated; more creators are gonna be able to go beyond conversational content and create fantastic long-form content, even, right? Which is quite exciting, because if you do an analogy, it's a bit like television in the 1930s and 1940s, where most of the content was studio content, talking heads, like us now, right? But ideally, in the near future, you wanna do something that's closer to a news report or a high-end documentary, something that really adds value to people, where people can actually learn something, you know, from that content, or be entertained as well, right? So I don't think the role of editors will go away. Now, if you're an editor, and I've said this before, who is used to doing, like, cookie-cutter editing, I think you should start retraining and looking for other jobs, or just become a more high-end, sophisticated editor, because this is the first area where AI, at least in video workflows, will attack, right? So that's search, curation, re-editing, recutting for short form.

Joey Daoud:

I mean, editor aside, or let's say very basic editor aside, where you're just repurposing clips but not doing heavy feature or long-form story editing, what other roles do you think will, after editing, get affected or shift in how we do things?

Jose Puga:

Like managers or creative directors, essentially. So you need to have quite a good understanding of pre-production, production, and post-production, and be able to define those roles and agents, again, that you want to work in tandem with your team. You need to be knowledgeable about AI. If you don't, if you are pushing back because you believe in artistry, great; there's still space for high-end content. But embracing AI, I think, is also a matter of survival for many creatives, rather than an option. And if they're not doing this, I don't think they're gonna do well in the future, personally. However, I believe that storytelling, curation overall, and human ingenuity and creativity are still here, right? And if anything, AI learned from us; we can also learn from these agents and these sorts of models, and just work in tandem to be more productive and create more content, better content, higher-quality content, for a lower cost. I don't see anything wrong with that. I'm not sure why people get scared about AI taking away their jobs when, in actual fact, you should be able to do more with less, right?

Joey Daoud:

Yeah, definitely more with less. I feel like the importance of story is only gonna shine more, especially with, you didn't say this, but my interpretation is, we're gonna have more noise, you know? And it's gonna be a little bit tougher for everyone to find the signal of what's good. This sort of reminds me of when iPhones started shooting video, or even before that, when digital cameras and VHS tapes came along and it became easier for everyone to make movies. You didn't have to shoot 16 millimeter, you didn't have to pay for processing fees. You were able to do it at a much lower cost. More people made movies. There were a lot of crappy, mediocre movies, but it also gave rise to a lot of good stuff coming out. And I think this is just gonna be the next step in that: it's easier to make stuff, so there's gonna be a lot more stuff being made. A lot of it's just gonna be garbage, but someone who might not have had the resources before to tell a great story will have them. Ultimately, though, it always comes back to good storytelling.

Jose Puga:

Exactly. And I think that competition is good, right? At the end of the day, consumers can even be other companies learning from companies that are creating educational content. If anything, this is gonna help in terms of quality; it's gonna push the quality up. It's gonna increase the quality of content, and there's gonna be more competition. And I think at the end of the day, the winners are gonna be, on one hand, consumers, and on the other hand, distribution platforms. So if you look at high-end streaming services like Netflix and a bunch of others (there are like three or four of them), I think they will be the big winners in this race, at least in the high-end media space. And when it comes to content marketing, honestly, I'm excited, because that means people will not necessarily need to go to university. They can take a YouTube course or just learn from other companies like HubSpot, for example. They've done a great job when it comes to educational content, right? We believe that creators, or small companies from two to 10 employees, need to become more active. And as a business owner, and Joey, you probably feel the same, you just don't have the time to create really high quality content and do this at scale, to be top of mind and increase brand awareness. And I think if these tools become more accessible, then great: whoever's more creative and engaging will win. And I don't see anything wrong with that personally. I think it's better for everyone.

Joey Daoud:

Do you think there's gonna be a lot of consolidation of the tools as well, where the platforms that can sort of be that one place are going to be the more dominant players? Because right now, if you wanna try to make some sort of "AI film," it's like: generate images in Midjourney, then go to Runway or Pika and animate the images, then go generate your voice in ElevenLabs or some other platform. It's just a bunch of tools that you have to stitch together right now; there's not a central platform yet. Do you think the future is that whoever can integrate the tools more is gonna be the dominant player, which could be an existing player like Adobe, or something new that we don't even know yet?

Jose Puga:

I think the problem with Adobe, just to talk about Adobe for a second (it's a fantastic company and they have incredible AI), is that they rely on people licensing individual tools, right? And as I was saying initially, when it comes to relying on different tools as part of your workflow, in that sense there will be consolidation. So this multi-agent setup, or even one single custom agent, will be able to chain together different tools, and even not rely on some of those tools, just handling the tasks itself. Now, consolidation from a tool perspective, I'm not sure. If you look at Zapier, for example, their entire business model is just integrating inputs, tasks, and outputs, right? So it might be the case that you just need a good integrator that can understand goals and customize them, and then the AI will figure out where to pull those models from and what to do, right? We don't think there's one player that has the right to win in this space. I think it's quite early to tell as well. But we also think that, if anything, processing costs are gonna go down. There's Moore's law, right? And that's gonna keep happening. Models are gonna get smaller and smarter, and able to be highly personalized. And the user interface is gonna be all about integrations, right? So that's what we believe in, and that's the sort of future we're building with Imaginario. We're not there yet, of course, we're just starting, but yeah.

Joey Daoud:

I wanna ask you too: you'd written about this. The metaverse was a hot topic a couple years ago, and then it kind of died down a little bit, definitely in 2023. Where do you see it going? Especially with the Apple Vision Pro coming out next year. Spatial video is now a thing. Augmented reality and the metaverse might be in a little dip and then coming back up. So what are your thoughts on the future of the metaverse and AR? Where do you still see this going five, 10 years from now?

Jose Puga:

So, my view is a bit controversial, and some of my investors that love AR and VR might not agree with me, but that's the one I have in mind. I do believe that XR is the future, that it will happen, and that it will be everywhere, right? Now, the main constraints, though, and I'm talking from an experiential point of view, right? So being able to experience immersive or augmented,

Joey Daoud:

Like from a user perspective?

Jose Puga:

Yeah, from the user's perspective. Now, many things need to happen. I did study this; it was my final thesis for my MBA, so I did a lot of work here. I even predicted that Magic Leap would pivot to B2B, by the-

Joey Daoud:

There you go.

Jose Puga:

Way before that happened. But anyhow, my view on this is that many things need to change. First is infrastructure, right? We need high-end, high-speed broadband everywhere to take this technology outside the home. That hasn't happened, and there are many things that need to happen; not even 5G has been fully adopted. And here, the bottlenecks are, you know, telecoms and other infrastructure players. With Starlink and other solutions, this might change in the future, but we need to solve this from an infrastructure point of view. Then the second hurdle, and everybody knows about this, is that head-mounted displays are clunky, too big, uncomfortable. Great if you're into gaming, but they're not something you can wear like, you know, a pair of frames. That needs to change. They need to get smaller, cheaper, and able to connect with your phone, probably augmenting your laptop or your phone as a peripheral, I would say. There are some efforts here, and there are some companies pushing the boundaries when it comes to the form factor and adoption, but I don't think we're quite there yet. There's still latency as well. People don't talk much about this, but they should talk about how socially acceptable it is to wear these sorts of devices in public, right? So if you are-

Joey Daoud:

Yeah. I think that was the biggest issue with Google Glass.

Jose Puga:

Yeah, well, glassholes. You remember the term glassholes, right? Those people that were wearing Google Glass while commuting to work. So that is another problem: if you have someone staring at you with, you know, some smart glasses, you don't know what they're looking at. They might be doing something dodgy or trying to look through your personal profile. So there are a lot of privacy issues, social acceptance issues, and cultural matters as well. Every culture is different around privacy and around letting people access that sort of personal information on the go, right? So I think that's another factor. So there are a few things that still need to happen before there's wider adoption of this. Now, I think there's still growing interest, and I still see it in two particular areas. One is the future of work, where you can augment your screens, for example, for productivity, right? Remote collaboration as well, so virtual offices. That's another area where I'm still trying to see the winner of COVID-19 when it comes to the future of work. But distributed teams and virtual teams are here to stay, even if some large corporates want to tell us, no, you have to go back to the office. If you look at the SMB space, especially startups, there are many distributed teams, right? And it works. We are one, and it works. Those are two areas. And I would say also corporate training. So, for instance, in manufacturing, or where you need some kind of special training where it's too costly to bring people to a specific location, that's where we believe VR especially, or even AR, could be quite useful. But is it ready for primetime? Is it gonna disrupt the smartphone? No, I don't think so. I don't think that's gonna happen in the next five to 10 years. It's gonna take longer. But again, that's my own point of view.
I'm not sure, what do you think about this, Joey?

Joey Daoud:

I mean, Meta's vision was that there's gonna be a metaverse, a fully immersive 3D world, like the Ready Player One Oasis kind of thing. And Apple, well, it's not out yet, but from their demo video and their pitch, they definitely did not mention the metaverse once. I don't even think they mentioned virtual reality once. Their pitch was very much augmented reality: this is a blend with your real world. They have a huge focus on the eyes, so you're not separated. So do you think the metaverse, the fully virtual world, is maybe way further off, if it even ever happens, and the more immediate use is just gonna be some sort of augmented screen? Just a way to have more screens, possibly have some sort of virtual meeting in a 3D space, but still grounded in our real world?

Jose Puga:

I think it's going to start like that, the latter. So essentially a focus on productivity, collaboration, and 3D immersive environments that help there, training and education as well, for example. However, at some point we will move towards an immersive metaverse, though I'm even doubting whether that will be enough. We probably need to do it without glasses, right? It needs to be either 3D projections or something that's embedded within smart cities or infrastructure. But for that, you need government intervention. You need, you know, the government to spend money. This would be a mix of governments and private capital pushing cities to the next stage. In places like Dubai, for example, there's quite a strong focus on smart cities. Will this work in Los Angeles? I'm not sure, right? So I think there are a lot of things that need to happen, but we will eventually get there. I'm a hardcore believer that the 3D immersive metaverse will happen; it's just a matter of calculating the timing. You need ecosystem alignment; it doesn't depend on just one company. And you need people to adopt it. And for that, it's not just about technology, but also about their cultural background and their own habits. Is it comfortable or not? What about latency and user experience? We tend to forget about that. People that love technology focus too much on the latest trends and the latest technology, but they don't focus on the form factor or adoption. And that's the most important thing. Steve Jobs, for instance, completely got this right with the iPhone, and before that with the, what was it called again? Not the iPod... yeah. So there were attempts. Steve Jobs didn't build the first iPod, right? Or, sorry, the first MP3 player.

Joey Daoud:

MP3 player. Yeah.

Jose Puga:

Yeah. He had to align music labels with a DRM system, with a store, with the right user experience. Everything needed to be aligned: a fully integrated vertical stack that can work, right? That can deliver a flawless experience. So it's not just one factor. I think Apple is probably the only one that can get away with bringing an immersive metaverse into this world. Now, do we live in the metaverse already? Yes, but it's not immersive. It's optional, and it's 2D, right? Right now, for instance, this is a virtual representation of two people talking; we're in different parts of the world. There are gaming environments as well, like Fortnite and other sorts of gated communities, especially in gaming. That's the metaverse too. The metaverse is not just 3D immersion with a blockchain layer added to it, right?

Joey Daoud:

And then, tying this back to media and video, how do you see this potentially changing the ways or types of media that we create? We've got spatial video; the iPhone 15 Pro can shoot spatial video. We can't really do anything with that currently, unless you have the Vision Pro, which is not out yet. Or I've seen some people hack a Meta Quest 3 to watch it. Some journalists said that they did it, and it brought them to tears, seeing some of their memories in this 3D environment. And, just to tie in every buzzword, photogrammetry: being able to scan a room, so possibly in the future you could scan your childhood room, and 50 years from now you can walk around in it like you're there. How do you think this all ties back into potentially changing the way that movies are made and what kind of movies we are making?

Jose Puga:

Well, that's a huge question, Joey. Probably storytelling is not gonna be linear; there will be multiple different types of storylines within a single film, right? And it's all about influencing the audience towards a specific sort of ending or a specific sort of outcome. So I think filmmaking is gonna look more and more like gaming, if you ask me, and not necessarily open world gaming, but gaming overall. So you need to achieve certain levels, talk with people, and look for, you know, treasures or whatever that is, right? I think at some point the gaming way of working will impact Hollywood, and vice versa, which is already happening in gaming, right? If you look at Grand Theft Auto and these franchises that cost more than Hollywood films, I think gaming is borrowing more from Hollywood than Hollywood from gaming, and it should be going both ways. Because the natural progression of 2D media is 3D, and we will get there. It's just not here yet, right? Now, when it comes to search and curation and what we do, we have this as part of our roadmap. We've already been doing tests with 3D asset search. And the beauty of our technology, and I'm not trying to sell Imaginario again, is that it does work with 3D assets and photogrammetry as well, and whatnot. But of course, you need to add other layers and modalities to this, right? It's not like 2D media, which is the lower hanging fruit.

Joey Daoud:

Yeah. I mean, I see this all converging and blending as well, and that's sort of one of the theories behind VP Land, this whole idea: it started on virtual production, but these things are all overlapping. Even in a conversation in another episode with Final Pixel, it was like, with virtual production, yes, we have to build these 3D environments to film our scene, but since we built these 3D environments, we can easily repurpose them into an immersive experience. Hey, you wanna walk around the world that we filmed this movie in, 'cause we had to build this world for this movie? We could turn it into a video game, or we could turn it into something else where you could walk around. All of these elements are overlapping and converging, and all facets of what used to be kind of separate media are sort of evolving into one big thing.

Jose Puga:

Yeah. And the beauty of AI is that it can help you regenerate. Well, it's repurposing, but it's also regenerating new worlds, new storylines, new characters, plots and twists that you didn't even think about, and that you can still guide and customize, right, as a creative company, a creative agency, or a production company. So again, I think it's an exciting future, also for, again, the likes of Netflix and others, which are the ones dominating distribution, right? Because distribution has traditionally been the bottleneck in media and entertainment, at least where production is fragmented, right? And the closer you get to the consumer, the more consolidation there is. So unfortunately, unless production companies make use of social media better and more proactively (and there are tools that can help you surface your content), again, I think Netflix and the likes are gonna win, you know, in the long run. And I don't think it's a coincidence that Netflix is experimenting with gaming, for example, right? So offering games and-

Joey Daoud:

No. Yeah, not at all. Big push into gaming. All right, last one. What AI tools or updates lately? And I should preface, this is December 2023, 'cause this changes so quickly. So if

Jose Puga:

I know. It's crazy.

Joey Daoud:

what's been on your radar? What's been interesting for you recently?

Jose Puga:

When it comes to tools, or are you talking about my 2024 projections?

Joey Daoud:

Oh, that's a good one. Let's talk about tools, and then let's end on your 2024 projections.

Jose Puga:

Yeah. So when it comes to tools, on one hand, I think what Runway ML is doing is pretty incredible, like the text-to-video generation tools that they have. I think they've improved the user interface; it's a much better user experience, much easier to use. I think stock footage companies should be worried. I think there is some kind of collaboration between, I'm not sure if it's Shutterstock and, uh-

Joey Daoud:

Shutterstock. Yeah.

Jose Puga:

And Runway ML, but I can completely see Runway ML taking over that business. So that's one to watch. Now, the costs will need to come down, right? This is VC funded, and I'm pretty sure they're probably not making money with text-to-video generation. But once the unit costs make sense for Runway ML, I think they can easily take over the stock footage space. I know they have much larger ambitions, but that's something you can use today. Midjourney is getting incredible, and I'm preaching to the choir, probably. When it comes to up-resing content to 4K or even beyond, it's becoming better and much more powerful. And if you chain together the likes of Midjourney and Runway ML, you can create really high quality stock footage. We've done it ourselves, actually, for our pitch deck. Funny enough, we were in LA recently and we used some of those images and videos. Then I would say tools like Descript, and a few others that are just focused on conversational content, are already commoditized. For us, it's all about going beyond conversational content, language-based media, right? Podcasts are hugely important, but content will get more sophisticated. So I do think that's pretty important. And then what else? Well, synthetic voices and avatars, from Synthesia to ElevenLabs, or Respeecher in the synthetic voice space. If anything, the quality there is also going to increase. There are some pretty amazing tools out there, like the ones I mentioned, that can mimic your voice. However, I do see some bottlenecks when it comes to training. It still asks you to train on, I think, half an hour of your voice, or 20 minutes, and it takes, in some cases, 24 hours to process that. So I think there's still a lot to be done on the user experience side of things and on active learning. So that's on tools. But do you want me to talk about my projections? I can talk a lot.
I'm Latin American, so, um-

Joey Daoud:

That was great. Yeah. So what are your 2024 projections?

Jose Puga:

First, I think there are gonna be more and more state-of-the-art foundation models in 2024 that have fewer parameters, using, for instance, anywhere between 1 and 5 billion parameters. There are some models like Phi-1.5 and Phi-2, released by Microsoft. And then you have companies like Mistral or Deci AI that are creating what are called small language models that can outperform today's larger models. I think Mistral's latest model has around 7 billion parameters, and they've been able to deliver results that are better than GPT-3.5, which has hundreds of billions of parameters. So if anything, models are gonna get smaller, smarter, and more focused on specific tasks. The focus is also gonna be more on synthetic data, as I said before, but on curating data as well: data that has educational value, that has content quality, that supports, for instance, common sense reasoning, right? Most people just want common sense reasoning, an understanding of general knowledge including science, daily activities, or theory of mind, right? And then, based on this textbook-quality data, models will be able to achieve, I would say, equally high results as many of the GPT models. Then I think that data annotation companies are in big trouble right now. So if you are in the business of, let's say, managing annotators in India or wherever, I think that's a business that is gonna change completely with multimodal models coming out of academia and also out of these large labs. Also, model quality: I think everybody's competing on quality today, like GPT-4 can do this or Mistral can do that. The reality is that users are caring less and less about, how can I say, the accuracy, because open benchmarks are quite limited. And I think at the end of the day, in the AI space, it's gonna be all about distribution, user experience,
and what people perceive as a quality experience overall, or brand quality perception, right? And yeah, those are some of the predictions I have; I'm not sure if that helps a bit. But we don't believe in the large language model play where you need hundreds of millions of dollars to compete in this space. I think it's a VC move. Is it sustainable? No, we don't believe in that. We think open source and academia are gonna catch up, and accuracy is already pretty high in many areas. It's just gonna become better and cheaper.

Joey Daoud:

Yeah. Well, I really appreciate it, Jose. Thanks a lot. Where can people find out more about Imaginario?

Jose Puga:

They can go to our website at Imaginario.ai. We have free trial packages, or a free forever tier as well. So if you're a production company or a marketing agency, not necessarily just in the repurposing space, but you have a ton of content that you wanna bring back to life and be able to search, repurpose, or use in different ways, you can sign up there or just send me an email at jose.puga@imaginario.ai. And yeah, happy to give you a demo.

Joey Daoud:

All right. Well thanks a lot. I appreciate it.

Jose Puga:

Thank you, Joey. Have a good one.

Joey - Shure Mic:

And that is it for this episode of VP Land. I hope you enjoyed my conversation with Jose. Let me know what you thought over in the YouTube comments section. I would love to know what you think about this episode. And be sure to subscribe on YouTube or in your favorite podcast app of choice. And if you made it to this point in the podcast, you will probably like the newsletter that we send out twice a week, so be sure to subscribe to VP Land. You can go to vp-land.com or just click on the link in the description of wherever you are listening or watching this. Thanks again for watching. I will catch you in the next episode.
