The VideoVerse
TVV EP 09 - Hybrid Video Conferencing Directions
Today on the Video Verse, we have Thomas Davies, tech lead for Visionular. He will be discussing the hybrid world of video conferencing: where it started and how its use has developed over time. In addition, he'll let us in on some of the amazing changes this technology has undergone in a mere fifteen years as well as businesses slowly adapting to take full advantage of what it can offer them.
Learn more about Visionular and get more information on AV1.
Thomas: Hi, I'm Thomas Davies. I'm a tech lead for Visionular. I work in RTC and video coding specifically. I've been in the industry for many years, working in RTC applications for Cisco and leading R&D at the BBC on video coding.
Mark: Thomas, it's great to have you on the VideoVerse. I know you really are an expert in RTC and we're gonna talk today about the hybrid world of video conferencing, just really what's happened in the space. A lot has changed over the last, certainly like 15 years, now, with the proliferation of basically RTC type video conferencing apps on literally any device. The other day, I was setting up something in a car we just got, and although of course they don't put video on the dashboard console navigation system, but there was full integration with Zoom, full integration with Teams. It was really remarkable. I thought, wow, it's coming to the car, literally. So, I think it'd be awesome to just start our interview and start the conversation around, give us a good overview of how the video conferencing world, the products, the technologies, the codecs, what has changed? And, if you wanna keep it pre pandemic, post pandemic, I'll let you kind of determine the divider there, but...
(02:02 Video conferencing changed)
Thomas: So, I think what happened with the pandemic was really acceleration rather than a step change in the kind of things. So, there's a few different ways in which things have developed. So, you've... We've talked about video conferencing and people have in mind calling like this, or having an online meeting, and that, of course, was a lot of what people experienced during the pandemic often for the first time. But, what has happened really is the creation of whole new domains of video related collaboration.
So, things like hybrid events, online conferences where you are trying to create curated streams that people can participate in. And, that involves the integration of video conferencing with traditional streaming architecture, CDNs essentially, where you have to deliver video to many thousands of people, but you also have to allow them to participate at the same time. And, then there, there are kind of video augmented collaboration applications. So, you might want to do some kind of collaborative... Collaborative coding, for example, developing an application or collaboratively editing a document. So, there has been a proliferation of companies and applications where you edit documents online, you share your screen in creative ways, and so on.
So, I think there's been a specialization and a fragmentation of the solution space. I think that is one thing that has happened. There's been this convergence between streaming and video conferencing to cope with very large scale. And, there's been the development of the infrastructure to support the rollout of these services to very large numbers of people and support either very large numbers of conferences and calls or simply very large numbers of participants in a call. So, I think it's been all those things together. And, within those spaces, there's been a growth in the number and kind of support tools that you have. So, we're not just talking about video, but video plus lots of helpful tools to help you with your working life in particular.
Mark: That's right. It's... It strikes me as now transcription is almost a required feature in a lot of these, these solutions and tools, so that you can produce a transcript for meeting notes or whatever. And, I mean, that's just scratching the surface. There's so many other apps I guess you could call... Call them as really what they are, that are getting integrated into these platforms that really do make our work life better. They are productivity tools. Maybe, we don't all use them, but it's interesting.
Thomas: Yeah, there's certainly a large toolkit increasingly driven by AI techniques.
Mark: That's right. Yeah.
Thomas: So, real time translation is... Has been a hugely successful recent innovation. Some teams, especially in international companies, are relying on that all the time for... To allow their teams to work together. And, then coupled with transcription, which... It's increasingly not just about writing down the text of what people said, but identifying subheadings, action points, agenda items, so that you can go back and find who it was who agreed to do X.
It's not just hidden in the text. It's identified. So, these are the kind of features that you want that actually enable you to... To really have a more successful meeting.
It's obviously not a substitute for running your meeting well and in a clear and intelligent manner, but certainly it's very useful that you don't need to minute these meetings in the way that you might have wanted to minute an important meeting in the past.
Mark: That's right. Yeah. So, I'm super curious. You've been working in this space for a long time, RTC, and can you tell us how the expectations have changed, technically, like even in terms of meeting size that a platform needs to support? So, I don't know, 10 years ago or 15, 12 years ago, whatever, was a big meeting, like 10 participants, 12, 20, 40, or like what size did, when you were designing, and you were engineering products, and you're optimizing these solutions, what were your targets then? And, then what are the targets now? 'Cause, I think they're quite different.
Thomas: So, I think in the early days, things were more driven by transcoding. So, in order to provide meetings at good meeting quality, there were MCUs that did transcoded video conferences and they had a certain capacity of 50 ports, or maybe they got bigger, 100 or 200 ports. Now, they could support larger meetings by cascading to additional servers or additional infrastructure.
But, you are talking about meetings being dozens of people rather than hundreds or thousands. So, to reach that number of people, you either need a switched architecture or you need some kind of streaming architecture or some mixture of both. So, you can have various architectures, you can have your... Your MCU can have a streaming component or your switch server can have a streaming component. It's really about how you... How you want to manage distribution of those streams and what kind of level of interactivity you want. But, it's always been a goal for lots of customers that they want to be able to have their all hands over video and...
Typically, in the past, that required a different product. So, Cisco WebEx would do that, but you needed to use a sort of more TV like part of the product, rather than just using a conventional meeting to have a very large meeting with a large number of participants. So, increasingly things are getting hybrid in those kind of solutions.
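As a rough illustration of why port-limited MCUs topped out at dozens of participants, here is a minimal sketch of the cascading arithmetic. It assumes a single-level star cascade in which every link between the hub MCU and a leaf MCU uses one port on each side; that topology, and the function name `mcus_needed`, are assumptions made for illustration, not a description of any specific product.

```python
import math


def mcus_needed(participants: int, ports_per_mcu: int) -> int:
    """Estimate how many MCUs a meeting needs in a single-level star cascade.

    Assumption (hypothetical model): one hub MCU, and each extra "leaf" MCU
    is linked to the hub by one port on the hub and one port on the leaf,
    so every leaf adds (ports_per_mcu - 2) participant ports in total.
    """
    if participants <= ports_per_mcu:
        return 1  # the meeting fits on a single MCU
    if ports_per_mcu <= 2:
        raise ValueError("cascading adds no capacity with 2 or fewer ports")

    extra = participants - ports_per_mcu
    leaves = math.ceil(extra / (ports_per_mcu - 2))
    if leaves > ports_per_mcu:
        raise ValueError("meeting too large for a single-level cascade")
    return 1 + leaves


if __name__ == "__main__":
    for size in (40, 200, 1000):
        print(size, "participants on 50-port MCUs:",
              mcus_needed(size, 50) if size <= 1000 else "n/a")
```

Under these assumptions the number of boxes grows roughly linearly with the number of participants, which is one way to see why very large meetings pushed vendors toward switched or streaming architectures instead.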
Mark: Yeah, yeah, absolutely. Remind me again, MCU, what does that stand for?
(09:46 Multi-point control)
Thomas: It stands for multi-point control unit. So, it was the term usually given to a transcoding solution. On the switching side, you often see something like SFU, selective forwarding unit. These are the sort of acronyms that tend to be bandied about in the space.
Mark: Got it. Got it. Okay. Well, it's interesting, because I think about... MCU is, I don't know if Cisco originated that, but at least that's where I've seen the term used, and I'm sure other vendors are using the same terminology, and then along comes Zoom, and it's all OTT, internet based. Is there still a place for this, I don't know, correct me if I'm wrong, what I think of as sort of the enterprise, on premise, heavy duty type conferencing solutions, or has everything really already gone to the internet? And, you have Microsoft with Teams, and, obviously, Zoom, and of course WebEx is also, I believe, fully on the internet. But, is... What does that look like architecturally?
Thomas: So, that's... So, that's not quite so, actually. So, with WebEx, I believe you could deploy it locally and there... There were certainly many internal debates about what the best architecture to support enterprise customers was. There is always going to be a critical issue about the amount of bandwidth that you will consume if you use cloud infrastructure entirely for your internal video communications if you're using video very heavily. So... So, you can deploy conferencing products from... From most vendors in an internal way if you want and save yourself those bandwidth costs. And... And, what tends to be the pain point for companies is having to manage that infrastructure themselves.
And, I think that was the issue in the old world was having to deploy physical boxes to people's premises. Now, that, that required specific management. Now, that moved pretty quickly to deploying generic virtual machines to people's... To people's premises for them to do their conferencing. But, I think there will always be a need for companies to own their own video and own their video infrastructure, but the majority of small companies, in particular, in medium sized enterprises, I think will depend on the cloud world, and on generally provided video, video infrastructure.
Mark: Yeah, that's interesting. Now, we did another episode, and I'm not sure which one's gonna come out first, so maybe you've heard it or maybe it's coming next, but where we're... Where we focused on the video coding aspects of RTC. So, I'm curious, one of the things I don't believe we talked about in that interview was, are there any differences between what you can do in the area of video coding and whether there's maybe more tools that you can utilize or if you have a little more even CPU horsepower available, are you able to approach things differently in sort of the MCU world versus the cloud? Or, is it really just a matter of where the technology is being deployed, where the software--
Thomas: I think it's pretty similar really. So, when you talk about cloud deployments, you are really talking about the server that people talk to.
So, whether it's SFU versus MCU or cloud versus on premise, it's... It's possible that if you had a traditional on premise transcoding product, you could... You could change quality levels and capacity if you want. You could do fewer ports at higher quality. I think almost no one did that ever. They really wanted the maximum capacity, 'cause this stuff was really expensive, right?
So, you just wanted to get the most throughput that you could. The main thing would be in the transcoding case that you really have an issue with scale, which meant that some very clever and very specialized transcoding products got built that really do a very, very good job of trading off complexity versus speed to get reasonable scale.
But, I think what's happening now more is that you get these hybrid switched transcoded architectures. So, the reason that you still want transcoding is because it breaks that end to end communication between one client and all the other clients. So, in a switched architecture, if one receiver is having lots of difficulty receiving, it's going to keep on requesting that the stream is restarted with I-frames.
Now, if you can decouple that bad receiver by transcoding for them, then it's only getting I-frames from the server. It's not getting them from the originator of the stream. So, those I-frames are not going to every single participant in the conference. And, that could be a big deal if the conference is really large.
So, for people on mobile, for example, they might end up always transcoded, in that kind of hybrid architecture, because they're... Their error resilience is poor and they're just spoiling it for everyone else.
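The decoupling Thomas describes can be sketched as a toy routing policy. This is a minimal illustration under stated assumptions, not how any real conferencing server is implemented: the class name `HybridRouter`, the request threshold, and the sliding-window length are all hypothetical. A receiver that asks for too many I-frames within the window is moved onto a transcoded leg, after which its requests are answered by the server rather than forwarded to the original sender.

```python
from collections import defaultdict, deque


class HybridRouter:
    """Toy model of a hybrid switched/transcoded conference server.

    Hypothetical policy: a receiver that sends more than `max_requests`
    I-frame requests within `window_s` seconds is moved to a transcoded leg.
    """

    def __init__(self, max_requests: int = 3, window_s: float = 10.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._requests = defaultdict(deque)  # receiver id -> request times
        self.transcoded = set()              # receivers on a transcoded leg
        self.forwarded_to_sender = 0         # requests that reach the sender
        self.served_locally = 0              # requests absorbed by the server

    def on_keyframe_request(self, receiver: str, now: float) -> str:
        """Return "sender" if the request is forwarded upstream, else "server"."""
        if receiver in self.transcoded:
            # The server's own encoder produces the I-frame for this receiver,
            # so no other participant is affected.
            self.served_locally += 1
            return "server"

        history = self._requests[receiver]
        history.append(now)
        # Drop requests that have fallen out of the sliding window.
        while history and now - history[0] > self.window_s:
            history.popleft()

        if len(history) > self.max_requests:
            # Too many requests: decouple this receiver by transcoding for it.
            self.transcoded.add(receiver)
            self.served_locally += 1
            return "server"

        # Switched path: the sender must produce an I-frame, which is then
        # forwarded to every receiver of that stream.
        self.forwarded_to_sender += 1
        return "sender"


if __name__ == "__main__":
    router = HybridRouter()
    for t in range(6):
        print(t, router.on_keyframe_request("mobile-user", float(t)))
```

In a purely switched conference every one of those requests would trigger an I-frame that goes to all participants; in this sketch, once the struggling receiver is transcoded, the cost of its requests stays on the server.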
Mark: Interesting. Yeah, interesting. How many of these video conferencing platforms are... Let me see if I can phrase the question in the most clear way. Is it video conferencing first with like virtual meeting as an add-on? Or, are they two kind of separate applications that some vendors sort of, in some cases, market, like I'm thinking there's some... There's some well known virtual meeting platforms that are just virtual meeting platforms, even though I suppose you could strip them down to their most basic element and be like, well, we could... We could use it as just a video conference, but... But, I'm thinking of Zoom as one vendor, not to name, but where they really seem to be focused on adding on this whole kind of, hey, we can do virtual events now in the platform. I'm curious from your perspective, like what are you seeing as general trends and is there an advantage to kind of an all in one solution that can do it all? Do you really need specialization? How do you think about that as an engineer and as a...
Thomas: Yeah, so I think these things are very driven by features. I think it's noticeable. There's been a lot of feature competition between Zoom, and WebEx, and Teams about getting the highest level of integration. So, Teams has the advantage that it can integrate with Office, so that...
Mark: That's right. That's compelling.
Thomas: You can have your Outlook and you... And, so on. Having messaging integrated, it seems to be really useful. And, when you think about the sort of trends in the kind of requirements that users have, when you think about advances in conferencing, you think, well, 3D and immersive reality. And, so you think about the video part.
But, actually, if you wanna be productive, often it's all having these other tools together that allows you to work collaboratively that is maybe more important. So, having shared document repositories as you can have in Teams, having a really good chat application, and I always found that the WebEx, WebEx Teams as it was, the WebEx chat was very useful.
So, having this integration can give you a toolkit of things, but talking about the engineering challenge, the engineering challenge is just the complexity of shipping hundreds of features in a single product, which makes it hard to ship everything at once. And, so to meet deadlines for shipping things is tough. Your feature may not get into the next release just because of the complexity of testing all these features for a single program, a single product. Yeah.
Mark: Sure, sure. Yeah, that makes sense. Well, there's one trend that I think is very clear and I think it's safe to say this is a post pandemic trend, is that, at least in... I'll talk for myself, I put up with sort of lower quality and just a... Just sort of a, I'll just say in general, a lesser experience, and didn't really think twice about it when I used it, and, in fact, of course we all, many of us anyway, mostly were using literally conference calls, it literally was a call. Remember those days, Thomas?
Nowadays, it's like if someone says, hey, let's do a call. Like I know immediately, well, it's actually a Zoom, it's a video, or... But, everybody still says call. But, now that we live our professional lives anyway on video, it seems to me that there has been a real raising of the bar of just what the quality is, and the experience, and it really matters, and people notice, and, perhaps, are even driving decisions to stop using, or to switch platforms, or switch technologies, whatever, based on this. Does this ring true? Is this, and, again, as an engineer, is this consistent with even what you see in... In the market?
Thomas: I don't know. It's quite hard to say to be honest, because I was always in a video first world, when I was developing conferencing products. And, so we had specialist hardware video endpoints on our desks. So, we were not just doing things on laptops, we were doing things on expensive hardware units. So, it always looked great to me, but...
Mark: You had pristine quality.
(23:13 What drove AV1 development)
Thomas: Yeah, so I think it... One of the things is if you are in that world, you may not realize how bad things were for everyone else. And, I think, therefore, one thing that changed was that video had to get good for hundreds of millions of people very quickly over the pandemic, or at least acceptable. Now, when it first exploded, in fact, people were downgrading the resolution that they were delivering in order to deliver any video at all. If they would... At one point, everything was going down to 360p for everyone in the world. And, then, very often you would have to turn off video for whole swathes to get audio through, but, pretty quickly, the bandwidth got put in.
So, there was a vast increase of network bandwidth in order to support better quality video. And, then there was also huge pressure for people to improve the quality of their video solutions. So, that drove AV1 development... At Cisco. And, I think it's continuing to drive improvements in video coding and video delivery. Across all vendors now, people are really competing on video quality and there was... Has been real cut throat competition on video quality between the different platforms. People were waiting for the latest Wainhouse report comparing different platforms' meeting quality and disputing what the findings were and so on. So, it really, really mattered. And, competition is the mother of invention and so the solutions got very much better very quickly across the board.
Mark: For sure, for sure. And, we're seeing now some really incredible pre and post-processing media chains or really capabilities that are built in on the audio front obviously. So, I'm curious, there's super resolution, seems to be something, better face detection. I'm even finding like now people really care about background removal in a way that, previously, it was kind of a novel feature, kinda like, oh, look, I can... I can be in a forest right now, you know? And, of course, it usually looked kind of hokey and it was just novel. But, now, not everybody has maybe a nice... They're not able to work with some simple background or something. There can be a lot of... You might be in a cafe or a coffee shop and you want to go live, but you don't wanna have the person on the other end looking at the table behind you, so, there's a lot of reasons that, yeah.
Thomas: Yeah, so there's a lot more going on under the hood in these solutions than there used to be. So, there's a lot of pre-processing and increasing... And, increasingly ML based. So, I would also pull out background noise reduction, that's very important, similar to tidying up your visual background. If you are in a cafe, it can have dramatic effects in terms of making things audible to other participants.
But, in terms of the challenges of getting the product out, you are always sweating over the amount of CPU available. So, in our other conversation, we talked about the CPU available for the video encoder. Well, one of the reasons why there is not so much available is 'cause you're doing lots of other things.
So, you are doing this ML stuff to suppress noise or to remove the background nicely, but, yeah, super resolution is a really interesting one, because being able to rely, for example, on having super resolution at a receiver means that all your choices for video encoding could be different. You can be trying to encode something that is a good starting point for a super resolution algorithm rather than the best highest resolution representation of the video that you can produce yourself.
So, you can co optimize those things. And, that's going to be a really interesting thing going forward. But, then there is pre-processing as well. Denoising is really important. I have a good camera. Not everyone has a good camera on laptops. In fact, laptop cameras are just extraordinarily bad sometimes. And, so--Having sophisticated noise for, yeah.
Mark: I can't figure that one out. Why is that? I have good noise resolution. Noise detection would be really good. No, I don't understand quite why that is. You can pay a lot for a laptop and get a camera that costs 10 cents, so...
Well, so, look, we all have, I'm holding up of course my iPhone 13 here, but what an amazing... What just an absolutely incredible camera, and, even Apple, now, obviously they're not gonna put three lenses and there's the industrial design. I totally get that part of it, but why isn't Apple putting something even better in there? I don't get it, in the MacBook Pro, but, yeah, there's a reason I'm... Somebody can explain to us, and maybe we'll find that person, and we'll have them on the show. They won't be able to tell us anything though, being from Apple. So...
Yeah. Yeah. That's amazing. Well, so where do you see... Now, we have the metaverse, the big hot topic, the Metaverse, and, but I am actually... One of my observations about the whole Metaverse conversation is, is that I think it's easy to get hung up a little bit too much for some of us in, oh, is the metaverse gonna be a thing? Am I really gonna live my life with goggles on, headsets, that sort of thing? And, in my opinion, like that's kind of not the point. Like, who knows? I don't know. Did we think... There's so many activities today that I think we could say 15 years ago, 20 years ago, could we honestly say that we believe that we'd be doing the things we're doing now?
And, I think there's... There's plenty of examples where we'd say, no, I probably wouldn't have, and yet we are. So, but what's interesting about the Metaverse to me is that what it does represent is an at scale movement of, in real life, into a virtual environment. And, of course, my personal feeling is is that we're kind of already there for those people who spend two hours a day on Facebook or on any of the social platforms, and they're connecting via that social platform other than the fact that you're not looking at your friends as avatars, and you're not maybe dressed up as different characters or whatever it is. It's kind of like, how is that much different, you know?
But, what's exciting to me about the metaverse is that it does then extend all of these technologies and it's gonna start, it seems to me, and the question here for you, Thomas, is do you agree that, at the most base level, it's going to be WebRTC technologies that even enable call it a dot one of the universe, I mean, of the Metaverse, because it has to be ultra low latency, it's video, even if it's streaming some sort of an avatar graphical representation, if it's not camera content, but it's... So it's synthetic. Do you agree with that? That that's kind of a... Or, am I way off here?
(30:34 Good technology with low cost to customers)
Thomas: Well, I think it's certainly true that there will need to be standards for accessing these virtual environments that can be applied by multiple vendors across multiple platforms. I don't know how successful the Metaverse can be if it's a single closed platform for a single vendor. I don't think even Meta are large enough for... To build an entire world that changes how people interact. So, I think there will need to be standards for low latency communication, but whether... Whether WebRTC is sufficient for that, I don't know, because it's... There's a whole extra level of requirements around resilience in particular that you need to support.
So, can we really operate with video frames being dropped when they represent 3D avatars, can you get a glitch in your immersive experience? So, I still think that things are very open on the transport front about how all of this gets supported so that people can actually experience it when they don't have hundreds of megabits of bandwidth to be able to do it. It may require much more client side rendering, for example, to keep bandwidth really, really low.
And, that requires more compute at the end points and so on. So, I don't know how the transport side of it works going forward. The other thing is, I think it's worth thinking about what kind of applications this really fits into, because this is being driven somewhat by very large corporations trying to focus on the business environment and trying to work out business uses for these kind of things. But, I wonder whether it is actually the social uses that will really drive adoption, where you need to be kind of semi physically present with an avatar in a more social scenario. So, I think being... Being able to provide it at low cost for consumer use for social applications will make the big difference to adoption and getting the right technologies in place.
Mark: Yeah, and it's a very good point, because, even Zoom, just to reference, now has become a primary business tool for a lot of people. Obviously, all the other platforms are used equally, but, and yet it was not 3, 4, or 5 years ago, my... I actually knew about Zoom, I don't know, six, seven years ago. And, it was through some friends that were doing sort of like these small group kind of mastermind kind of things, where like 10, 12 people would get together and would talk about a business topic or something, and they found Zoom. It was very inexpensive. In fact, I think that they were using a free version and it's like, hey, there's this thing called Zoom, you know?
And, then, so it was not really a business tool. It was kind of more really a personal tool. And, so it can go both ways. And, of course, then Zoom became a personal, through the pandemic, all of a sudden grandma knew how to use Zoom. So, it went both ways. But, I think it's very interesting how, certainly, in the social networks, Facebook has become very important even to B2B SaaS companies and software companies, in terms of a marketing vehicle. And, yet it's clearly a consumer entertainment platform, primarily. It's a social platform.
So, it's a very good point that you're making. Well, it's a super exciting space that we're working in, and I know that, in your role at Visionular, you're doing a lot, you're working with some really fabulous companies and, maybe, in the future you can come back and share some more insights as to what you're working on. But, it's really great. And, thank you, Thomas, for coming on and sharing all of your experience. And, it's an exciting time to be in video, so, yeah, yeah.
Thomas: It certainly is. Well, thanks very much for having me. It's been great fun. Thank you.
Mark: Awesome. Awesome. Well, thanks for coming on to the VideoVerse.