#1317: Qualcomm’s XR2 Gen 2 and AR1 Gen 1 Announcement by Hugo Swart

I got consent from Qualcomm to publish Hugo Swart’s press briefing, given to journalists a week ahead of Meta Connect 2023, where he announced the new XR2 Gen 2 and AR1 Gen 1 chips for the first time. Here is the announcement video teaser that was the first video shown during the briefing. See more context in the rough transcript below.

Here are links to all 12 interviews from my series covering Meta Connect 2023, looking at first impressions of the Quest 3 and Ray-Ban Meta smart glasses, mixed reality trends, AI, Meta Horizon World builders, WebXR and alternative production pipelines like React Native, Apple Vision Pro buzz, VR filmmaking, unpacking the changes in Unity’s fees, and digging into Qualcomm’s new chips with the XR2 Gen 2 and AR1 Gen 1.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com. So this is episode number 12 of 12 of my coverage from MetaConnect. And today's episode is actually a press briefing that was given by Hugo Swart, who's the VP and General Manager of XR and the Metaverse at Qualcomm. This was a press briefing given to journalists on Wednesday, September 20th, 2023, where he was announcing for the first time the XR2 Gen 2 and the AR1 Gen 1. So when folks were asking me what were some of the biggest announcements to take away from MetaConnect, honestly, some of the biggest announcements for me were what Qualcomm is doing with these new chipsets, the XR2 Gen 2 and the AR1 Gen 1. The XR2 Gen 2 is going to be in the Quest 3, and some of the top-level specs are that it's got two and a half times faster GPU performance. During MetaConnect, they said that there was 33% faster CPU performance, though they didn't mention that in the context of this press briefing. It's also going up from six gigabytes of RAM in the XR2 Gen 1 to eight gigabytes of RAM in the XR2 Gen 2. It's optimized for 3K by 3K per-eye displays and has 8x better AI performance. I think the new AI capabilities on the XR2 Gen 2 are going to take a lot of the functionality that's usually handled in software, like the computer vision and other things needed just to make the XR technology work, and bake it into the hardware. So there are going to be other increases there as well, like 12 milliseconds of full-color video see-through latency, up to 10 concurrent cameras and sensors, and 50% better GPU efficiency. And then the AR1 Gen 1 is a new chip that they're also announcing that is more for the smart glasses. They had previously announced the AR2 Gen 1, and Scott Stein was saying that that didn't really get launched on anything quite yet, even though it was announced last year. So the AR1 Gen 1 is a little bit lower powered, with features that are not necessarily driving what is going to happen with the display technology, because these Ray-Ban Meta smart glasses don't have any display. It's just got a camera that might be able to do some computer vision, and then it's tuned for a virtual assistant like Meta's AI, to be able to actually have these conversational interfaces. I think the AI is actually going to be driving a lot of where things are going here in the future, but the spatialized audio on the Ray-Ban Meta smart glasses was absolutely amazing. There are like five different microphones that are recording spatialized audio and getting mixed down into stereo playback. So that was probably one of the more impressive aspects. It really reminded me of a throwback to the Bose AR frames, which I know some folks in the XR industry still use as like a Bluetooth headset to be able to both listen to music and communicate with folks. So I think that's going to be one of the key features that may actually be a driver for these smart glasses, just because the spatial audio is so solid on it, but also just having 12-megapixel photos, 6-megapixel video recording, and 1280 by 1280 resolution per eye, which I guess, if they were doing some sort of display, would be able to show it.
There's a heads-up experience that has 3DOF tracking, full-color binocular display support, on-glass AI, 5.8 gigabits per second peak speed on Wi-Fi 7, and then a third-generation Hexagon neural processing unit. So I actually tried to sync up and do an interview with Qualcomm at MetaConnect, but there was just a little bit of a miscommunication. So I got permission to air this press briefing that was given by Hugo Swart. Because like I said, of all the different announcements around the Quest 3, at the end of the day it's going to be the chips that are driving the entirety of the ecosystem, and there's been this cooperation and collaboration between Meta and Qualcomm to help develop the architecture of these chips. One of the first things that Hugo says is that their XR platforms have had over 80 different devices launch. So 80 different XR devices that have launched. And as we start to go into the XR2 Gen 2, this is the next generation that is going to be out there for the next three to five-plus years, and a lot of XR devices that are going to be coming onto the market are going to be based upon it. So I thought it was worth digging a little bit deeper into the actual announcement that was given to press, which didn't get any time in the context of Meta's keynote, just because, you know, they're focusing on their products. I'm sure we'll be able to hear more from Qualcomm as they go into their future presentations, but I just wanted to give a chance to share some of their initial announcements of the XR2 Gen 2 and the AR1 Gen 1. So that's what we're covering on today's episode of the Voices of VR podcast. So this presentation by Hugo Swart happened on Wednesday, September 20th, 2023, remotely with a series of different XR journalists. So with that, let's go ahead and dive right in.
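To put a few of those spec numbers in perspective, here's a quick back-of-the-envelope sketch of what driving two 3K by 3K panels means in raw pixel throughput. The 90 Hz refresh rate is my own assumption for illustration; the briefing only says the chip is optimized for up to 3K by 3K per eye.

```python
# Rough arithmetic on the XR2 Gen 2 display spec described above.
# The 90 Hz refresh rate is an assumption for illustration only.
width = height = 3000      # pixels per eye
eyes = 2
refresh_hz = 90            # assumed refresh rate, not a Qualcomm-stated figure

pixels_per_frame = width * height * eyes
pixels_per_second = pixels_per_frame * refresh_hz
frame_budget_ms = 1000 / refresh_hz

print(f"{pixels_per_frame / 1e6:.0f} Mpixels per frame")      # ~18 Mpixels
print(f"{pixels_per_second / 1e9:.2f} Gpixels per second")    # ~1.62 Gpixels/s
print(f"frame budget: {frame_budget_ms:.1f} ms")              # ~11.1 ms
```

At that assumed refresh rate, each frame has a budget of roughly 11 milliseconds, which is also a useful yardstick for the quoted 12-millisecond full-color video see-through latency: the camera-to-display path is on the order of a single frame.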

[00:05:01.915] Hugo Swart: All right. Good morning, everyone. Thank you so much for attending. So first of all, I also have Said Bakadir on the call. He leads product management for our whole XR product line. His team is the one responsible for managing what we're going to announce today. I usually am excited, but today it's especially exciting. You're going to understand why as we go through the slides. As you know, we have been investing in XR for the longest time, over a decade. I started this business in 2015, and we made a tremendous amount of progress. But what we are going to talk about today is revolutionizing spatial compute. It is about taking spatial compute, XR, to the next level. So over the last few years, we have already made available dedicated platforms. And when we say platforms, we really mean the chips, the processors, the chipsets for XR, you know, from the XR2 to XR1 to XR2+ to AR2. And with these platforms, we enabled more than 80 devices launched. And this is a new number that we are sharing with you today. And you may be thinking, wow, 80 devices? Yes, that's true. And how come and why? Well, through our platforms, we are really able to scale with our reference designs, with our model of supporting customers from small startups like iQiyi in China and Lynx in Europe, to the top brands in the world like Meta, like Microsoft, like HTC and Lenovo and ByteDance. So really a remarkable leadership story with our XR foundational technology and foundational products enabling the market to really grow. And also, when we talk about the products that we're enabling, they cover the whole spectrum of spatial computing, from the VR and MR type devices, where today we see good traction on the consumer front, be it with gaming, fitness, social, entertainment, but also lots of traction on the enterprise market with things like training, education, medical. And we also, of course, work with the glass form factor in augmented reality. On the consumer side, we're still kind of in the early days of consumer AR glasses. But on the enterprise side, we see a lot of case studies and successes in enterprise, again, with things like infinite desktop, remote assist, instructions, and so forth. So that's the history, what we have done, and the markets and types of products that we're enabling. And as we look into the spatial computing spectrum, we're talking about two new announcements today, two new announcements that are going to push the boundaries of the spectrum of spatial computing: from VR, where everything is digital, right, you're fully immersed, to mixed reality, where you still have a VR-type headset, but with video pass-through I can now start mixing, you know, wide-field-of-view virtual reality with the world around me, and then to augmented reality, where virtual elements are overlaid in your physical space. So this is the big announcement. The big announcement is that we have not only one, but two platforms to announce today. One for VR and MR, that's the XR2 Gen 2. It builds upon the great success of the XR2 Gen 1 and now takes VR and MR to the next level. And one that is the first purpose-built platform for smart glasses. So these are the two announcements, and I'm going to be talking about each one of them over the next few slides. First, XR2 Gen 2, right? It's really about powering immersive and true-to-life MR and VR experiences for all. And I want to underline experiences for all, right?
We built a product here that is able to democratize access to great experiences in VR and MR. It has a game-changing architecture, providing a step function in performance as we compare to the XR2 Gen 1. And before I get into talking about the further details on it, I wanted to run a video highlighting some of the experiences that we foresee with the XR2 Gen 2.

[00:10:32.347] Kent Bye: So at this point of the presentation, Qualcomm actually showed a video, and I'll link to the video in the show notes so you can go watch it for yourself. But I just wanted to read the text of what was being shown. So first of all, it's showing the XR2 Gen 2 chip, and then it says breathtaking new worlds unlocked, 2.5x better GPU performance, 12 millisecond video see-through latency, and eight times better AI performance. And in this video, they're showing a number of different mixed reality use cases: an architect that's looking at different spatial designs while working on a computer at his workstation, and then an interior designer that's doing different designs within the context of a space. And then a video call starts and a dog comes up, and as the person goes down to pet the dog, the virtual world fades out and physical reality comes through via the video pass-through of mixed reality. So that's the end of the video. And then Hugo continues on with the presentation after that.

[00:11:24.879] Hugo Swart: So you start to get a glimpse of why I'm so excited about this new product. It is really about unlocking breathtaking new worlds, powering the next generation of mixed and virtual reality for all. And I will be talking a lot about providing it for all, about democratizing MR and VR. And the reason is that we put a lot of new goodies, a lot of new features into the XR2 Gen 2, but we kept it in a single-chip design. We kept it in a way that is still small, power efficient and affordable, for democratizing access to the technology, to great technology for everyone. There are three main pillars that I want to address as part of the benefits of the XR2 Gen 2. The first one is about the visual experience, graphics, multimedia. It's about groundbreaking immersion. Second, we're talking about seamless motion, which is about interaction. How do I interact with the digital environment and with the physical environment, mixing the two together? And then about realities reimagined, which is, you know, how are we improving mixed reality? How are we improving video see-through or pass-through? So when we're looking at VR and MR, I think one of the first things that comes to mind is how do I define immersion? And it's about the rich visuals. You know, what do I experience in front of my eyes? And that, you know, was front and center, one of the three key pillars that we emphasized on the XR2 Gen 2. How do we increase performance? How do we make it more power efficient so that we can make small form factor devices? But keeping it in a single-chip architecture, keeping it simple for our customers to productize, and enabling 2.5 times the GPU performance, 2.5 times the graphics performance. That's why I was talking about a step function in terms of user experience, of visual improvements, when we go to the XR2 Gen 2. Of course, we need to still be very centered on providing it at low power. As a way to compare Gen 1 and Gen 2, if we look at the same type of graphics workload on Gen 2, we get a 50 percent reduction in power, which is, of course, critical, so that you don't have to run devices, say, above 20 watts or something. You keep it in a much more manageable situation. I don't need to have battery packs or anything of that sort to accommodate the experience that we want to provide to users. Again, making it available to everyone. This product is optimized for up to 3K by 3K displays. And why? Because we believe this is the sweet spot for VR and MR today. This is where the display industry is at in producing it at scale, right? And again, coming back to making it available to all, making affordable solutions that can reach a much wider base of people. And along with the display optimizations come advanced graphics features like game super resolution, foveated rendering, space warp, all techniques that help with what I was talking about before, which is improved performance while keeping power under control, while keeping power efficiency in the design. So really, you know, that's how we elevate VR and MR experiences to the next level. Now, beyond the visual experience, the multimedia experience, let's talk about the interaction, or how do we have this seamless motion between the physical and digital world. It's a lot about what's called perception, right?
Perception algorithms, perception workloads that provide environment understanding, on the right side of this slide, where I can map the room that I'm in, identifying the floor, the walls, the ceiling, objects, where the planes are. And of course, that's critical both for VR, but even more so for MR, when you want to blend digital and physical together. And in addition to environment understanding, of course, you need to have very good user tracking, user understanding, head tracking, obviously, but also things like controllers, and then, you know, you start getting to face and full body. And that's something that we paid very close attention to, that we put a lot of focus on as we designed the XR2 Gen 2. And of course, when we talk about user understanding and environment understanding, one of the key technologies to help make it more efficient, lower power, is AI. And that's an area where we grew substantially in performance as I compare the XR2 Gen 1 to Gen 2. Actually, we're doing up to eight times the performance and providing support for int8 types of operations that are more efficient, therefore enabling a lot more concurrency on these workloads, enabling way more complex use cases with this added AI. Tightly coupled with that, there are also a few functions that we hard code, where we actually create new silicon dedicated for certain tasks, for certain workloads. It's not a fancy name, but Engine for Visual Analytics is the name of an IP block that we have for computer vision. So essentially, we're taking a few operations that we always need on and putting them in hardware, which makes it the most efficient, the lowest power solution. I think in this audience here, everyone is likely familiar with the term 6DOF, head tracking in six degrees of freedom. That's an area that we keep working on. Why? Because it's compute intensive, and it's always on. While I'm playing a game or watching something, 6DOF is always on. And therefore, that's an area of focus for us to lower the power, increase performance, but also make it higher accuracy and more robust so that the tracking is great. And, you know, on the right side, you see a comparison between Gen 1 and Gen 2, where you see the number of key points detected. Key points meaning points in the environment that are used for tracking. The more, the better. It gets more reliable, more robust, more accurate. Then you'll see the power and latency reduced when we go from Gen 1 to Gen 2. Now, I think this video is great in trying to show you why these improvements in AI and computer vision are so important, because, as I said, they're very computationally intensive. And hey, if I have to allocate a lot of the CPU, the GPU, or other IP blocks, other technology blocks, to handle them, well, then the application itself doesn't have enough resources. So let me show you a video that is going to highlight, on the right side, the workloads that are happening concurrently. And then on the left side, you'll see the actual use case going on.
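A quick aside on the int8 point Hugo just made: int8 quantization is a general technique, and the sketch below is plain NumPy rather than anything specific to Qualcomm's Hexagon NPU. It just illustrates why 8-bit integer math is attractive for always-on perception workloads: weights shrink to a quarter of their fp32 size and can be processed with cheaper integer operations, at the cost of a small rounding error.

```python
import numpy as np

# Generic symmetric int8 quantization of a weight tensor: a sketch of the
# general technique, not Qualcomm's actual NPU pipeline.
weights = np.random.randn(64, 64).astype(np.float32)

scale = np.abs(weights).max() / 127.0                  # map the largest magnitude to +/-127
q_weights = np.round(weights / scale).astype(np.int8)

dequantized = q_weights.astype(np.float32) * scale
max_error = np.abs(weights - dequantized).max()

print(f"fp32 size: {weights.nbytes} bytes, int8 size: {q_weights.nbytes} bytes")  # 16384 vs 4096
print(f"max rounding error after dequantization: {max_error:.4f}")
```

Smaller, integer-friendly models are part of what makes it plausible to run the long list of perception workloads shown in the next video concurrently on a mobile power budget.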

[00:19:53.998] Kent Bye: So he cuts away to another video that goes into a number of the different use cases for the perception workload concurrency, and I'm just going to read through the different types of features that were being shown. So there was head tracking with 6DOF, video see-through (VST), depth estimation, plane detection, 3D reconstruction, semantic labeling, controller tracking, hand tracking, speech commands, generative AI, eye tracking, face tracking, avatar encoding, and avatar decoding. So the video, as it's going through, is showing all these different annotated things, and then on the side it's unpacking all the different things that are happening on the AI side that are now going to be baked into the hardware of the XR2 Gen 2. That was that video, and now back to Hugo's presentation.

[00:20:39.586] Hugo Swart: So this video illustrates how all of these come together, you know, how many of these workloads, these perception tasks, are running concurrently, and that if I don't have focused, dedicated resources for them, optimized resources for them, you know, you're not going to have a satisfactory user experience. Still related to interactions, still related to the overall seamless experience, of course, connectivity is part of it. You're always connected with the cloud, and in particular, if you're talking about multi-user or cloud-rendered solutions and a mix of that, we need very good, high-performance, low-latency Wi-Fi. And we support both Wi-Fi 7 and Wi-Fi 6E. We offer what we call the Qualcomm FastConnect software suite for XR, which are optimizations specifically done for XR, enabling 25 percent lower power. Of course, power is always a vector that we carefully optimize for. Peak speeds of 5.8 gigabits per second and 80 percent lower latency. So together, the AI, the computer vision and the Wi-Fi, that's the second pillar of what we're calling seamless motion. Now, let's talk about MR, let's talk about realities reimagined. Clearly, for a great experience of MR or VST, and I'm using the terms almost interchangeably, VST and MR, latency is a key variable. It's a key factor for users feeling good about seeing the real world as you would have it without devices. And what we have done in the XR2 Gen 2 is really looking at the full pipeline that we have today and optimizing it. How can we cut latency while providing the quality of experience that you need for VST? So if you look into a traditional pipeline, and by traditional, think about how you do it in phones today, right? When I have a preview, I just open up my camera, I'm seeing the real world in the display of my phone, right? But in a phone, latency is not that critical. Now, when you're trying to substitute what you see with your eyes with that camera and display, that becomes the key metric. And we're able to achieve 12 milliseconds with our solution, while at the same time still applying techniques like geometric correction, noise reduction of the image, dynamic range, so that you have that balance of image quality and latency. So that's the summary of the XR2 Gen 2. It is a platform built from the ground up for VR and MR. It's built from the ground up for efficient, affordable devices, while providing the next generation, the next level of groundbreaking performance, with 2.5 times the graphics, 8 times the AI, and 12 millisecond color video see-through. Now, let's switch gears and go into AR. Oh, whoops, there is something. I actually forgot one of the most important things of this announcement. Usually, when you look into a Qualcomm platform announcement, we do it ahead of time, right? We do it, I don't know, one year ahead of time or so when we have a platform announcement. But the XR2 Gen 2 was special, because we worked very closely with Meta. If you remember, last year we announced a long-term multi-generation agreement with Meta, and voila, here's one of the outcomes of that collaboration, a chip where the teams really worked together on the specification of it for the device. You see, our embargo is timed to MetaConnect. So you're going to learn more, of course, about the device itself then. But we're super excited that we're announcing the XR2 Gen 2 together with the end product announcement. So I really cannot wait for you to experience it firsthand. Now we switch gears to glasses.
So we have been talking about immersive AR for a while. We announced the AR2 and viewers connected to phones and PCs, and that's still going on. We're still making progress on that front. But now we see a trend for glasses that maybe are not as immersive on the AR front, but that look and feel just like regular sunglasses, that look and feel like prescription glasses, that are sleek and stylish, but that enable certain use cases like image and video capture as good as you can get on a phone, music streaming and audio-based features, and a voice assistant with fast connectivity. That's where our second product announcement, the second platform announcement of the day, comes in. We are announcing the AR1 Gen 1, which is our first platform designed for these next-generation smart glasses. So that's the AR1 Gen 1. Feature number one is premium in-the-moment capture, right? If you look into previous smart glasses, you know, there was a compromise on picture quality, on image quality, you know, versus your phone. We wanted to remove that. That's what we did with the AR1 Gen 1. We now have a 14-bit dual ISP for superior image quality, up to 12-megapixel photos, 6-megapixel video, and many of the features that you see on premium smartphones, like computational HDR, portrait mode, auto face detection, auto exposure. Second highlight of the chip, again, AI. You can do a lot of AI just with a camera input and audio input. So we beefed up the AI engine for this chip, which works, of course, in conjunction with our sensor hub and our ISP, to enable things like visual search. I think with visual search, we're all used today to having text search, and voice search with voice assistants. Now we're going to start to see visual search. We do that a little bit with the phone, but still, sometimes it's a friction point to take your phone out and open the camera to detect what's in front of me and give me some contextual visual search. Now, it's in front of your eyes and you don't have to take it out of your pocket. It's already with you. So AI is important for that, it's important for computer vision, for aided audio and video capture, important for noise cancellation, clearer calls, and things like sensing health and environment. So that's the second one. The third point we want to highlight with the AR1 Gen 1 is that it is flexible in supporting display or no display, single display or dual display. So yes, the smart glass, we're seeing it as a subcategory of the broader AR market. But within smart glasses, we expect these three types of variants: a simpler one without a display that offers a lot of functionality already; some that will have a single display that is not a large field of view, it's probably a small field of view, lower resolution, but provides value to users with notifications and certain functions that don't need an immersive visual, an immersive bigger-field-of-view glass. And of course, a dual display makes it a richer experience for those that want the display. Quick video here to show an example. I'm cooking, there's some awareness of the activity, and I ask for a timer. If I have the display, the timer can be in my field of view. If I don't have the display, you can still have an audio-aided timer, as we do today with assistants. But here are a few other examples. I mentioned visual search, I'm a big fan of this one. But things like real-time translation, you can do it without a display, but of course, with a display it becomes better. You can have the translation in your field of view.
This could even be interesting and useful for hearing-impaired persons, who could have the audio coming in through the microphone and then translation going on and shown as subtitles to that person. Enterprise, yes, there are many enterprises that want those richer AR experiences, but some only require reading barcodes and going through checklists, and this type of functionality may be enough. Navigation. I think that's the kind of feature functionality we're expecting with the AR1. With the AR1 also, we're using Wi-Fi 7, with the same type of benefits as I mentioned on the VR and MR solution. And that's important, especially if I want to upload the videos and images that I capture on the glasses. But if I have a display, I'm also streaming, and so these connectivity features come in very handy. That's the summary of the AR1 Gen 1. It's going to be empowering the next generation of smart glasses. The camera is the big feature that we're highlighting, along with AI and display support, but at resolutions not necessarily the ones that we expect with immersive AR. You know, similar to our introduction of the XR2 Gen 2, we're also announcing the AR1 Gen 1 paired with a Meta initiative. The Meta product that is done together with Ray-Ban will be powered by the AR1 Gen 1. So you see us working closely as well with Meta on this form factor, on the glass form factor, and I think that's a very powerful combination of the two companies, now working both on VR and MR and on the glass form factor, really driving the whole spectrum of XR, the whole spectrum of spatial compute together. Here's the summary slide for the XR2 Gen 2 and the AR1 Gen 1. I'm not going to repeat them. I think I would rather leave more time for Q&A for you. Great. Thank you, Hugo.
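As a purely hypothetical sketch of the display/no-display branching Hugo describes for real-time translation, here's how that flow might look. Every function below is a placeholder invented for illustration; none of them correspond to a real Qualcomm or Meta API.

```python
from dataclasses import dataclass

@dataclass
class GlassesConfig:
    has_display: bool  # AR1 Gen 1 targets no-display, single-display, and dual-display designs

def transcribe(audio_chunk: bytes) -> str:
    # Placeholder for a hypothetical on-device speech-to-text model.
    return "hola, ¿cómo estás?"

def translate(text: str, target_lang: str = "en") -> str:
    # Placeholder for a hypothetical on-device translation model.
    return "hello, how are you?"

def show_subtitle(text: str) -> None:
    print(f"[on-glass subtitle] {text}")

def speak(text: str) -> None:
    print(f"[audio playback] {text}")

def translation_loop(cfg: GlassesConfig, audio_chunks) -> None:
    for chunk in audio_chunks:
        translated = translate(transcribe(chunk))
        # With a display, the translation appears in the field of view;
        # without one, it falls back to audio, as Hugo describes.
        if cfg.has_display:
            show_subtitle(translated)
        else:
            speak(translated)

translation_loop(GlassesConfig(has_display=True), [b"\x00" * 320])
```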

[00:32:36.642] Kent Bye: Thank you, everybody, for joining. We're really excited about today's announcements. All right.

[00:32:40.224] Hugo Swart: Thanks, everyone. Appreciate it. Bye.

[00:32:43.933] Kent Bye: So thanks again for tuning in to one of my dozen episodes about MetaConnect. There's lots that I've been unpacking throughout the course of the series, and I'm going to invite folks over to patreon.com to be able to join in to support my work that I've been doing here as an independent journalist trying to sustain this work. Realistically, I need to be at around $4,000 a month to be at a level of financial stability, and I'm at around 30% of that goal. So I'd love for folks to be able to join in, and I'm hoping to expand out different offerings and events over the next year, starting with more unpacking of my coverage from Venice Immersive, where I've just posted 34 different interviews from over 30 hours of coverage. And I've already given a talk this week unpacking a little bit more of my ideas about experiential design and immersive storytelling. And yeah, I feel like there's a need for independent journalism and independent research and just the type of coverage that I'm able to do. And if you're able to join in on the Patreon, $5 a month is a great level to be able to help support and sustain it, but if you can afford more, then $10, $20, $50, or even $100 a month are all great levels as well, and will help me to continue to bring this coverage not only to you, but also to the broader XR industry. I now have transcripts on all the different interviews on the Voices of VR podcast, and I'm in the process of adding categories as well for the 1,317 interviews that have now been published after this series has concluded. So yeah, join me over on Patreon and we can start to explore the many different potentialities of virtual and augmented and mixed reality at patreon.com slash Voices of VR. Thanks for listening.
