
NVIDIA Unveils Its Four Pillars Of XR Development [PAID]

On the occasion of the upcoming GTC, I had the opportunity to speak with two very important people who work every day on innovating XR at NVIDIA: David Weinstein, Director of Virtual and Augmented Reality, and Gregory Jones, Director of Global Business Development/Product Management. I had an interesting chat with them about NVIDIA’s vision for the future; they also clarified to me what NVIDIA Omniverse is, and when 5G cloud streaming can become reality. And yes, they also gave me a hint about one of their future announcements.

NVIDIA GTC (and the chance to win an RTX 3080 Ti!)

RTX 3080 Ti graphics card (Image by NVIDIA)

As a full disclaimer, this is a paid article that I’m writing as part of a recent collaboration I started with NVIDIA to help them promote GTC, NVIDIA’s flagship event, which this year will be held online on March 21st-24th. You know me well, so you know that if I accepted such a collaboration, it is only because NVIDIA guaranteed me that it could be valuable for you, my readers. And in fact, this post is so full of interesting information that it doesn’t look like a paid one at all 🙂

This collaboration also lets me offer you one special bonus: if you live in the EMEA area (Europe, Middle East, Africa) and you register for the GTC event using my referral link https://www.nvidia.com/gtc/?ncid=ref-crea-201724, you have a chance to win an NVIDIA RTX 3080 Ti (worth $1200)! The winner will be drawn from all those who used my referral to register and then followed at least one session during the event (the keynote does not count, and just registering is not enough). I don’t have millions of followers, so the odds of winning aren’t bad: it’s worth a try 😉

You can discover more about the GTC and some of its cool XR sessions in the remainder of this article. Now let’s talk about what David Weinstein and Gregory Jones have revealed to me!

NVIDIA’s Four XR Pillars

The first topic I spoke about with David and Greg was NVIDIA’s vision of the future. It was interesting to discover that NVIDIA has recently been focusing its XR efforts on four different areas, which it calls the “Four XR Pillars”. You can see them in the picture below.

NVIDIA’s 4 XR Pillars (Image by NVIDIA)

This is the first time they talk about this publicly, so they described to me what these four pillars are and why they are important for the future of XR:

Photorealism & Physical Accuracy: providing high-end graphics has always been NVIDIA’s mission (it started as a graphics card company, after all!). In fact, over the years, NVIDIA has always worked on offering tools that improve the graphical quality of games and of 3D applications in general, with ray tracing (RTX ON / RTX OFF) being the latest big innovation it brought to the market. According to NVIDIA, photorealism is important for XR, especially for the enterprise: in sectors like automotive design, the graphical quality of the application must be as close as possible to reality. And for AI training too, as we’ll see in a while, realistic rendering and physics engines that can create synthetic training data are of paramount importance. For these reasons, photorealism and physical accuracy are important goals.

Artificial Intelligence: David stressed many times during the call the importance of AI for immersive realities. “XR and Artificial Intelligence are increasingly going to be seen together”, he said. First of all, AI can be important to improve the quality of VR, and DLSS is a clear example of it. But AI will also be fundamental to improve the usability of XR. We need digital assistants in our immersive reality; we need intelligent interfaces that understand our intentions and let us do what we want to do by providing minimal input in a natural way (e.g. using our voice); we need smart ways to write in XR; we need XR glasses that understand what we are doing and provide suggestions. Menus should become a thing of the past, and AI should replace them to give us the perfect user experience in XR.

Collaboration: XR feels a bit sad when we are alone, and the possibility of meeting remotely is one of the superpowers of VR (I guess you’ve heard of that guy from Palo Alto who is a fan of sauces and who keeps talking about social VR…). NVIDIA was one of the first companies to understand the potential of enterprise engineering and design collaboration in XR with its Holodeck software, whose trailer videos are still among the best at showing the capabilities of XR collaborative environments.

Now Holodeck is being replaced by NVIDIA Omniverse, NVIDIA’s platform for 3D design collaboration and world simulation, but the vision is always the same: working with other people to build and interact with AR and VR scenes.

(Video courtesy NVIDIA)

Streaming: the streaming of data (for AI/ML cloud computations) or visuals (for remote rendering) is becoming more and more important, especially now that our goal is to build standalone glasses that are smaller and smaller. “Streaming is becoming a linchpin for a lot of what NVIDIA is doing in XR”, I’ve been told. The vision is that, thanks to 5G/6G, we’ll be able to wear headsets that have little computational power and delegate almost everything to the cloud. But to do that, we need optimized streaming algorithms.

What I find interesting is that these XR pillars more or less represent NVIDIA’s vision of working at the intersection of all the newest technologies, the ones that are creating the current technological revolution: XR, 5G, AI/ML… the “Convergence” Charlie Fink always talks about.

What is Omniverse?

Omniverse logo (Image by NVIDIA)

I keep hearing NVIDIA talking about its product called “Omniverse”, but I have always had a confused idea of what it is: it seems to be a collaborative tool, but also a renderer, but also a simulation space. So in the end… what is it?

Speaking with Greg and David, I understood that it is… all of the above. It’s just that Omniverse is rarely explained in the right way. After the meeting, I can say that I got the general idea, so I can try to explain it well to you.

Omniverse is a powerful tool for creating 3D virtual worlds, and it is made of different modules. At its heart there is “Nucleus”, the core of the system. Technically it is a “database and collaboration engine”, but, simplifying, we can say that it is the mind of Omniverse.

Various creators’ tools can work together on the same scene thanks to the Nucleus (Image by NVIDIA)

Nucleus connects via special plugins called “Connectors” to standard applications that are used to work on 3D scenes, like Unreal Engine, Adobe Substance, Autodesk 3ds Max, Blender, etc. Normally, to work on a 3D scene, you need a full team working on its various aspects (e.g. someone working on the materials, someone else on the textures, etc.), and every person provides his/her work to someone who performs the integration into the final scene. This “integrator” gets the images from the texture artist, the 3D models from the 3D artist, etc. via e-mail or USB stick, and bridges them all together into a single scene. With Omniverse, Nucleus does all this work, but in an automated way: the applications of the various artists are connected to Nucleus through the Connectors. Nucleus keeps a 3D representation of the scene, and as soon as someone makes a change, it gets automatically updated in the Omniverse scene. It is a bit like Git (or Google Docs, if you find it easier to imagine), but for artists and designers. Multiple people can work on the same scene together, each of them with their own software, and Nucleus takes care of integrating everything.

Nucleus can do this magic by ingesting or converting everything into the USD format (plus MDL for the materials). The output of all the applications that work with common 3D formats is automatically converted to USD and integrated into the scene. NVIDIA says that USD + MDL is like the 3D counterpart of HTML, a standard language that allows for the interoperability of web pages.
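To give a concrete (and very simplified) idea of what USD-based interchange looks like, here is a minimal sketch using the open-source USD Python bindings (pxr). The file names are hypothetical placeholders for exports coming from different creative tools; the live synchronization is something Nucleus and the Connectors do, not this snippet.

```python
# Minimal sketch of USD layer composition with the open-source pxr bindings.
# The .usd file names are hypothetical placeholders for per-tool exports.
from pxr import Usd

# Create a top-level stage that will aggregate everyone's contributions
stage = Usd.Stage.CreateNew("assembled_scene.usda")
root_layer = stage.GetRootLayer()

# Each artist's application writes its own USD layer; sub-layering composes
# them into a single scene without destroying anyone's original work
for contribution in ["models_from_blender.usd",
                     "materials_from_substance.usd",
                     "lighting_from_unreal.usd"]:
    root_layer.subLayerPaths.append(contribution)

root_layer.Save()
# Conceptually, Nucleus automates this composition and keeps it live-synced
# for the whole team, so nobody has to merge files by hand.
```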

This “collaboration engine” is just the first part of what Nucleus enables. Nucleus holds a USD representation of the scene that the team is building, and this scene can be provided to other software and services that perform operations on it. This means that on top of Nucleus you can have applications and services that do something with this scene. For instance, one of these services could render the scene with ray tracing. Or you could use that scene to train some AI (e.g. some robots), so you would use the AI training service. Or a service could let you use your digital scene as the digital twin of something real. And so on. NVIDIA makes available a few example applications built this way with Omniverse Kit, for instance Omniverse Create, View, and Machinima, but developers can build their own applications with Kit as well.
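For developers, building on Omniverse Kit usually means writing Python extensions. As a rough illustration (based on the publicly documented Kit extension pattern; the class name, extension id, and the work done at startup are hypothetical examples of mine), an extension is just a class that Kit loads and unloads, and whatever you want to do with the Nucleus scene goes inside it:

```python
# Sketch of an Omniverse Kit extension skeleton (runs inside Kit, not standalone).
# The extension id and the "service" it would provide are hypothetical examples.
import omni.ext


class MySceneServiceExtension(omni.ext.IExt):
    """An extension that could, for example, watch the USD stage on Nucleus
    and trigger some service (a render, an export, an AI job) when it changes."""

    def on_startup(self, ext_id):
        # Called when Kit enables the extension; set up subscriptions/UI here
        print(f"[my.scene.service] started ({ext_id})")

    def on_shutdown(self):
        # Called when Kit disables the extension; release resources here
        print("[my.scene.service] stopped")
```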

An architecture project made with Omniverse (Image by NVIDIA)

That’s why I said that Omniverse is a bit of everything: at its most basic, you use it as a collaboration tool, but then on top of it you can also have applications for many different uses. And NVIDIA stresses the importance of these use cases:

  • You can train the AI of a self-driving car with a photorealistic, physically accurate scene created inside Omniverse. The realism of the scene ensures that when the car operates in the real world, the AI works with satisfying results (because the real world will be indistinguishable from the virtual one of the simulations);
  • In the same way, you can train robots to operate in real environments: the training is effective only if the virtual environments are indistinguishable from the real ones (and here we get back to the importance of the “photorealism” pillar; see the sketch right after this list);
  • Digital twins are a popular use of Omniverse, and BMW’s “factory of the future” is a clear example of it. Since digital twins are also used for predictive maintenance (i.e. predicting when a machine will break and replacing it before that happens), they must be an exact replica of reality, constantly synchronized, and perfectly real-time. The physics must be right, the materials must be right, the lighting must be right… things can’t be “more or less similar”, they must be exactly identical for the prediction to be reliable. Omniverse’s rendering and simulation engines make this possible;
  • You can even simulate the propagation of 5G in a city: 5G propagation is a bit similar to the propagation of light rays, which NVIDIA already simulates with ray tracing, so it is not difficult to perform with the tools NVIDIA already has.
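To make the synthetic-training-data idea a bit more tangible, here is a purely conceptual sketch. No Omniverse API is involved: render_scene() is a hypothetical stand-in for a photorealistic, physically accurate renderer, and the point is simply that you can generate endless labeled variations of a scene.

```python
# Conceptual sketch of synthetic data generation via domain randomization.
# render_scene() is a hypothetical stand-in for a photorealistic renderer;
# no real Omniverse API is used here.
import random

def random_scene_parameters():
    # Randomize the aspects of the scene the AI should become robust to
    return {
        "sun_angle_deg": random.uniform(0.0, 90.0),
        "object_position_m": [random.uniform(-2.0, 2.0) for _ in range(3)],
        "floor_material": random.choice(["concrete", "steel", "rubber"]),
    }

def render_scene(params):
    # Would return a rendered image plus pixel-perfect ground-truth labels
    return {"image": None, "labels": params}

dataset = [render_scene(random_scene_parameters()) for _ in range(1_000)]
print(f"generated {len(dataset)} labeled synthetic samples")
# The closer render_scene() gets to reality (the photorealism pillar),
# the better a model trained on this dataset transfers to the real world.
```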

In all these examples, you can see again why NVIDIA has this approach with the four pillars: these technologies are so intertwined with one another that you can’t develop one without considering the others. In all the above examples, photorealism is essential for AI training, for instance, so these two technologies must be developed side by side. That’s why a graphics card company has become excellent at AI: at the end of the day, as Greg reminded me, a GPU is just a big parallel computer.

It’s curious to discover that NVIDIA started building Omniverse not as a product, but as a tool for itself, to train its AI by scaling on hundreds of GPUs. There were no tools able to let a team collaborate on a project optimized for AI training, so they built one themselves. And then they realized it could be useful for others, too.

I hope to have finally made Omniverse clear to you: imagine it as a software solution made of three layers. On the lowest layer, you have the tools with which you make a scene (Blender, Unreal Engine, 3ds Max, etc.). In the middle, you have Nucleus, which assembles the scene. And at the top, you have services and applications built on Omniverse Kit that use this scene to produce an output (e.g. a ray-tracing renderer).

Toy Jensen

One of the coolest things about the latest GTC was the digital avatar of Jensen Huang (NVIDIA’s CEO) answering questions. I was curious about NVIDIA’s vision for digital avatars, but the answer was not what I expected.

(Video courtesy NVIDIA)

“The idea is not to sell 1000s of Toy Jensens”, David told me. NVIDIA does not have in mind building digital avatars as a product. Toy Jensen has been a (cool) showcase project meant to show many technologies involved in the pillars, like the rendering service that outputs the nice graphics of Jensen, and all the AI services that analyze the video and audio of the people Jensen is talking with. But it is not an upcoming product per se.

These services can also be used for other use cases that are more in line with NVIDIA’s business. Project Tokkio (a name coming from “Talking Kiosk”) is a smart kiosk that can be installed in airports or other locations with many people passing by. The kiosk is smart and can provide information to users in a very natural way: it can understand what users are saying and can also answer in natural language. It is not as sexy as Toy Jensen, but it is more useful (sorry, Jensen).

(Video courtesy NVIDIA)

The importance of Artificial Intelligence

During our chat, David stressed many times how important AI is for our XR future. The purpose of AI, exactly as Project Tokkio shows, is to make the life of users easier. Nowadays we usually have different applications to do the same task (e.g. for 3D modeling you have Blender, 3ds Max, Cinema 4D, etc.), and they are all completely different: think about how 3D programs do not even always agree on what the Y-axis is. They all have different commands and shortcuts. If you know one, you don’t know how to use the others. This is total nonsense.

“I don’t want to become an expert on how to use every application”, David told me, “I just want to use my expertise with an app that, via AI, understands what I want to do and does it. Everything else is friction”. In the example above, if some AI understood that I want to create a cube, it could create it no matter what software I am using. The power of AI can be that of simplifying all the interfaces and letting users focus on their knowledge instead of the tool they are using. This would be a great advancement for the UX of computer applications.
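Just to illustrate the concept (this is a toy example of mine, not anything NVIDIA showed me): an intent-based assistant separates what the user wants from how each tool does it. A real system would use a speech/NLP model; here a keyword match and a hypothetical second backend stand in for it, while the Blender branch uses Blender’s real scripting call.

```python
# Toy illustration of "intent instead of menus": the user expresses an intent,
# and a per-tool backend translates it into that tool's own commands.
# The intent parser and the second backend are hypothetical placeholders.

def create_cube_in_blender(size_m: float):
    import bpy  # Blender's scripting API (only available inside Blender)
    bpy.ops.mesh.primitive_cube_add(size=size_m)

def create_cube_in_some_other_tool(size_m: float):
    print(f"(pretend we drove another DCC tool to add a {size_m} m cube)")

BACKENDS = {
    "blender": create_cube_in_blender,
    "other_tool": create_cube_in_some_other_tool,
}

def handle_intent(utterance: str, active_tool: str):
    # A real assistant would use an NLP model; a keyword match stands in here
    if "cube" in utterance.lower():
        BACKENDS[active_tool](size_m=1.0)

handle_intent("hey, add a cube over here", active_tool="other_tool")
```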

CrowsNest’s Anna is a digital assistant that can help people entering a hotel for the first time. She’s an example of how AI can make our life easier in the future (Image by CrowsNest)

Greg added that another advantage of AI is that it can give you an assistant that helps you 24/7. You won’t always need the help of a real person to perform some basic tasks: an assistant will suffice to do them. Do you need to get a file about a building you are designing? The AI could retrieve it for you, without you having to disturb your colleagues.

AI in the end will be part of everything: all products will embed some kind of artificial intelligence, according to David. Hotels will have a virtual concierge that will receive you and help you, for instance. Cars will feature (and already feature) a lot of AI-powered systems. It is inevitable for our future.

Will 5G rendering be possible?

I’m a big fan of cloud rendering, so I asked Greg the mother of all questions: “Will XR cloud rendering ever be possible?”. There are people in the community who think that it is (and will remain) impossible that, wherever we are, there will always be a server close enough to do the rendering with minimal latency. The answer he gave me was the most complete I have ever received.

First of all, he highlighted that in controlled conditions, cloud rendering over 5G already works incredibly well. NVIDIA has run various experiments on this, and one of the best ones has been at Piccadilly Circus, with an experience made in collaboration with Factory 42 and the BBC. In this experience, called Green Planet, it is possible to walk around the place with a smartphone and see David Attenborough, shown as a volumetric video in AR, guiding the user. Such a beautiful graphical quality would be impossible with just the smartphone, and the power of cloud rendering lets you have this amazing experience. Having worked with the NVIDIA CloudXR SDK and the Oculus Quest myself, I can confirm that cloud rendering provided by NVIDIA, in the right network conditions, is incredible, and has a quality comparable to that of Virtual Desktop.

Then we have the problem of the network infrastructure, which is just a matter of time. We need 5G everywhere, and this will take some years to happen. Nowadays, if a factory wants to use cloud rendering, it can deploy 5G locally with private infrastructure. But in a few years, public 5G will arrive everywhere, enabling the fast network needed for cloud streaming.

We also need 5G-connected headsets: we already have a reference design by Qualcomm in this sense, which NVIDIA used in its tests, but no company has used it to build a product yet. I guess this will come in the upcoming years too, and the recent news that Meta is looking for people with expertise in cellular networks to work on XR headsets is a positive signal in this sense.

In the end, we come to the real problem, which is the one of the servers. And here things are more nuanced than I expected. Greg highlighted that, first of all, not all XR is made equal: a latency requirement of 10ms is different from one of 50ms, and they come with different complexities and costs. Obtaining 10ms requires a very close server, while 50ms leaves some more margin. While the theory says that we should always have at most 10-20ms of latency, in reality this is fundamental only in some contexts, like for professional Beat Saber players. For a professional doing automotive design, 50ms is more than OK (and the success of Virtual Desktop shows that people are mostly fine with 30-50ms latencies). Also, AR has far looser requirements than VR. So it should be possible to build a cloud rendering service that answers with different latencies depending on the requirements of the different applications. And this would be much cheaper than building something that always answers with 10ms latency.
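To see why a 10ms budget is so much harder than a 50ms one, here is a back-of-the-envelope sketch. All the overhead numbers are my own assumptions for illustration, not figures from NVIDIA.

```python
# Back-of-the-envelope motion-to-photon budget for cloud-rendered XR.
# All overhead values below are illustrative assumptions, not measurements.
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per ms

def motion_to_photon_ms(distance_to_server_km: float,
                        radio_and_routing_ms: float = 5.0,   # assumed
                        server_render_ms: float = 7.0,       # assumed
                        encode_decode_ms: float = 8.0):      # assumed
    propagation = 2.0 * distance_to_server_km / FIBER_KM_PER_MS  # round trip
    return propagation + radio_and_routing_ms + server_render_ms + encode_decode_ms

for km in (10, 100, 1000):
    print(f"server ~{km:>4} km away -> ~{motion_to_photon_ms(km):.0f} ms total")
# Even a nearby edge server lands around 20 ms with these assumptions, which is
# why a strict 10 ms budget is so much harder to meet than a 50 ms one.
```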

(Video courtesy NVIDIA)

Telcos are all investigating cloud rendering, and they are especially investigating the technical requirements to make it happen (server locations, GPUs required for every location, etc.) and the associated costs. They are evaluating the feasibility of widespread cloud rendering servers. The idea is to start by implementing it as an add-on to existing network infrastructure: for instance, AWS and Verizon have already partnered for Wavelength, and some parts of the US already have an infrastructure of edge servers connected via 5G that could be used for cloud rendering as well.

Greg and David were almost sure that something cool is going to happen in this field in the next 3-4 years. They used expressions like “It’s inevitable” or “All telcos are looking at this” to show how much they believe in it. They are actively working with telcos to make this happen, and they are optimizing cloud rendering algorithms so that they can work in the best way possible on all networks.

David reminded me how, 5 years ago, it seemed impossible to play games remotely, and now NVIDIA GeForce Now is a reality and people can play games rendered by a server wherever they are. Technological evolution sometimes surprises us.

(Video courtesy NVIDIA)

Cloud Rendering and AR

David wanted to stress one aspect of cloud rendering: when thinking about cloud rendering, we always think about VR, but actually, he says, it can be a game-changing technology for AR, too. “Start thinking about AR rendered by a PC. Because that one is the future”, he told me. And I pass this suggestion on to all of you.

NVIDIA GTC announcements

I tried asking Greg and David for some teasers of the announcements coming at this GTC. They told me that one important announcement will be further XR support for Omniverse. This means that there will be a VR service that will let you see the scene in an immersive way, with the power of ray tracing making it ultra-realistic. I am kind of impressed by this!

They were also kind enough to give me a list of XR-oriented sessions and speakers that you can follow at GTC (remember that by following one of them you can participate in the raffle I described above, so this list is important).

Some of the most important speakers in the XR field that you can find at GTC (Image by NVIDIA)

You can access the full list of XR-related sessions on the GTC website.

Ray-tracing and XR

Talking about ray tracing, I asked them about the feasibility of ray tracing in XR, and I was glad to hear that it is actually possible. You need a GeForce RTX 3090 or an NVIDIA RTX A6000 to run it, because it is computationally demanding, but it is definitely possible.

Even better, Greg and David surprised me by saying that once ray tracing works because your graphics card is powerful enough, the complexity of the scene doesn’t matter much anymore. Ray tracing has little dependency on the scene complexity, because its cost depends almost only on the number of pixels to render: having 1,000 or 10,000,000 polygons makes very little difference. This is because ray tracing takes the opposite approach to standard rendering: standard “raster” renderers start from the geometry of the scene and try to understand what is seen by your eyes, so the calculations are all about the geometry; ray-tracing renderers instead work by simulating the bounces of light rays for every pixel of the image, so the starting point is the pixel.
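A quick back-of-the-envelope calculation shows what “the cost depends on the pixels” means in practice. The resolution, refresh rate, and sample counts below are assumptions I picked for illustration, not specs of any particular headset.

```python
# Rough estimate of the ray-tracing workload for a VR headset.
# Resolution, refresh rate, and sample counts are illustrative assumptions.
width, height = 2000, 2000   # per-eye pixels (assumed)
eyes = 2
refresh_hz = 90
samples_per_pixel = 1        # one primary ray per pixel, denoised (assumed)
secondary_bounces = 2        # extra rays spawned per primary ray (assumed)

rays_per_frame = width * height * eyes * samples_per_pixel * (1 + secondary_bounces)
rays_per_second = rays_per_frame * refresh_hz

print(f"{rays_per_frame / 1e6:.0f} M rays per frame, "
      f"{rays_per_second / 1e9:.2f} G rays per second")
# Note that the polygon count never appears in this estimate: adding geometry
# mainly grows the acceleration structure, not the number of rays per frame.
```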

This means that as soon as many people have a graphics card powerful enough to perform ray tracing on all the pixels of the screen of a VR headset, we can have mainstream ray tracing in all our VR applications! :O


And with this last amazing insight, I thank Greg and David for the time they dedicated to me and for this epic interview. I’m pretty excited by the convergence of AI, XR, and 5G, and I’m happy that there are companies like NVIDIA working on it.

What are your thoughts on this? Are you excited like me? Let me know in the comments or on my social media channels!

See you at GTC 😉


Disclaimer: this blog contains advertisement and affiliate links to sustain itself. If you click on an affiliate link, I'll be very happy because I'll earn a small commission on your purchase. You can find my boring full disclosure here.
