The DeanBeat: Nvidia CEO Jensen Huang Says AI Will Autofill Metaverse’s 3D Images

It may take AI to make a virtual world. Nvidia CEO Jensen Huang said during a Q&A at the GTC22 online event this week that AI will automatically populate the metaverse with 3D imagery.

He believes AI will take the first step in creating the 3D objects that populate the vast virtual worlds of the metaverse — and then human creators will take over and refine them to their liking. And while that’s a very big claim about how smart AI will be, Nvidia has done research to back it up.

Nvidia Research announced this morning that a new AI model could contribute to the massive virtual worlds being created by a growing number of companies and make it easier for creators to populate a diverse array of 3D buildings, vehicles, characters, and more.

Creating these kinds of everyday assets represents an enormous amount of tedious work. Nvidia said the real world is full of variety: streets are lined with unique buildings, different vehicles speed by, and diverse crowds pass through. Manually modeling a 3D virtual world that reflects all of this is incredibly time consuming, making it difficult to fill out a detailed digital environment.

Nvidia wants to make tasks like this easier with its Omniverse tools and cloud service. It hopes to make developers’ lives easier when it comes to creating metaverse applications. And auto-generating art — as we’ve seen this year with DALL-E and other AI models — is one way to lighten the burden of building a universe of virtual worlds like those in Snow Crash or Ready Player One.

Jensen Huang, CEO of Nvidia, speaks at the GTC22 keynote.

I asked Huang in a press Q&A earlier this week what could make the metaverse come faster. He alluded to the Nvidia Research work, though the company didn’t announce it until today.

“First, as you know, the metaverse is created by users. And it’s either handcrafted by us, or it’s made by us with the help of AI,” Huang said. “And in the future, it’s very likely that we’ll describe some characteristics of a house or characteristics of a city or something like that. And it’s like this city, or it’s like Toronto, or it’s like New York City, and it creates a new city for us. And maybe we don’t like it. We can give it additional prompts. Or we can just keep pressing “enter” until it automatically generates one that we would like to start from. And then from there, from that world, we’ll modify it. And so I think the AI for creating virtual worlds is being realized as we speak.”

GET3D details

Nvidia GET3D is trained using only 2D images, yet it generates 3D shapes with high-fidelity textures and complex geometric details. These 3D objects are created in the same format used by popular graphics software applications, allowing users to instantly import their shapes into 3D renderers and game engines for further editing.

The generated objects can be used in 3D renderings of buildings, outdoor spaces or entire cities, designed for industries such as gaming, robotics, architecture and social media.

GET3D can generate an almost unlimited number of 3D shapes based on the data it has been trained on. Like an artist turning a lump of clay into a detailed sculpture, the model transforms numbers into complex 3D shapes.
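
Nvidia doesn’t spell out GET3D’s internals in this article, but the core idea of turning “numbers” into shapes can be sketched: a generator network maps a random latent vector to 3D geometry. Below is a minimal, hypothetical PyTorch sketch, not GET3D’s actual architecture (the real model also produces topology and textures and is trained adversarially against 2D renderings); the layer sizes and the template-mesh setup are assumptions for illustration.

```python
# Minimal sketch of the "numbers into shapes" idea: a generator maps a
# random latent vector to 3D geometry. This is an illustrative toy, not
# GET3D's actual architecture (which also produces topology and textures).
import torch
import torch.nn as nn

LATENT_DIM = 128
NUM_VERTICES = 2048  # vertices of a fixed template mesh (hypothetical)

class ToyShapeGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            # One (x, y, z) offset per template vertex.
            nn.Linear(512, NUM_VERTICES * 3),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # (batch, latent) -> (batch, vertices, 3)
        return self.net(z).view(-1, NUM_VERTICES, 3)

generator = ToyShapeGenerator()
z = torch.randn(4, LATENT_DIM)   # four different "lumps of clay"
shapes = generator(z)            # four distinct 3D shapes
print(shapes.shape)              # torch.Size([4, 2048, 3])
```

Each fresh latent vector yields a different shape, which is what lets a trained model of this kind produce a nearly unlimited stream of variations.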

“At the heart of that is exactly the technology I was just talking about, called large language models,” he said. “To be able to learn from all of the creations of mankind, and to imagine a 3D world. And so one day, from words, through a large language model, triangles, geometry, textures, and materials will emerge. And from there, we would modify it. And since none of it is pre-baked, and none of it is pre-rendered, all of this simulation of physics and all of the simulation of light has to be done in real time. And that’s why the latest technologies we’re creating with respect to RTX neural rendering are so important. Because we can’t do it by brute force. We need the help of artificial intelligence for that.”

Using a training dataset of 2D car images, for example, it creates a collection of sedans, trucks, race cars, and vans. When trained on animal images, it comes up with creatures such as foxes, rhinoceroses, horses, and bears. Given chairs, the model generates assorted swivel chairs, dining chairs, and cozy armchairs.

“GET3D brings us one step closer to democratizing AI-powered 3D content creation,” said Sanja Fidler, vice president of AI research at Nvidia and leader of the Toronto-based AI lab that created the tool. “The ability to instantly generate structured 3D shapes can be a game-changer for developers, allowing them to quickly fill virtual worlds with varied and interesting objects.”

GET3D is one of more than 20 Nvidia-authored papers and workshops accepted to the NeurIPS AI conference, taking place in New Orleans and virtually, Nov. 26-Dec. 4.

Nvidia said that while quicker than manual methods, previous generative 3D AI models were limited in the level of detail they could produce. Even recent inverse rendering methods can only generate 3D objects from 2D images taken from different angles, forcing developers to build one 3D shape at a time.

GET3D can instead produce about 20 shapes per second when running inference on a single Nvidia graphics processing unit (GPU) – working like a generative adversarial network for 2D images, while generating 3D objects. The larger and more diverse the training dataset it learns from, the more varied and detailed the output.

Nvidia researchers trained GET3D on synthetic data consisting of 2D images of 3D shapes captured from different camera angles. It took the team just two days to train the model on about a million images using Nvidia A100 Tensor Core GPUs.
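
That training setup is easy to picture: every training image is a 2D rendering of a 3D shape seen from a randomly sampled camera. Here is an illustrative NumPy sketch of sampling such viewpoints on a sphere around an object; the radius and the sampling scheme are assumptions, not Nvidia’s published settings.

```python
# Illustrative sketch: sample random camera positions on a sphere around
# an object, the kind of multi-view 2D captures described above. The
# radius and distribution are assumptions, not Nvidia's actual settings.
import numpy as np

def sample_camera_positions(n: int, radius: float = 2.5) -> np.ndarray:
    """Return n camera positions roughly uniform on a sphere."""
    # Sample directions from an isotropic Gaussian and normalize them;
    # this yields a uniform distribution of viewing directions.
    directions = np.random.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return radius * directions

cameras = sample_camera_positions(24)
print(cameras.shape)  # (24, 3) -- two dozen viewpoints for one shape
```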

GET3D gets its name from its ability to generate explicit structured 3D meshes – meaning that the shapes it creates are in the form of a triangular mesh, such as a papier-mâché model, covered with a textured material. This allows users to easily import the objects into game engines, 3D modelers and movie renderers – and edit them.
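
Because the output is an ordinary textured triangle mesh, standard tooling applies. As a sketch, the open-source trimesh library can inspect such a mesh and convert it into a format game engines read directly; the filename here is a hypothetical placeholder for any mesh the model exports.

```python
# Sketch: inspect a textured triangle mesh of the kind GET3D exports and
# convert it for a game engine. "car.obj" is a hypothetical filename.
import trimesh

mesh = trimesh.load("car.obj", force="mesh")  # explicit vertices + triangles
print(len(mesh.vertices), "vertices")
print(len(mesh.faces), "triangle faces")

# Re-export to glTF binary, a format game engines import directly.
mesh.export("car.glb")
```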

Once creators export shapes generated by GET3D to a graphics application, they can apply realistic lighting effects as the object moves or rotates in a scene. By incorporating another AI tool from Nvidia Research, StyleGAN-NADA, developers can use text prompts to add a specific style to an image, such as turning a rendered car into a burned-out car or a taxi, or an ordinary house into a haunted one.
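
StyleGAN-NADA steers a generator with CLIP, OpenAI’s joint image-and-text model. The snippet below is a minimal sketch of the underlying signal, scoring how well a rendered image matches competing style prompts using the open-source clip package; it shows only the scoring step, not the full StyleGAN-NADA optimization, and the filename and prompts are illustrative placeholders.

```python
# Sketch of the signal behind text-driven styling: CLIP scores how well
# an image matches a text prompt. StyleGAN-NADA optimizes a generator
# against this kind of score; shown here is just the scoring step.
# "house_render.png" and the prompts are illustrative placeholders.
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("house_render.png")).unsqueeze(0).to(device)
prompts = clip.tokenize(
    ["an ordinary house", "a spooky haunted house"]
).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, prompts)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # relative match of the render to each style description
```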

The researchers note that a future version of GET3D could use camera pose estimation techniques, allowing developers to train the model on real-world data rather than synthetic datasets. GET3D could also be improved to support universal generation, meaning developers could train it on all kinds of 3D shapes at once, instead of having to train it on one object category at a time.

Prologue is Brendan Greene’s next project.

So AI will generate worlds, Huang said. Those worlds will be simulations, not just animations. And to run all of this, Huang foresees the need to create a “new type of data center around the world.” It’s called a GDN, not a CDN: a graphics delivery network, battle-tested through Nvidia’s GeForce Now cloud gaming service. Nvidia has taken that service and used it to create Omniverse Cloud, a suite of tools that can be used to create Omniverse applications, any time and anywhere. The GDN will host cloud games as well as the metaverse tools of Omniverse Cloud.

This type of network could provide real-time computing needed for the metaverse.

“That’s interactivity that’s essentially instantaneous,” Huang said.

Are there any game developers asking for this? Well, in fact, I know one who is. Brendan Greene, creator of battle royale game PlayerUnknown’s Battlegrounds, called for this kind of technology this year when he announced Prologue and then revealed Project Artemis, an attempt to create an Earth-sized virtual world. He said it could only be built with a combination of game design, user-generated content, and AI.

Well, holy shit.
