Fei-Fei Li's spatial intelligence debut: AI generates an explorable 3D world from a single image, one that follows basic physical and geometric rules.
Editor · 2024-12-03 10:01


Source: Qubits

Fei-Fei Li's first spatial intelligence project has just been released:

an AI system that can generate a 3D game-like world from just one picture!

The key point is that the generated 3D world is interactive.

You can freely move the camera to explore the 3D world just like in a game, with effects such as shallow depth of field and the Hitchcock zoom.

Enter any picture:

Apart from the picture itself, everything in the explorable 3D world is AI-generated:

These scenes are rendered in real-time in the browser, equipped with controllable camera effects and adjustable simulated depth of field (DoF).

You can even change the color of objects, dynamically adjust background light and shadow, and insert other objects into the scene.

In addition, most previous generative models predicted pixels, while this AI system directly predicts 3D scenes.

So the scene does not change when you look away and back, and it follows basic rules of 3D geometry and physics.

The comment sections immediately lit up, flooded with the word "unbelievable."

Well-known figures such as Shopify founder Tobi Lütke liked the post:

Many commenters think this directly opens up a new world for VR.

The company says "this is just a glimpse of the future of natively 3D generative AI":

We are working hard to put this technology into users' hands as soon as possible!

Fei-Fei Li herself shared the result right away, saying:

However much you theorize about the idea, it is hard to put into words the experience of interacting with a 3D scene generated from a photo or a sentence. I hope you all enjoy it.

The waiting list is now open, and some content creators are already using it, to everyone else's envy.

The official blog post states that today World Labs took its first step toward spatial intelligence:

the release of an AI system that generates 3D worlds from a single picture. Beyond the input image, everything is generated.

Any picture can serve as input.

And it is an interactive 3D world: users can move with the W/A/S/D keys, look up, down, left, and right, or drag the screen with the mouse to explore the generated world.
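As a rough illustration of how W/A/S/D navigation of this kind typically works, here is a generic first-person-camera sketch (my own illustration, not World Labs' implementation); `yaw_deg` is the camera's heading and the world is a flat x/z plane:

```python
import math

def move_camera(pos, yaw_deg, key, step=0.5):
    """Update an (x, z) camera position for one W/A/S/D key press,
    moving relative to the current heading (yaw) in degrees."""
    x, z = pos
    yaw = math.radians(yaw_deg)
    fwd = (math.sin(yaw), math.cos(yaw))      # forward direction on the plane
    right = (math.cos(yaw), -math.sin(yaw))   # strafe direction, 90° clockwise
    if key == "w":
        x, z = x + fwd[0] * step, z + fwd[1] * step
    elif key == "s":
        x, z = x - fwd[0] * step, z - fwd[1] * step
    elif key == "d":
        x, z = x + right[0] * step, z + right[1] * step
    elif key == "a":
        x, z = x - right[0] * step, z - right[1] * step
    return (x, z)

# Facing north (yaw 0), "w" moves forward along +z; "d" strafes along +x.
print(move_camera((0.0, 0.0), 0, "w"))
print(move_camera((0.0, 0.0), 0, "d"))
```

Mouse-drag look-around would then just adjust `yaw_deg` (and a pitch angle) before the next movement step.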

The official blog post contains many demos that can be tried.

I really recommend giving it a try; the hands-on experience is very different from watching videos or animations.

(As usual, the direct link is at the end of the article.)

So, here comes the question: what details are worth exploring in the 3D worlds this AI system generates?

Camera effects

World Labs says that once generated, the 3D world is rendered in real time in the browser, giving the feeling of looking through a virtual camera.

Moreover, users can precisely control this camera.

This "precise control" comes in two forms.

The first is a simulated depth-of-field effect: only objects at a certain distance from the camera are in sharp focus.
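World Labs has not published how its simulated depth of field works, but the standard thin-lens approximation conveys the idea: blur (the "circle of confusion") is zero at the focus distance and grows for objects nearer or farther than it. A minimal sketch, with illustrative aperture and focal-length values in meters:

```python
def blur_radius(depth, focus_dist, aperture=0.05, focal_len=0.05):
    """Thin-lens circle-of-confusion radius on the sensor:
    zero at the focus distance, growing as an object moves
    nearer or farther than it."""
    return (aperture * focal_len * abs(depth - focus_dist)
            / (depth * (focus_dist - focal_len)))

# Focused at 2 m: an object at 2 m is sharp, one at 10 m is blurred.
print(blur_radius(2.0, 2.0))    # in focus, no blur
print(blur_radius(10.0, 2.0))   # background blur
```

A renderer would map this radius to a per-pixel blur kernel size, which is why only one depth band looks sharp.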

The second is a simulated dolly zoom, the classic "Hitchcock zoom" of film technique.

Its signature is that the subject stays the same size in the frame while the background appears to change size.

Many travelers to Tibet and Xinjiang hope to shoot videos with a Hitchcock zoom, since the effect has a strong visual impact.

In World Labs' demo, the effect looks like this (though in this mode the viewpoint cannot be controlled):
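The geometry behind the dolly zoom can be sketched directly: to keep the subject the same apparent size while the camera moves, the field of view must widen as the camera approaches and narrow as it retreats, which is what makes the background appear to stretch. A toy calculation (the subject width, distances, and frame-fill fraction are made-up illustrative values):

```python
import math

def dolly_zoom_fov(subject_distance, subject_width, frame_fill=0.8):
    """Horizontal FOV (degrees) that keeps a subject of the given width
    filling `frame_fill` of the frame at the given camera distance."""
    # The subject should span frame_fill of the view-plane width, so
    # view_width = subject_width / frame_fill, and
    # tan(fov / 2) = (view_width / 2) / subject_distance.
    half_view = subject_width / frame_fill / 2
    return 2 * math.degrees(math.atan(half_view / subject_distance))

# As the camera dollies in from 10 m to 2 m, the FOV must widen so a
# 2 m-wide subject stays the same size while the background "stretches".
for d in (10.0, 6.0, 2.0):
    print(f"distance {d} m -> FOV {dolly_zoom_fov(d, 2.0):.1f} deg")
```

The same relation run in reverse (dolly out, zoom in) gives the opposite background compression.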

3D effects

World Labs states that, unlike most generative models, which predict pixels, its AI predicts 3D scenes.

The official blog post lists three benefits:

First, persistent reality.

Once a world is generated, it exists forever.

The scene from the original perspective will not change just because you look at it from another perspective.

Second, real-time control.

After the scene is generated, the user can control it through the keyboard or mouse and move freely in the 3D world in real time.

You can even examine the details of a flower up close, or tuck yourself away somewhere and watch every movement in the world from a bird's-eye view.

Third, it follows correct geometric rules.

The worlds this AI system generates abide by the basic rules of 3D geometry and physics.

Some AI-generated videos look very dreamlike, but they lack real depth (doge).

The official blog post also notes that the easiest way to visualize a 3D scene is to render a depth map, where each pixel's color is determined by its distance from the camera.
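As a minimal sketch of that idea (the near/far clipping range and the brighter-is-closer grayscale convention are arbitrary choices for illustration, not World Labs' actual rendering):

```python
import numpy as np

def depth_to_image(depth, near=0.1, far=50.0):
    """Map per-pixel camera distances to an 8-bit grayscale image:
    near objects render bright, far objects render dark."""
    d = np.clip(depth, near, far)
    t = (d - near) / (far - near)        # normalize distance to [0, 1]
    return ((1.0 - t) * 255).astype(np.uint8)  # invert: closer = brighter

# Toy "scene": a 4x4 depth buffer with one close object in the corner.
depth = np.full((4, 4), 40.0)
depth[0, 0] = 1.0
img = depth_to_image(depth)
print(img)
```

Real renderers usually read these distances straight from the depth buffer; the mapping above just makes them visible.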

Of course, builders can use the 3D scene structure to create interactive effects:

you can interact with the scene with a single click, including, but not limited to, suddenly shining a spotlight on it.

Animation effects?

That’s so easy too.

Enter the world of painting

The team also had some fun and experienced some classic works of art in a "new way".

New not only in the interactive mode, but also in that, from the input picture alone, the system completes the parts that lie outside the original frame.

Then it becomes a 3D world.

This is Van Gogh's "Cafe at Night":

This is Edward Hopper's "Nighthawks":


Creative workflow

The team stated that 3D world generation can be very naturally combined with other AI tools.

This lets creators build new workflows with the tools they are already comfortable with.

For example:

You can first use a text-to-image model to move from the text world to the image world.

Because different models have their own stylistic characteristics, the 3D world can inherit and carry over these styles.

Under the same prompt, feeding in pictures generated by text-to-image models of different styles yields different 3D worlds:

A vibrant cartoon-style teenage bedroom: a bed with colorful blankets, a computer on a cluttered desk, posters hanging on the walls, and sports equipment scattered around. A guitar rests against the wall, with a cozy patterned rug in the middle. Light from the window adds a touch of warmth and youthfulness to the room.

World Labs and Spatial Intelligence

World Labs was founded in April this year by Fei-Fei Li, a Stanford University professor known as the "godmother of AI."

It is also her first venture into entrepreneurship.

Her company pursues a new concept, spatial intelligence, that is:

Visualization becomes insight; seeing becomes understanding; understanding leads to action.

In Li Feifei’s view, this is “a key puzzle to solve the problem of artificial intelligence.”

Within just three months, the company's valuation exceeded US$1 billion, making it a new unicorn.

Public information shows that a16z, NEA, and Radical Ventures led the round, with Adobe, AMD, Databricks, and Jensen Huang's NVIDIA also among the investors.

There are also big names among individual investors: Karpathy, Jeff Dean, Hinton...

In May this year, Li Feifei gave a public 15-minute TED speech.

She eloquently shared more of her thinking on spatial intelligence; key points include:

Visual ability is believed to have triggered the Cambrian explosion, in which a large number of animal species suddenly entered the fossil record. What began as a passive experience, simply positioning to let in light, quickly became more active, and nervous systems began to evolve... These changes gave rise to intelligence.

I have been saying for years that taking pictures and understanding are not the same thing. Today, I would like to add one more point: just looking is not enough. Seeing is for action and learning.

If we want AI to go beyond its current capabilities, we want AI that can not only see and speak, but also act. The latest milestones in spatial intelligence are teaching computers to see, learn, and act, and to learn to see and act better.

With the accelerated progress of spatial intelligence, a new era is unfolding before our eyes in this virtuous cycle. This cycle is catalyzing robot learning, a key component of any embodied intelligence system that needs to understand and interact with the 3D world.

The company's target customers reportedly include video game developers and movie studios. In addition to interactive scenes, World Labs also plans to develop tools that will be useful to professionals such as artists, designers, developers, filmmakers, and engineers.

Now with the release of the first spatial intelligence project, what they want to do has gradually become more concrete.

But World Labs said that what is currently released is only an "early preview":

We are working hard to improve the scale and fidelity of the worlds we generate, and to explore new ways for users to interact with them.

Reference link:

[1]https://www.worldlabs.ai/blog

[2]https://mp.weixin.qq.com/s/3MWUv3Qs7l-Eg9A9_3SnOA?token=965382502&lang=zh_CN

[3]https://x.com/theworldlabs/status/1863617989549109328
