With a US$1 billion valuation in 3 months, Li Feifei's first spatial intelligence model is here!
Editor
2024-12-04 16:02


Image source: Generated by Unbounded AI

AI-generated 3D worlds have become a reality!

Just now, World Labs, the company founded by "AI godmother" Li Feifei, officially unveiled its "spatial intelligence" model for the first time: it can generate a 3D world from a single image.

In Li Feifei's words, "No matter how you theorize this idea, it is difficult to describe in words the interactive experience of generating a 3D scene from a photo or a sentence."

This marks the first step towards spatial intelligence.

Interactive portal: https://www.worldlabs.ai/blog#footnote1

Every scene is rendered in real time in the browser, with controllable camera effects and adjustable simulated depth of field.

In the future, the virtual worlds that game NPCs inhabit could be swapped at will and generated in a matter of minutes.

Jim Fan, senior research scientist at NVIDIA and a former student of Li Feifei, summed it up: "GenAI is creating increasingly high-dimensional snapshots of human experience. Stable Diffusion is a 2D snapshot; Sora is a 2D-plus-time snapshot; and World Labs is a 3D, fully immersive snapshot."

In April this year, news broke that Li Feifei had founded her own company focused on spatial intelligence. After a private funding round, the startup was immediately valued at US$1 billion, becoming a unicorn.

By September, the company, named World Labs, had officially debuted and raised US$230 million in a new round of financing, backed by AI heavyweights Geoffrey Hinton, Jeff Dean, former Google CEO Eric Schmidt, and others.

The founding team of World Labs, from left to right: Ben Mildenhall, Justin Johnson, Christoph Lassner, and Li Feifei

After more than half a year of preparation, spatial intelligence has finally taken shape.

Netizens responded excitedly: this is wild, and we may be about to see a revolution like that of the 1980s and 1990s. It will let many people realize their ideas, hopefully reducing development costs and helping studios take more risks on new intellectual property.

This is the future of video games and movies.

VR has more possibilities from now on.

Explore a new world

Whether it is Midjourney, FLUX, Runway, or DreamMachine, most of the GenAI tools we are familiar with can only produce 2D content such as images and video.

If content is generated in 3D instead, the controllability and consistency of the resulting video improve dramatically.

It also means that the production of movies, games, simulators, and other digital representations of the physical world will change profoundly.

World Labs was founded to build spatially intelligent AI that can model the world and reason about objects, locations, and interactions in 3D space and time.

This time, they showed off this 3D generated world for the first time.

The following is a real-time rendering demonstration performed in the browser (note: the input images were AI-generated with FLUX 1.1 Pro, Ideogram, or Midjourney).

Feed in an AI-generated image of an old village, and you get a 3D world.

Prompt: A quaint village with cobblestone streets, thatched-roof cottages, and a stone well in the central square, surrounded by flower beds.

A magnificent palace, with light and shadow rendered vividly by the AI.

An AI-generated origami picture immediately comes to life.

Or input a photo of a museum: can you imagine what its surroundings look like?

The AI imagines it all for you, from the entrance and exit to the adjacent exhibition halls and their exhibits...

Even from a real-world photo like this one, the AI can fill in the world around it.

Camera effects

Different camera effects can also be applied. After a scene is generated, a virtual camera renders it in real time in the browser.

Through precise control of this camera, artistic photography effects can be achieved.

For example, simulating different depths of field so that only objects within a certain distance from the camera stay in focus:
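World Labs has not published how its renderer implements depth of field, but as a rough sketch of the idea, the per-pixel depth can drive how strongly each pixel is blurred. The function and values below are purely illustrative assumptions, not the actual implementation:

```python
def blur_amount(pixel_depth: float, focus_distance: float, aperture: float) -> float:
    """Crude depth-of-field proxy: blur grows the farther a pixel's depth
    lies from the focal plane, scaled by the aperture (0 keeps everything sharp)."""
    return aperture * abs(pixel_depth - focus_distance) / max(pixel_depth, 1e-6)

# Pixels near the 4 m focal plane stay sharp; nearer and farther ones blur.
for depth in (1.0, 2.0, 4.0, 8.0, 16.0):
    print(f"depth {depth:4.1f} m -> blur {blur_amount(depth, focus_distance=4.0, aperture=0.05):.3f}")
```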

You can also simulate a dolly zoom by adjusting the camera's position and field of view simultaneously:
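The blog does not describe the camera math, but the geometry behind a dolly zoom is easy to sketch: assuming a simple pinhole camera, the field of view is recomputed at each step so that a subject of fixed width keeps the same apparent size as the camera moves. All names and numbers below are illustrative:

```python
import math

def dolly_zoom_fov(subject_width: float, camera_distance: float) -> float:
    """Horizontal field of view (degrees) that keeps a subject of the given
    width filling the same share of the frame at the given camera distance."""
    return math.degrees(2 * math.atan((subject_width / 2) / camera_distance))

# Pull the camera back from 2 m to 6 m; the subject stays the same size on
# screen while the background appears to stretch away.
subject_width = 1.5  # metres, illustrative
for distance in (2.0, 3.0, 4.0, 5.0, 6.0):
    print(f"distance {distance:.1f} m -> fov {dolly_zoom_fov(subject_width, distance):.1f} deg")
```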

3D special effects

Most generative models predict pixels. Predicting 3D scenes has many benefits:

- Scene persistence: once a world is generated, it will exist stably. Even if you look away and look again, the scene doesn't change while you're out of sight.

- Real-time control: After generating a scene, you can move around it in real-time. You can look closely at the details of a flower or poke your head around a corner to see what's behind it.

- Geometric accuracy: the generated worlds follow basic rules of 3D geometry and physics. They have a true sense of three-dimensionality and spatial depth, in sharp contrast to the illusory, shifting quality of some AI-generated videos.

The simplest way to visualize a 3D scene is to use a depth map. In a depth map, each pixel is colored according to its distance from the camera:
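World Labs' renderer is not public, but a depth map itself is simple to reproduce. Here is a minimal sketch that colors each pixel of a hypothetical depth buffer by its normalised distance from the camera:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-pixel depths in metres, standing in for a renderer's depth buffer.
depth = np.random.uniform(1.0, 20.0, size=(240, 320))

# Normalise to [0, 1] and apply a colormap: nearby pixels dark, distant pixels bright.
normalised = (depth - depth.min()) / (depth.max() - depth.min())
plt.imshow(normalised, cmap="viridis")
plt.colorbar(label="normalised distance from camera")
plt.title("Depth map visualisation")
plt.show()
```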

The 3D scene structure can be used to create interactive special effects:

It can also drive animated effects that run on their own, bringing the scene to life:

The 3D world in famous paintings can also be interacted with in real time.

Enter Van Gogh’s outdoor cafe

Now we can experience iconic works of art in a whole new way!

Everything beyond what appears in the original painting is generated by the model.

Now, let us step into worlds generated from beloved works by Van Gogh, Hopper, Seurat, and Kandinsky.

Creative Workflow

3D world generation now combines naturally with other AI tools, so creators can pair it with tools they already know for a remarkably smooth new workflow.

First, worlds can be created from text by generating images using a text-to-image model.

Different models have their own distinct styles, and the generated spatial-intelligence worlds inherit those styles.

The following are four variations of the same scene using different text-to-image models, all using the same prompt.

Prompt: A vibrant anime-style teenage bedroom, with colorful blankets on the bed, a computer on a cluttered desk, posters on the wall, and various sports equipment scattered around the room. A guitar leans against the wall, and a cozy rug with a delicate pattern sits in the center of the room. Sunlight filtering in from the window gives the whole room a warm, energetic, youthful atmosphere.

Some creators have already been given early access.

For example, Eric Solorio uses the model to fill gaps in his creative workflow, placing characters into the scene and even directing precise camera movements.

Brittani Natail combined World Labs' technology with tools such as Midjourney, Runway, Suno, ElevenLabs, Blender, and CapCut, carefully designing camera paths through the generated worlds.

The result: three short films, each evoking a different mood.

The waiting list is now open, so without further ado, go apply.

Spatial intelligence, the next frontier of computer vision

Previously, at a public event, Li Feifei explained in detail for the first time what "spatial intelligence" means: visualization becomes insight, seeing becomes understanding, and understanding leads to action.

She divides human intelligence into two major kinds: linguistic intelligence and spatial intelligence. Although linguistic intelligence has drawn most of the attention, spatial intelligence will have a profound impact on AI.

In her public TED talk in April, Li Feifei shared more of her thinking on spatial intelligence and hinted at World Labs' goals.

She said, "The ability to act that all spatially intelligent creatures have is innate. Because it can associate perception with action."

“If we want AI to surpass its current capabilities, what we need is not just an AI that can see and speak, but an AI that can act.”

NVIDIA senior research scientist Jim Fan has likewise said, "Spatial intelligence is the next frontier of computer vision and physical intelligence."

As World Labs’ official blog explains, human intelligence encompasses many aspects.

Linguistic intelligence lets us communicate and connect with one another through language. More fundamental still is spatial intelligence, which lets us understand and interact with the world around us.

Spatial intelligence also underpins creativity, letting us bring the pictures in our minds into reality.

It is spatial intelligence that allows humans to reason, act, and invent; from a simple sandcastle to a towering city, none of it happens without it.

In a recent Bloomberg interview, Li Feifei said that human spatial intelligence has in fact evolved over millions of years.

This is the ability to understand, reason, generate, and even interact in a 3D world. Whether you look at beautiful flowers, try to touch butterflies, or build a city, all of these are part of spatial intelligence.

This can be seen not only in humans, but also in animals.

So how do we give computers spatial intelligence? In fact, tremendous progress has already been made, and the development of the AI field over the past decade has been remarkable.

Think of how AI now generates images and videos and can even tell stories; these models have reshaped the way humans work and live.

And this is only the first chapter; we are still at the dawn of the GenAI revolution.

What comes next, and how do we go beyond this?

The answer is to bring these capabilities into 3D. The real world is 3D, and human spatial intelligence is built on a very "native" ability to understand and operate in 3D.

Today, a model that generates a 3D world from a single image gives us a first glimpse of spatial intelligence.
