There are already plenty of AI image generators, but Google's latest, Whisk, has found a genuinely new way to play, and even netizens who have seen it all say it is fun.
Just supply three pictures (subject, scene, and style) and Whisk generates an image that draws on the strengths of each.
Image from: Google
For example, the subject is an elderly man, the scene is vines, and the style is 90s retro animation. Add the prompt "Character riding a flying bicycle", wait a moment, and a new picture in a Ghibli-like style is born.
Image from: Google
The old man is still the same old man, wearing a hat and a suit and holding a book, but now he rides the bicycle from the prompt, and the scene and style have changed to match the reference pictures. This is Whisk's advantage: it lets us play with all kinds of styles while writing few prompts or none at all. Mom no longer has to worry that I can't write prompts.
No complicated prompts needed, just bring the pictures
Don't be fooled by how few pictures it takes. Whisk's gameplay is simple but endless. Upload three pictures: the subject picture, McDonald's French fries; the scene picture, Monet's painting "Water Lilies"; the style picture, the pixel-art game "Stardew Valley". Without writing a prompt, generate directly. The result Whisk returns is a single picture that beats the three it came from.
Besides uploading our own pictures, we can also roll the dice and let Whisk randomly generate a subject, scene, and style.
In fact, the preset styles provided by Whisk are quite sufficient, including badges, stickers, embroidery, clay, American comics, mosaic collage, etc., with distinctive features and immediate effects.
With a little imagination, and without writing a word, we can keep playing fill-in-the-blank just by permuting and combining different pictures: subject + scene + style, and not every blank has to be filled.
1. Subject picture, smoked chicken; 2. Scene picture, Van Gogh's painting "Starry Night"; 3. Style picture, Japanese woodblock prints
1. Subject picture, "Girl with a Pearl Earring"; 2. Scene picture, a still from the movie "Spirited Away"; 3. Style picture, a Mondrian abstract painting
1. Subject picture, WeChat's "death smile" emoji; 2. Scene picture, a still from the movie "Interstellar"; 3. Style picture, a Snoopy comic screenshot
1. Subject picture, the pink dinosaur "momo", a default avatar widely used online; 2. Style picture, a Jellycat doll
In addition, each Whisk generation accepts only one reference picture each for scene and style, but it accepts multiple subjects. What does that mean? We can put multiple characters in the same frame! For example, let Musk, Altman, and Zuckerberg all become enamel badges.
The clothing, accessories, and expressions of all three are reproduced well, and even Zuckerberg's microphone and necklace are there. The faces, however, do not stay consistent with the originals and all turn into generic faces. While Whisk reduces the need to write prompts, it also encourages writing them when you need to. Add the sentence "The characters are holding a sign that says AGI" to the text box, and the little badge figures readily follow the prompt.
What if we need a certain scene or style but can't find a reference picture at the moment, and Whisk's presets don't offer it? The solution is simple: if you don't have a picture, write a prompt and let Whisk create one on the spot.
For example, I needed a pixel-style pedestal for the character to stand on as the scene, so I had Whisk generate one for me.
Then, with a cat meme as the subject image and a pixel chick as the style image, I got a pixel cat standing on a pedestal.
In short, Whisk is very free-form. Like plasticine, it can be shaped however you want. It can both understand and generate images, and it packages a complex workflow into a playful "egg whisk".
Whisk is really Google's multimodal models flexing their muscles
To help us write fewer prompts, Whisk combines visual understanding with image generation. The Gemini model recognizes the input images and automatically writes detailed descriptions, which are then fed into Imagen 3, Google's image generation model, to produce the final image. That is how Whisk works: users only need to upload pictures and generate, while it takes care of a great deal behind the scenes.
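The flow described above (pictures are turned into text descriptions, and the descriptions are merged into one prompt for the image model) can be sketched roughly as follows. This is only an illustrative sketch: the names `WhiskRequest` and `compose_prompt` and the prompt layout are hypothetical, not Whisk's actual internals, and the Gemini captioning and Imagen 3 generation steps are not called here. The descriptions are plain strings standing in for what the captioning step would return.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WhiskRequest:
    """Hypothetical request: several subjects, at most one scene and one style."""
    subjects: List[str] = field(default_factory=list)  # text descriptions of subject pictures
    scene: Optional[str] = None    # text description of the scene picture, if any
    style: Optional[str] = None    # text description of the style picture, if any
    extra_prompt: str = ""         # optional instruction typed by the user


def compose_prompt(req: WhiskRequest) -> str:
    """Merge the filled-in slots into one text prompt (assumed layout)."""
    parts = []
    if req.subjects:
        parts.append("Subjects: " + " | ".join(req.subjects))
    if req.scene:
        parts.append("Scene: " + req.scene)
    if req.style:
        parts.append("Style: " + req.style)
    if req.extra_prompt:
        parts.append("Instruction: " + req.extra_prompt)
    if not parts:
        raise ValueError("fill at least one slot or write a prompt")
    return "\n".join(parts)


# Made-up descriptions, loosely following the flying-bicycle example.
request = WhiskRequest(
    subjects=["an elderly man wearing a hat and a suit, holding a book"],
    scene="a wall overgrown with vines",
    style="90s retro animation with soft colors",
    extra_prompt="Character riding a flying bicycle",
)
print(compose_prompt(request))
```

Empty slots are simply skipped, which mirrors the point that not every blank has to be filled, and the list of subjects reflects that only the subject slot accepts more than one picture.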
Every picture in Whisk, whether uploaded or generated, has a long underlying prompt attached to it, and it is not hidden. We can click on the picture to view it, and we can also edit it. If you use a person as the subject picture, Whisk describes their appearance in detail, and scene pictures are handled in a similar way.
Whisk's description of Altman: "A light-skinned man with short, dark brown curly hair, shown from the chest up. He has light-colored eyes. He wears a light beige knitted crewneck sweater. The background is a mottled gray concrete wall. The man's expression is serious and neutral. The light is slightly dim, and the right side of his face is in slight shadow."
Style pictures are handled a little differently. If you use an animation screenshot as the reference, Whisk will not say that there are three people in the picture. Instead, it describes the colors, light, and lines of the artwork...
Whisk’s description of Snoopy’s painting style: “This picture is rendered in a cartoon style, with bold outlines and flat shading. The color palette is limited, using mostly primary colors and muted secondary colors. The light is even and lacks strong shadows or highlights, giving it a simple, almost childlike texture. The lines are consistent, with a slightly uneven texture that suggests a hand-painted effect. The overall aesthetic is reminiscent of classic comic strips or children's animations.”
So Whisk does not copy the picture exactly. It extracts the picture's characteristics and essence, then blends subject, scene, and style naturally, each doing its own job without interfering with the others. This comes at a cost: Whisk extracts only a small number of key features from each image, so the results may differ from what you expect. This also explains why Whisk cannot reproduce human faces accurately. Even with a less abstract retro film style, the faces of the three bosses look nothing like the originals, although the other details are accurate.
The same goes for objects. Tesla’s Cybertruck becomes very ordinary after feature extraction and regeneration.
But for a hugely popular IP with abundant reference material, such as McDonald's French fries, the result is quite good and could pass for an advertising image. I also tried some Disney characters, and Whisk reproduced them exactly as they are, but I won't post those pictures.
Whisk has another problem: it cannot follow very fine-grained style references or imitate one specific art style. When I asked Whisk to generate a Lego minifigure of the Mona Lisa, the result made my heart sink. But with an additional prompt, "Make the character more like a Lego character," Whisk got 70% to 80% of the way there.
The style of a particular cartoonist is even harder to imitate. When you upload a screenshot of a cartoon for Whisk to reference, it ends up giving you a very generic cartoon-style picture. Even emphasizing the work, the characters, and the cartoonist in the prompt makes no difference.
In fact, being fun is enough
Whisk is better suited to creative exploration that does not pursue precision, or what the internet calls messing around for laughs. Whisk can be translated as "stirring" or "egg whisk". The name Google chose is very visual: isn't it just mixing and matching ingredients? Whisk's imprecision also sets it apart from traditional image editors and makes it more of a creative tool. If you have an idea, use it to get a rough visual draft.
Generated by Whisk: 1. Subject picture, a screenshot of "Naruto"; 2. Style picture, a plush toy.
In the past, to achieve this kind of stylization and run the whole image generation process end to end, we might have needed to build a workflow in ComfyUI. With Whisk, it feels more like drawing cards in a game or opening a blind box, and it is currently free as long as you can log in (US only). Try it here 👇 https://labs.google/fx/zh/tools/whisk
Leading model capabilities are the premise and foundation for Google, but designing products that everyone needs still takes creativity and taste. I really like Whisk’s slogan: “prompt less, play more.”
Whisk comes from Google Labs. NotebookLM, the AI podcast tool that took off earlier, also started here and gradually grew into a mature product. The lab itself is the best illustration of that slogan. With strong model capabilities, innovative products, and an open mindset, Google, which once seemed threatened by OpenAI, has calmly staged the return of the king.