We believe you shouldn’t have to “learn how to prompt” to create images. It should be easy to try things out, iterate, refine and remix ideas visually. Like you’d do with a friend. So we’re trying something new!
Whisk is labs.google/fx’s latest generative imagery experiment, focusing on fast visual ideation without the need to deeply understand prompting!
Just throw in a couple of images for light guidance (scene, subjects, styles) and Whisk will try to capture their essence to suggest some images for you to keep ideating on.
Behind the scenes, the Gemini model automatically writes a detailed caption for each of your images. It then feeds those descriptions into Google’s latest image generation model, Imagen 3.
Whether it’s turning a drawing into a plushie, creating an epic holiday card, or visualizing the beginning of a story… We’re excited to see where Whisk takes you.
Prepare
Bring in visual elements for Whisk to analyze and combine. Drag and drop an image or upload it from a folder. You can also create a simple reference from a text prompt, or have us seed a couple of ideas by selecting “inspire me” or using the “roll the dice” feature.
Behind the scenes: these assets go through Gemini’s visual understanding for captioning. These text descriptions are what Whisk uses. Click edit to see if we got it right and refine as needed!
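For the technically curious: here’s a minimal sketch of what that image-to-text step might look like, using the public google-generativeai Python SDK as a stand-in. The model name, the prompt wording, and the caption_image helper are our illustrative assumptions, not Whisk’s actual pipeline.

```python
# Minimal sketch of the captioning (I2T) step. The model name, prompt
# wording, and this helper are illustrative assumptions, not Whisk's code.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credentials
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

def caption_image(path: str, role: str) -> str:
    """Ask Gemini to describe an uploaded asset (subject, scene, or style)."""
    image = Image.open(path)
    response = model.generate_content([
        f"Write a detailed caption of this image as a {role} reference: "
        "key characters or objects, setting, and visual style.",
        image,
    ])
    return response.text

subject_caption = caption_image("dinosaur.png", "subject")
```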
Explore
Time to Whisk things up! You can select assets (1 or more subjects, 1 scene, 1 style) and put them to work. The system will bring those together in creative remixes.
See what Whisk comes up with, and keep riffing! You can also throw in some light guidance to play around with details and keep your imagination going.
“Make the characters eat ice-cream”
“The dinosaur and the cat are high fiving!”
“Make sure the enamel pin is round.”
“Adjust the color scheme to follow a pastel palette”
Behind the scenes: Gemini combines the captions with your guidance to compose the final prompt for you. Click edit to see what it’s been whispering to Imagen 3.
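As a rough illustration of that composition step: the compose_prompt helper and its format below are our own assumptions, not Whisk’s internals, and in practice Gemini rewrites this kind of material into fluent prose rather than a simple template.

```python
# Rough sketch of prompt composition: merge the asset captions with the
# user's light guidance into one text-to-image prompt. Purely illustrative.
def compose_prompt(subject_captions: list[str], scene_caption: str,
                   style_caption: str, guidance: str = "") -> str:
    parts = [
        "Subjects: " + "; ".join(subject_captions),
        "Scene: " + scene_caption,
        "Style: " + style_caption,
    ]
    if guidance:
        parts.append("Additional guidance: " + guidance)
    return "\n".join(parts)

prompt = compose_prompt(
    ["a friendly green dinosaur", "a fluffy orange cat"],
    "a sunny city park",
    "soft pastel illustration",
    guidance="The dinosaur and the cat are high fiving!",
)
```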
Refine
See an image you like but maybe that hat should be blue? Or is it missing a sunset in the background? Enter refine mode and ask for small to medium changes that stay directionally close to the original.
Behind the scenes: Gemini updates the prompt based on your guidance! We still regenerate all the pixels from that prompt, but ask the model to stay close.
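Here’s a sketch of what refine mode could look like under the hood, reusing the Gemini model handle and the prompt from the sketches above. The instruction wording is an assumption on our part, not Whisk’s actual code.

```python
# Sketch of the refine step: ask Gemini to rewrite the existing prompt with
# one small change, then regenerate every pixel from the new prompt.
# Reuses `model` and `prompt` from the sketches above; wording is assumed.
def refine_prompt(current_prompt: str, change_request: str) -> str:
    response = model.generate_content(
        "Rewrite this image-generation prompt, applying only the requested "
        "change and keeping everything else as close to the original as "
        "possible.\n\n"
        f"Prompt:\n{current_prompt}\n\nChange: {change_request}"
    )
    return response.text

new_prompt = refine_prompt(prompt, "make the hat blue")
# The image is regenerated from scratch with new_prompt; nothing is edited
# in place, which is why results stay close but are never pixel-identical.
```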
Diagnose
Let’s be honest, things might go in wild directions! Maybe some elements were dropped? Maybe that exact thing you’re looking for just isn’t showing up?
At any stage above, you can diagnose the underlying prompts by clicking the prompt button/icon, edit them, add in those critical details manually, and ask the model to generate more options. Ultimately, you’re in control :-)
Subject
That’s what the image is about! Characters, objects, or a combination of the two. An old rotary phone! A cool chair! A cardboard movie display. A mysterious Renaissance vampire. You can also throw yourself in as a directional reference and see what comes out :-)
Scene
Where the subjects will show up. A fashion runway? A pop-up holiday card? You can bring characters into the scene alongside the ones already there, or maybe swap them in? Worth trying out.
Style
Maybe you want to throw in more guidance on the aesthetic, material, or technique used to represent the above. Style is for that. Feel free to specify what you care about most in the main prompt box to reinforce that guidance.
You can refer to them in natural language when you add more details (e.g. “our subjects having a birthday dinner”), and Whisk will try to weave that in.
We’ve included several ways for you to get a sense of how this works natively in the tool.
Playground: our landing page is a simplified experience of the tool for you to feel the magic in one action. Drop in an image and see it transform into a plushie! (or sticker! or enamel pin!)
Inspire me flow: this button will show when you click “start from scratch”. It’ll pre-populate some assets, suggest guidance, and guide you through the key areas of the main UI to generate your first outputs. Easy!
Dice roll: located at the top of the left panel, it’s here to quickly add a few subject, scene, and style suggestions to get going… or keep riffing!
In order to whisk elements from different images together, we first need to develop an understanding of each image you reference. This is where Gemini’s multimodal understanding comes in! When you upload an image, Whisk uses Gemini to visually understand it and generate text descriptions (or captions) about it. Or in other words, translating that image to text (I2T). These descriptions are meant to capture the essence of your references rather than replicate the originals, which is what makes remixing ideas possible.
These captions are then combined with your guidance to write a detailed prompt for our latest and most powerful image generation model, Imagen 3. Or in other words, translating text back to image (T2I).
This process helps Whisk better understand and represent the ideas you’re forming, and iterate on them while conversing with you.
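Chaining the hypothetical helpers from the sketches above gives a picture of the whole I2T to T2I loop. The Vertex AI usage below is an assumption about the public SDK, not a description of how Whisk actually talks to Imagen 3.

```python
# End-to-end sketch chaining the helpers above: caption each reference
# (I2T), compose one prompt, then generate fresh images with Imagen 3 (T2I).
# The Vertex AI usage is an assumption about the public SDK.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="my-project", location="us-central1")  # placeholders

def whisk_once(subject_path: str, scene_path: str, style_path: str,
               guidance: str) -> None:
    prompt = compose_prompt(
        [caption_image(subject_path, "subject")],
        caption_image(scene_path, "scene"),
        caption_image(style_path, "style"),
        guidance=guidance,
    )
    imagen = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")
    result = imagen.generate_images(prompt=prompt, number_of_images=2)
    for i, image in enumerate(result.images):
        image.save(location=f"whisk_{i}.png")
```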
This is intentional. In our experiment, Whisk extracts only a few key characteristics from the image you provide to guide the model. Our goal is not to create an exact replica, but rather to capture the essence of the subject.
Therefore, the generated image may differ in appearance. For example, the generated subject might be a different height or weight, or have a different hairstyle or skin tone. We understand that these features may be crucial to the unique identity of your character. To achieve a result closer to your vision, we encourage you to provide more detailed prompts and refine your instructions.
You can use the top right menu to send us feedback.
Whisk is currently only available in the US, using text inputs in English. We’re working on expanding to more countries soon!
Yes, just click on the download icon to save and share. We’d also love to see what you create, so please share with us through our Discord channel too!
For information about your user data, user history, our generative policies, how to send feedback, and more, please check out labs.google/fx’s FAQ.