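We believe you shouldn’t have to “learn prompting” to generate images. Tools should be simple and easy to use, letting you intuitively try, tweak, refine, and remix ideas, just like sharing a wild idea with a friend. So we’re trying something new!
Whisk is the latest generative image experiment from labs.google/fx, focused on helping you ideate visually and quickly without needing a deep understanding of prompting!
Just upload a few images as simple guidance (scene, subject, style), and Whisk will try to capture their essence and generate images for you to keep ideating with.
Behind the scenes, a Gemini model automatically writes a detailed caption for each image, and those captions are then fed into Imagen 3, Google’s latest image generation model.
Whether you’re turning a painting into a plush toy, making a beautiful holiday card, or illustrating the opening of a story… we can’t wait to see what you create with Whisk.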
Whisk Animate is a new feature for Google One AI Premium subscribers in supported countries that lets you transform generated images into short videos with Veo 2. Subscribers can generate up to 100 videos a month; unused credits do not accumulate.
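Getting started
Give Whisk image elements to analyze and combine. You can drag and drop images or upload them from a folder, create a simple reference from a text prompt… or choose “Inspire me” or use the “roll the dice” feature and let us offer some creative inspiration.
Behind the scenes: Gemini’s visual understanding writes captions based on these assets, and Whisk uses those captions to generate images. Click “Edit” to check whether the generated result matches what you expected and adjust it as needed!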
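Explore
Time to mix! Select your assets (one or more subjects, one scene, one style) and hand them to Whisk, which blends them into a creative mashup.
Look at what Whisk generates, then adjust as needed! You can also add simple guidance to polish the details and let your imagination run free.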
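“Have the character eat ice cream”
“The dinosaur and the cat are high-fiving!”
“Make sure the enamel pin is round.”
“Change the color scheme to pastel tones”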
Behind the scenes: Gemini writes a prompt for you based on the captions and your guidance. Click “Edit” to see the prompt given to Imagen 3.
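The prompt-writing step described above can be pictured with a small sketch. Everything in it is hypothetical: the `Assets` class, the `build_prompt` function, the slot labels, and the rule that at least one subject is required are assumptions made for illustration. In Whisk the prompt is written by Gemini, and its actual wording is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assets:
    """Captions for the selected assets (hypothetical structure)."""
    subjects: list[str]            # one or more subject captions
    scene: Optional[str] = None    # at most one scene caption
    style: Optional[str] = None    # at most one style caption


def build_prompt(assets: Assets, guidance: str = "") -> str:
    """Join captions and optional guidance into a single prompt string."""
    # Assumption for this sketch: a prompt needs at least one subject.
    if not assets.subjects:
        raise ValueError("at least one subject caption is required")
    parts = ["Subject: " + "; ".join(assets.subjects)]
    if assets.scene:
        parts.append("Scene: " + assets.scene)
    if assets.style:
        parts.append("Style: " + assets.style)
    if guidance.strip():
        parts.append("Guidance: " + guidance.strip())
    return "\n".join(parts)
```

For example, `build_prompt(Assets(subjects=["a plush dinosaur", "a tabby cat"], scene="a fashion runway"), "The dinosaur and the cat are high-fiving!")` returns a three-line string with one line each for the subjects, the scene, and the guidance. Keeping the slots separate mirrors the way the prompt can be viewed and edited before more images are generated.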
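Refine
The generated image is what you had in mind, but perhaps you want the hat to be blue, or a sunset in the background. Enter “Refine” mode to ask for slight to moderate changes while staying as close as possible to the original image.
Behind the scenes: Gemini updates the prompt based on your guidance! All pixels are still regenerated from the prompt, but the model is asked to produce an image as similar to the original as possible.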
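Diagnose
To be honest, results can be surprising! Maybe some elements were left out? Maybe the image you wanted couldn’t be generated?
At any of the stages above, you can click the prompt button/icon to inspect the underlying prompt, edit it, manually add key details, and then ask the model to generate more images to choose from. Ultimately, you’re in control :-)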
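Subject
The subject is the focus of the image. It can be a character, an object, or both: for example, a vintage rotary phone, a cool chair, a cardboard movie standee, or a mysterious Renaissance vampire! You can also provide a photo of yourself as a reference and see what gets generated :-)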
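Scene
The scene is where the subject appears. It can be a fashion runway or a pop-up holiday card. You can bring a new character into a scene, place it next to existing characters, or replace an existing character and see what happens.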
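Style
Choose “Style” if you want to give more guidance on the aesthetic, material, or technique applied to your subject and scene. For example, you can make your guidance more explicit by specifying what matters most to you in the main prompt box.
You can add more details in natural language (for example, “the subject is enjoying a birthday dinner”), and Whisk will try to work them into the image.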
To remix elements from different images, we first need to develop an understanding of each image you reference. This is where Gemini’s multimodal understanding comes in! When you upload an image, Whisk uses Gemini to visually interpret it and generate a text description (or caption) of it. In other words, it translates the image to text (I2T). These descriptions are meant to capture the essence of your references, not to replicate the originals, which makes remixing ideas easier.
These captions are then used to write a detailed prompt, based on your guidance, for generating an image with Imagen 3, our latest and most powerful image generation model. In other words, the text is translated back into an image (T2I).
Whisk Animate lets you turn Whisk-generated images into short videos with Veo 2 by specifying motion guidance for them.
The process above helps Whisk better understand and represent the ideas you’re forming, and iterate on them in conversation with you.
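The two stages described above (image to text, then text back to image) can be sketched as a small pipeline. This is only an illustration under stated assumptions: `remix`, `fake_caption`, and `fake_generate` are hypothetical names, the model calls are replaced by toy stand-ins rather than real Gemini or Imagen 3 APIs, and the prompt is produced by simple concatenation, whereas Whisk has Gemini write it.

```python
from typing import Callable, Sequence

# Stand-in types for the two model calls; neither is a real API.
Captioner = Callable[[bytes], str]   # image bytes -> caption (I2T)
Generator = Callable[[str], bytes]   # text prompt -> image bytes (T2I)


def remix(images: Sequence[bytes], guidance: str,
          caption: Captioner, generate: Generator) -> tuple[str, bytes]:
    """Caption every reference image, build one prompt, then generate."""
    captions = [caption(img) for img in images]        # I2T stage
    prompt = " ".join(captions + [guidance]).strip()   # simplified prompt writing
    return prompt, generate(prompt)                    # T2I stage


# Toy stand-ins so the sketch runs without any model access.
def fake_caption(image: bytes) -> str:
    return f"an image of {len(image)} bytes."


def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")
```

Calling `remix([b"abc", b"de"], "Make it pastel.", fake_caption, fake_generate)` returns both the prompt and the generated bytes. Returning the prompt alongside the image reflects the fact that the output is produced from text alone, which is why results resemble the references rather than copy them.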
Outputs will resemble your uploads but will not be exact copies. In our experiment, Whisk extracts only a few key characteristics from the image you provide to guide the model. Our goal is not to create an exact replica, but rather to capture the essence of the subject.
Therefore, the generated image may differ in appearance. For example, the generated subject might be of a different height or weight, or have a different hairstyle or skin tone. We understand that these features may be crucial to the unique identity of your character. To achieve a result closer to your vision, we encourage you to provide more detailed prompts and refine your instructions.
You can use the top right menu to send us feedback.
We’re working to bring our tools to as many people as possible. Whisk is available to 18+ users in all labs.google/fx countries except the UK.
Whisk Animate is now available to Google One AI Premium subscribers, with 100 video generations per month, in the following countries: American Samoa, Angola, Antigua and Barbuda, Argentina, Australia, Bahamas, Belize, Benin, Bolivia, Botswana, Brazil, Burkina Faso, Cabo Verde, Cambodia, Cameroon, Canada, Chile, Côte d'Ivoire, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Fiji, Gabon, Ghana, Guam, Guatemala, Honduras, Jamaica, Japan, Kenya, Laos, Malaysia, Mali, Mauritius, Mexico, Mozambique, Namibia, Nepal, New Zealand, Nicaragua, Niger, Nigeria, Northern Mariana Islands, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Puerto Rico, Rwanda, Senegal, Seychelles, Sierra Leone, Singapore, South Africa, South Korea, Sri Lanka, Tanzania, Tonga, Trinidad and Tobago, Türkiye, U.S. Virgin Islands, Uganda, United States, Uruguay, Venezuela, Zambia, and Zimbabwe. Whisk Animate lets you provide motion guidance for your Whisk image creations, bringing them to life!
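The monthly allowance (100 video generations, with credits that do not accumulate) can be pictured with a short sketch. The `VideoQuota` class is hypothetical, and the reset at the start of each calendar month is an assumption: the text does not say when the counting period begins or how usage is tracked.

```python
from datetime import date

MONTHLY_LIMIT = 100  # video generations per month, as stated above


class VideoQuota:
    """Tracks usage per month; unused credits are not carried over."""

    def __init__(self, limit: int = MONTHLY_LIMIT) -> None:
        self.limit = limit
        self.period = None  # (year, month) currently being counted
        self.used = 0

    def _roll(self, today: date) -> None:
        # Assumption: the allowance resets at the start of each calendar month.
        period = (today.year, today.month)
        if period != self.period:
            self.period = period
            self.used = 0  # leftover credits from the old period are dropped

    def remaining(self, today: date) -> int:
        self._roll(today)
        return self.limit - self.used

    def consume(self, today: date) -> bool:
        """Use one credit if any is left; return whether it succeeded."""
        self._roll(today)
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

With this model, a subscriber who generates 30 videos in January has 70 left at the end of that month and exactly 100 again in February, not 170.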