Google's Nano Banana: Unlocking Character Consistency in Visual AI (2025)

Unleashing the Power of Visual AI: How Google's Nano Banana Revolutionized Character Consistency

In this conversation, Nicole Brichtova and Hansa Srinivasan, leads on Google's Nano Banana image model, discuss how the model was built and what it means for the future of visual AI. From technical foundations to human evaluation and accessible design, they explain how these elements turned a powerful capability into a viral consumer product and opened new avenues for utility.

Character Consistency: The Obsessive Attention to Detail

While Gemini's multimodal foundation and long context window laid the groundwork for new possibilities, the true breakthrough in achieving lifelike faces came from meticulous data curation. The team's obsession with specific problems, such as text rendering and identity preservation, played a pivotal role. It's not just about scale; it's about the quality of the data and the dedication to solving these intricate challenges.

Human Evaluation: Capturing the Emotional and Qualitative

The team discovered that character consistency is incredibly subjective and almost impossible to evaluate quantitatively. Only you can truly judge if an image resembles you. This led to the development of robust human evaluation processes, including internal artist testing and executive reviews, to capture the emotional and qualitative aspects that benchmarks often miss. Human evaluation is essential for these subjective capabilities, ensuring the model feels right and resonates with users.
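One common way to aggregate this kind of subjective human judgment is arena-style pairwise comparison, as used on LMArena, where Nano Banana first appeared under its code name: raters pick the better of two anonymized outputs, and Elo ratings aggregate the votes into a ranking. Below is a minimal illustrative sketch of that aggregation; the model names, votes, and K-factor are hypothetical, not real evaluation data or Google's actual pipeline.

```python
# Minimal sketch of arena-style pairwise evaluation: human raters pick the
# better of two anonymized model outputs, and Elo ratings aggregate the votes.
# Model names, votes, and the K-factor are illustrative assumptions.

def update_elo(ratings, winner, loser, k=32):
    """Update Elo ratings in place after one pairwise human vote."""
    # Probability the winner was expected to win, given current ratings.
    expected_win = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    # Shift both ratings by the surprise of the outcome.
    ratings[winner] += k * (1 - expected_win)
    ratings[loser] -= k * (1 - expected_win)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for winner, loser in votes:
    update_elo(ratings, winner, loser)
```

The appeal of this scheme for subjective qualities like "does this look like me" is that raters never assign absolute scores, only preferences, which sidesteps the problem that no benchmark metric captures likeness.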

Fun as a Gateway to Utility: Lowering Barriers with Playfulness

The playful name, Nano Banana, and the red carpet selfie use cases served as a clever strategy to lower barriers, especially for older users intimidated by AI. By making the model accessible and fun, people discovered practical applications like photo editing, solving math problems, and visualizing information. This approach demonstrates that fun can be a powerful gateway to utility, encouraging users to explore and discover the model's capabilities.

The Craft of AI: Detail-Oriented Excellence

Small design decisions throughout the development process, from inference speed to conversational editing, and the shift towards generalization over narrow optimization, culminated in capabilities that felt magical. The team emphasizes that the "detail-orientedness of high quality" is what separates good models from breakthrough ones. It's not just about the architecture; it's the meticulous craft and attention to detail that elevate the model's performance.

Specialized Models: Proving Grounds for Multimodal Systems

Image generation advances faster than video because training and serving single frames is cheaper, so it previews capabilities coming to other modalities. Specialized models like Imagen and Veo serve as stepping stones toward the ultimate goal: a single model that can transform any input into any output. They act as proving grounds, pushing the boundaries of multimodal systems and paving the way for unified capabilities.

Introduction

Hansa Srinivasan: There's something truly captivating about visual media. It's not just fun; it's intuitive and exciting. Visuals are a significant part of how we experience life, and it's incredible to see how much they move people.

Nicole Brichtova: We're empowering people to tell stories they never could before. The camera democratized reality capture, and now we're capturing people's imagination. We're giving them the tools to visualize their ideas, bringing their thoughts to life in ways they couldn't before.

Stephanie Zhan: Today, we delve into the story of Google's Nano Banana image model with Nicole Brichtova and Hansa Srinivasan. From its humble beginnings as a late-night code name to its cultural phenomenon status, they guide us through the technical advancements, the role of high-quality data and human evaluation, and the importance of craft and infrastructure. We explore the balance between pushing the frontier and broad accessibility, and glimpse into the future of multimodal creation, personalized learning, and specialized UIs.

Main Conversation

Stephanie Zhan: Nicole and Hansa, thank you for joining us. Nano Banana has taken the world by storm, and we're eager to delve into its story. Let's start with a fun question: What are some of your personal creations or the most creative things you've seen from the community?

Hansa Srinivasan: One of the most exciting things I've seen is people pairing Nano Banana with video models to keep characters and scenes consistent across cuts. It's remarkable how people have combined tools to create smoother, more natural-feeling videos.

Pat Grady: How accessible is this workflow today?

Hansa Srinivasan: People are mixing different video models from various sources, so it's not entirely seamless yet. There are products trying to integrate multiple models, but the difference before and after Nano Banana's launch is noticeable. The scene cuts feel more natural, and it's a testament to the model's impact.

Pat Grady: [laughs]

Nicole Brichtova: One unexpected way people are using Nano Banana is for learning and information digestion. I met someone who creates sketch notes on various topics; he feeds lectures into Gemini to understand the work of his father, a university chemist. The model's ability to render text coherently, even though that isn't its primary strength, has enabled this unique application.

Stephanie Zhan: Wow! That's incredible.

Nicole Brichtova: It's a heartwarming story of father-son connection through AI-assisted learning.

Pat Grady: That's a beautiful use case.

Hansa Srinivasan: People are finding creative ways to work with the model, improving its performance and unlocking mind-blowing capabilities. It's amazing to see the unexpected applications.

Pat Grady: Was there an "aha" moment during development when you realized Nano Banana's potential?

Nicole Brichtova: We had an internal demo where we played with the model. I took a selfie and prompted it to put me on the red carpet. It looked like me, and that's when I knew we had something special.

Stephanie Zhan: Wow! That must have been an exciting moment.

Nicole Brichtova: People saw the potential immediately. It's about turning yourself into a 3D figurine, expressing yourself in new ways, and enhancing your identity. It's a fun and powerful tool.

Stephanie Zhan: What made Nano Banana's red carpet experience so superior?

Nicole Brichtova: It looked like me, and that's a challenging feat. Judging character consistency on unknown faces is difficult. We now have team members evaluating their own faces, as it's the best way to judge if the model captures their likeness.

Hansa Srinivasan: Familiar faces are crucial. When we evaluated ourselves, the difference was significant. Preserving identity is fundamental to the model's utility, but it's a tricky challenge.

Pat Grady: How did you achieve character consistency, and was it a goal from the start?

Hansa Srinivasan: It was definitely a goal, and we had the right recipe with the model architecture and data. But until you see the model in action, you don't know how close you'll get. We were surprised by its excellence.

Nicole Brichtova: Consistency is essential for editing. Prior models struggled with it, which limited their usefulness for professional work and for maintaining characters. We knew there was a demand, and we felt we had the solution.

Hansa Srinivasan: Editing expectations are high, and it's technically challenging to preserve untouched elements. It's a basic expectation, but surprisingly difficult to achieve.

Pat Grady: How do you quantitatively evaluate character consistency?

Hansa Srinivasan: Face consistency is challenging to evaluate quantitatively. Human evals are crucial, especially for subtle aspects like aesthetic quality.

Nicole Brichtova: We use human evals, "eyeballing" model results, and community testing. Internal artists, executive reviews, and community feedback all help us understand where the model shines. It's about capturing the emotional aspect, like seeing yourself in new ways.

Hansa Srinivasan: Human evals are especially important for visual media, where subjectivity reigns. It's challenging to quantify, but human evaluation is a game-changer.

Stephanie Zhan: Achieving character consistency from a single 2D image is incredibly difficult. What were the technical breakthroughs?

Hansa Srinivasan: Good data that teaches models to generalize is key. Nano Banana is a Gemini model, a multimodal foundational model with excellent generalization capabilities. It's the secret sauce.

Nicole Brichtova: The long context window in Gemini allows for multiple images and iterative conversations with the model. Previously, it took 20 minutes to generate something resembling you, but now it's snappy and accessible.

Hansa Srinivasan: Attention to detail and high-quality data are crucial. Small design decisions matter, and having team members obsessed with specific problems, like text rendering, improves the model's performance.

Nicole Brichtova: It's about the craft of AI, which is often overlooked but essential.

Pat Grady: How big was the team, and how did you approach development?

Nicole Brichtova: It took a village to ship Nano Banana. We had a core modeling team and close collaborators across various surfaces.

Hansa Srinivasan: We ship across many products, so it's a collaborative effort. The model team is smaller, but infrastructure teams optimized the stack to handle the demand.

Pat Grady: Did you build Nano Banana with specific personas or use cases in mind?

Nicole Brichtova: It's a blend of both. We had an idea of the capabilities and design decisions, like inference speed, which influenced the target persona.

Hansa Srinivasan: We've shifted our philosophy. Previously, we worked on the Imagen line, which did straight image generation. With Gemini, generalization is a foundational capability, allowing for emergent capabilities like mathematical reasoning.

Stephanie Zhan: How does Nano Banana fit into the Gemini ecosystem, and where do you see it going?

Nicole Brichtova: Our goal is to build the single most powerful model that can transform any modality. Specialized models like Imagen and Veo provide great results in specific domains, but we're moving closer to the vision of a unified model.

Image generation is ahead of the curve due to the lower cost of training and inference. We expect video developments to follow suit within six to twelve months.

Hansa Srinivasan: Specialized models are testing grounds, but over time, we expect Gemini to encompass all these capabilities.

Stephanie Zhan: Let's talk about the name. Was it a happy accident or a genius move?

Hansa Srinivasan: It was a happy accident. We needed a code name for LMArena, and someone on the team suggested Nano Banana at 2:00 am. It was fun, easy to remember, and had an emoji, which is crucial for branding.

Nicole Brichtova: It was a stroke of genius. Once it went live, people embraced it, and it felt organic. It helped people find the model within the Gemini app, and now there are bananas everywhere.

Hansa Srinivasan: People were asking, "How do I use Nano Banana?" It played on Google's fun brand image and made the model more accessible and unintimidating.

Nicole Brichtova: Fun is a gateway to utility. Nano Banana is a fun entry point, but people discover its practical applications, like studying and solving math problems.

Hansa Srinivasan: My parents and their friends are using it. It's easy, fun, and non-intimidating. The chatbot naturalness breaks barriers, and the fun name and reputation make it more accessible.

Stephanie Zhan: Where do you see Nano Banana going from here, in terms of model and product development?

Nicole Brichtova: On the consumer side, we need to make it easier to use. Many prompts are lengthy, and people copy-paste them into the app. We need to move beyond prompt engineering.

On the professional side, we need more precise control and reproducibility for actual workflows. We're good at editing consistency, but we need to be perfect.

I'm excited about visualizing information. Nano Banana's use case for sketch notes is a glimpse into the future of LLMs helping us digest and visualize information naturally.

Stephanie Zhan: Are you considering vertical integration and building more product around Nano Banana? And are you alluding to a shift towards UI interactions?

Nicole Brichtova: Chatbots are an easy entry point, but for visual modalities, we need to think about the future creation canvas. It's about building products that don't overwhelm users and provide clear constraints.

At Google, we have the Labs team, led by Josh Woodward, who experiments with frontier thinking. They've built products like NotebookLM and Flow, and I'm excited about Flow's potential as a creation platform.

Hansa Srinivasan: In the short term, we need to improve consistency and seamlessness. Long-term, it's about rich multimodal generation, where text and images flow naturally. We need seamless generalization between modalities.

Nicole Brichtova: As we move towards proactive models that pull in relevant content, we're excited about the potential for agentic behaviors. For some use cases, you want the model to do the work, like creating slide decks with appropriate visuals. For creative workflows, you want to be involved in the process.

Hansa Srinivasan: It's about giving users fine-grained control and having the model understand and anticipate user needs, doing the intervening work.

Nicole Brichtova: It's like hiring a professional. These models should be able to understand user requests and deliver expert-level work.

Pat Grady: What's the next competitive battleground in this space?

Nicole Brichtova: Making models more capable and driving adoption through user interfaces. We still rely heavily on chatbots, but we need to think about who the users are and what they need.

Pat Grady: Will the frontier advance as quickly in the next five to ten years?

Nicole Brichtova: The next five years will feel like twenty. The space is moving incredibly fast, and it's only accelerating.

Pat Grady: How does Google handle deepfakes and misuse of the model?

Nicole Brichtova: It's an evolving frontier. We want to give users creative freedom and control while preventing harm. We use visible and invisible watermarks to verify AI-generated content.

Hansa Srinivasan: It's a hard balance to strike. SynthID is an important technology that lets us release capabilities while verifying and combating misinformation risks. It's a tricky conversation, but one we take seriously.

Stephanie Zhan: Is SynthID the industry standard?

Nicole Brichtova: It's a Google standard. Every Google model, like Imagen and Veo, has SynthID when used in any product surface.

Pat Grady: Looking one to three years ahead, what will be possible, and how will it change our lives?

Nicole Brichtova: I hope we can achieve personalized tutors and textbooks. We should be able to learn in ways tailored to our learning styles and starting points.

Hansa Srinivasan: Working with these technologies has already changed how we work. I'm getting married, and we made our "save the dates" with our model. These models drastically increase our productivity.

I think we'll see people empowered to do more in the same amount of time. It's about changing the amount of work an individual can get done, not replacing them.

Stephanie Zhan: Are there areas startups should explore that Google might not?

Nicole Brichtova: There's room for creativity in UI design and bringing everything together. People work across LLMs, image, video, and music, but they have to use separate tools. We see opportunities for workflow-based tools in various verticals.

Hansa Srinivasan: Startups can make this technology useful for specific workflows. It's thrilling to see how many people Nano Banana has excited.

Stephanie Zhan: My kids love it. My three-year-old son tied a rope around himself and had the model turn him into a warrior superhero. It makes him feel superhuman.

Hansa Srinivasan: That's the power of visual media. It's exciting and intuitive, and it moves people emotionally.

Nicole Brichtova: We're empowering people to tell stories they never could before. The camera captured reality, and now we're capturing imagination. We're giving people the tools to visualize their ideas.

Stephanie Zhan: Thank you so much for joining us.

Nicole Brichtova: Thank you for having us.
