How Does the New ChatGPT Image Generator Compare to Gemini’s Nano Banana Pro?

Following the massive image editing improvements added to Google Gemini in August under the whimsical codename Nano Banana, it’s now OpenAI’s turn to significantly expand the capabilities of ChatGPT’s image processing tools. The new update, called GPT Image 1.5, is now available to all users.

One of the key improvements, as with Nano Banana, is that ChatGPT can now edit a specific part of an image while keeping the rest intact. You can add or remove elements, or change the color or style of something, without ending up with a completely different image.

Another feature ChatGPT borrowed from Gemini is the ability to combine multiple images into a single scene. Want to see yourself and your best friend in front of the Sydney Harbour Bridge? No problem: just provide the source photos, and the AI will do the rest. You can also change visual styles while maintaining consistency in detail.

OpenAI claims its new image editor and generator can “more reliably” follow instructions and process images up to four times faster than before. Text can be more varied in style and size, and images overall should be more realistic and error-free—though OpenAI also acknowledges that there’s still room for improvement.

This is the best image generation tool we’ve ever seen from ChatGPT, and it looks impressive at first glance—but how does it compare in practice to Gemini and Nano Banana? I tested both models using the $20/month plan on both platforms (ChatGPT Plus and Google AI Pro, respectively) to compare their results.

Image generation and editing

Open ChatGPT on the web or mobile, and you’ll see a new “Images” tab in the left navigation bar. It takes you to your library of existing images and also offers ideas for new ones: several suggested prompts, as well as a set of preset styles you can apply to portrait images.

The journalist, lamp, and countryside are courtesy of Gemini. Source: Gemini
Journalist, lamp, and countryside — images courtesy of ChatGPT. Source: ChatGPT

I tested the new GPT Image 1.5 model by asking ChatGPT to generate an image of a busy tech journalist, a lamp in the middle of an empty warehouse, and a cartoon-style hilly landscape shrouded in fog. I then asked Gemini to generate the same images with the same prompts. While the results varied considerably, they were roughly equal in terms of quality and realism. There were occasional issues with odd physics and repetition, but nothing too serious.

Both ChatGPT and Gemini are now quite capable of high-quality image editing: both AI bots seamlessly replaced the journalist’s clothing with a shirt and tie, without affecting other parts of the photo. Done manually, this would have taken a significant amount of time, even for an experienced Photoshop user, which shows how dramatically AI-powered image editing changes things.

The color changes were executed flawlessly, but both AIs struggled a bit with perspective shifts when I asked them to show the same photo from a different angle. In these cases, instructions were followed less accurately, and the resulting images were less consistent (since new areas had to be rendered), although ChatGPT performed slightly better than Gemini.

Now you can change clothes in seconds (Gemini version). Source: Gemini
Now you can change clothes in seconds (ChatGPT version). Source: ChatGPT

The classic “remove the object from this image” problem was brilliantly solved: Gemini and ChatGPT were able to surgically remove a cottage from a rural landscape while leaving everything else intact. Again, this is the kind of labor-intensive image processing that previously required considerable effort, but can now be accomplished in seconds.

Gemini’s attempt to demolish the cottage. Source: Gemini
ChatGPT’s attempt to demolish the cottage. Source: ChatGPT

Combining and processing images

Another advantage of ChatGPT and Gemini is the ability to merge images. You can have separate photos of yourself and your parents, combine them into a single image, and then add any background you like. You can get perfect family photos without gathering your relatives or traveling anywhere.

In this regard, Gemini and ChatGPT fell a bit short: the editing skills were still impressive, but the results didn’t always look like a cohesive, unified scene. The lighting was sometimes off, or elements from different images were displayed at different scales, requiring repeated tweaking, editing, and re-prompting to get everything just right.

ChatGPT performed slightly better at combining different images and elements, as well as changing the overall look of the image. When I asked the AI to blend all my images together into a dark, noir-style shot, ChatGPT produced a fairly consistent result, while Gemini’s work looked more like a copy-and-paste operation.

Constantly remixing photos (adding new people, changing the weather, changing locations) can be a lot of fun, and both of these bots are now capable of incredible results. Remixing photos of family and friends will be popular, but it’s not that easy: with familiar people, anything the generative AI adds tends to look wrong, because neither ChatGPT nor Gemini knows exactly what these people look like, how they smile, what their body shape is, or how they typically stand or sit.

Gemini can merge images, but the result can look like separate images pasted together. Source: Gemini
ChatGPT did a better job of creating a new image that looked correct. Source: ChatGPT

Both ChatGPT and Gemini are currently at a high level, one that gives everyone access to advanced Photoshop-style image editing capabilities. If either AI model currently has an advantage, it’s ChatGPT, but the difference is small. It will be interesting to see how image editing capabilities develop from here.