ChatGPT Just Received a Huge Update to Its Image Generation System
OpenAI has significantly improved ChatGPT’s image generation capabilities with an update to the GPT-4o model introduced last May. The new and improved image generator is now rolling out to all ChatGPT users on both paid plans and the free tier (though free users are more limited in how much they can use it).
It’s been possible to generate images through the ChatGPT interface for some time now, although behind the scenes that work has been handed over to the DALL-E 3 image model. Now everything will be handled by GPT-4o for a more consistent and natural experience.
There are plenty of improvements here, covering some of the areas that AI image creation tools typically struggle with: rendering text, keeping characters consistent across images, and drawing diagrams. OpenAI says you can now expect more accurate and photorealistic results from your prompts.
More realistic and accurate images
AI-generated images often have an artificial sheen to them that suggests they were thought up by algorithms, and this should be less obvious with GPT-4o images. One of the demo images provided by OpenAI shows a woman writing on a whiteboard and captures the view. It's all quite realistic, although note the small caption at the bottom that says this was the best of eight attempts ChatGPT made at the prompt.
The AI art should also follow the prompts it is given more closely, OpenAI says. So, if you need specific objects in certain places, or people in certain positions, those instructions are likely to be followed more faithfully. One of the most impressive images shows a four-panel comic created by ChatGPT, without any obvious errors or inconsistencies.
I tried getting ChatGPT to turn a Jane Austen novel into a comic and to create a photorealistic rendering of a stately home and garden, and the results were impressive, if not quite perfect. They are certainly much better than the images ChatGPT produced previously, although they take longer to render (usually minutes rather than seconds).
Text and diagrams are greatly improved
Getting AI to render text and diagrams accurately has long been a challenge: the way these tools are built means they are much better at remixing and reinventing the images they were trained on than at reproducing exact letters of the alphabet or a series of boxes and arrows.
The new GPT-4o model can render text and diagrams with a high level of detail and accuracy, so you won’t see as many weird errors and inconsistencies. OpenAI’s showreel included a menu, an invitation, a boarding pass, and a diagram explaining Newton’s prism experiment, all generated from a single text prompt.
When I asked ChatGPT to create an infographic explaining DNA in simple terms, and a book cover with the title and author provided, everything was very accurate to the brief – the graphics were simple but to the point (as prompted), and the book cover looked like something you might see in a store. Equally important, there were no strange artifacts or inconsistencies in the images.
Consistency and Editing
I’ve already written about the limitations of ChatGPT image editing, and this is another area that has been updated. It’s now easier to keep characters and scenes consistent between images, adjust only parts of an image while leaving the rest untouched, and create different layers of an image. If necessary, you can even create a transparent background or specify colors using hexadecimal codes.
Other improvements come from the fact that ChatGPT can accept and mix your own images, as well as incorporate other information (from the web and its training data). One of OpenAI’s demo images was created from the prompt “create a visual infographic describing why SF is so foggy”, and ChatGPT did exactly that (well, in the best of three attempts).
In my own tests, I found ChatGPT to be much better at image editing and quite competent at mixing images in different styles. It still has some difficulty maintaining consistency between images, especially with complex objects and characters. It’s definitely better than it was before, but there’s still a tendency to overdo the editing, making the AI less useful for tweaking images or creating a series of images that need to match.
Copyright and Safety Issues
As with any generative AI announcement, issues related to copyright, misuse, and energy consumption again come to mind. OpenAI has gone on record as saying that it is not possible to create these tools without training them on copyrighted images, although it has recently begun signing content agreements with providers such as Shutterstock. Brad Lightcap, OpenAI’s chief operating officer, told the Wall Street Journal that the GPT-4o image generator will reject requests to imitate the work of any living artist.
On the safety front, OpenAI says all generated images contain C2PA metadata to identify them as AI-generated, although this metadata can easily be removed with something as simple as a screenshot. The image generator is also designed to block any attempts to create “child sexual abuse material and sexual deepfakes,” OpenAI says, as well as other prompts that violate its content policies.
This is certainly a big step forward for AI image generation: the upgraded technology is simply mind-blowing at times, and many of the telltale signs of AI and the mistakes made by this technology are disappearing. But it raises some big questions about the future we’re all heading toward, one where it’s so easy to make fakes, where creative work is done by robots rather than humans, and where we’re collectively losing the ability to draw pictures, form sentences, or write lines of code. Where, then, will generative AI find more training data?