People Are Using ChatGPT to Guess the Location of Photos, and It Actually Works
This week, OpenAI announced its latest models: o3 and o4-mini. These are reasoning models, which break a prompt down into several parts that are then processed one at a time. The goal is for the bot to “think” through the query more deeply than other models do and return a more accurate result.
While there are many possible uses for OpenAI’s “most powerful” reasoning model, one application that has made waves on social media is geo-guessing, the practice of identifying a location by analyzing only what you can see in an image. As reported by TechCrunch, X users have been sharing their experiences asking o3 to identify locations from random photos, with impressive results. The bot will guess where it thinks the photo was taken and explain its reasoning. For example, it might say that it focused on a license plate in a color used by a certain country, or that it noticed a particular language or writing style on a sign.
According to some of these users, ChatGPT isn’t relying on any metadata hidden in the images to help determine the location: some testers strip this data from their photos before sharing them with the model, so in theory, it’s working off reasoning and web searches alone.
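If you want to run a similar test yourself, you can strip a photo’s metadata before uploading it. Here’s a minimal sketch in Python using the Pillow library (the function name and file paths are placeholders of my own, not anything from the testers’ workflow):

```python
from PIL import Image

def strip_metadata(src_path: str, dst_path: str) -> None:
    """Re-save an image with pixel data only, dropping EXIF tags (including GPS)."""
    with Image.open(src_path) as img:
        # Create a fresh image and copy over only the pixels;
        # the EXIF metadata stays behind in the original file.
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst_path)

# Example usage (hypothetical file names):
# strip_metadata("street_view.jpg", "street_view_clean.jpg")
```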
On one hand, this is a fun way to challenge ChatGPT. Geo-guessing is all the rage on the internet right now, so making the practice more accessible has obvious appeal. On the other hand, there are clear privacy and security implications: anyone with access to o3 could use the reasoning model to work out where someone lives or is staying based on an otherwise anonymous photo of them.
I decided to test o3’s geo-guessing capabilities with some screenshots from Google Street View to see whether it lives up to the online hype. The good news is that, in my experience, it is far from a perfect tool. In fact, it doesn’t seem to perform much better at this task than OpenAI’s non-reasoning models, like 4o.
Testing o3’s geo-guessing skills
o3 handles obvious landmarks with relative ease: I first tested it on a view from a highway in Minnesota, with the Minneapolis skyline in view. It took the bot only a minute and six seconds to name the city and recognize that we were looking at I-35W. It also instantly identified the Pantheon in Paris, noting that the screenshot was taken during its 2015 renovation. (I didn’t know that when I sent it!)
Next, I wanted to try lesser-known attractions and locations. I found a random street corner in Springfield, Illinois, home to the city’s Central Baptist Church, a red brick building with a spire. And this is where things got interesting: o3 cropped the image into several parts, trying to find distinguishing characteristics in each of them. Since it’s a reasoning model, you can also see what it’s looking for in each of those crops. As in other cases where I’ve tested reasoning models, it’s strange to see a bot “thinking” in human interjections (e.g., “Hmm,” “but wait,” and “I remember”). It’s also interesting to watch it call out specific details, such as noting the architectural style of part of a building, or where in the world a certain park bench is most likely to be found. Depending on where the bot is in its thought process, it may start searching the internet for more information, and you can click those links to see what it’s referring to for yourself.
Despite all that reasoning, this location stumped the bot, and it was unable to complete the analysis. After three minutes and 47 seconds, it seemed close to figuring things out, saying: “The location at 400 E Jackson Street in Springfield, Illinois may be next to St. Paul’s Cathedral Church. My photo didn’t capture the entire sign, so I need to adjust the coordinates and check the bounding box. Alternatively, the architecture may help identify this: a red brick Greek Revival building with a white spire, combined with a high-rise that could be ‘Embassy Plaza.’ The term ‘Redeemer’ may refer to the ‘Lutheran Church of the Redeemer.’ I will search my memory for more detailed information about attractions near this address.”
The bot correctly identified the street and, more importantly, the city itself. I was also impressed by its analysis of the church: while it struggled to name the specific church, it was able to break down its architectural style, which could have set it on the right path. From there, however, the analysis quickly fell apart. Its next “thought” was about where this place might be in Springfield, Missouri, or Kansas City. That was the first mention of Missouri I’d seen, and it made me wonder whether the bot was hallucinating, mixing up the two Springfields. From there, the bot lost the plot entirely, wondering whether the church was in Omaha, or perhaps the Governor’s Mansion in Topeka (which doesn’t look like a church at all).
It continued thinking for a couple more minutes, wondering about other places the block might be, before stopping its analysis altogether. This matched a subsequent experience I had testing a random city in Kansas: after three minutes of deliberation, the bot decided my image was from Fulton, Illinois, although, to its credit, it was fairly sure the image was from somewhere in the Midwest. I asked it to try again, and it thought for a while, once again guessing entirely different cities in different states, before pausing the analysis for good.
Now is not the time to be afraid
The thing is, GPT-4o seems to be roughly on par with o3 when it comes to identifying locations. It instantly identified the Minneapolis skyline, and it quickly guessed that the photo from Kansas was actually from Iowa. (It was wrong, of course, but it committed to the guess immediately.) This seems consistent with other people’s experiences with the models: TechCrunch was able to get o3 to identify one location that 4o couldn’t, but otherwise, the models performed about the same.
While there are certainly privacy and security concerns with AI in general, I don’t think o3 in particular deserves to be singled out as a specific threat. Sure, it can be used to correctly guess where an image was taken, but it can just as easily be wrong or fail entirely. Considering that 4o is capable of the same level of accuracy, I’d say there’s about as much cause for concern today as there has been over the past year or so. That’s not great, but it’s not terrible either. I’d save the panic for an AI model that almost always gets it right, even when the image is unclear.
Regarding the privacy and security concerns, OpenAI shared the following statement with TechCrunch: “OpenAI o3 and o4-mini bring visual reasoning to ChatGPT, making it more helpful in areas like accessibility, research, or identifying locations in emergency response. We’ve worked to train our models to refuse requests for private or sensitive information, added safeguards intended to prohibit models from identifying private individuals in images, and actively monitor for and take action against abuse of our usage policies on privacy.”