Don’t Trust Google Trends Maps

Google Trends went viral last week with a map of the most commonly misspelled word in each U.S. state:

This was quickly emulated by America’s other internet titan, Pornhub:

These maps are innocent entertainment! Typos are fun (say “porm” out loud), and infographics look like candy and feel like learning. But the webcomic xkcd argues that they are mostly fiction, too:

Unlike terms related to climate or geography, the most commonly misspelled words probably don’t differ much from state to state. So, to find any meaningful differences, data analysts have to dig past the most common items until they reach ones that do differ. By the time they find something interesting, they are probably looking at data too thin to mean anything. But the maps above carry none of these caveats.

Google Trends and Pornhub did not respond to requests for comment. Data scientist Roban Kramer gave some context in an email:

I think the main point of the xkcd cartoon is: “Half the time you’re just sampling random noise because the underlying data doesn’t change much from state to state.” That seems quite plausible to me, even without looking at this particular dataset.

Basically, you should never take seriously any inference from data, or any visualization, that gives you no information about uncertainty or noise.

Before I took maps like these seriously, I would want to know things such as how big the gap is between the counts of the first and second most misspelled words, how much the frequency of these words actually differs from state to state, and how stable the rankings are within a state from month to month.
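
To see why those checks matter, here is a minimal simulation sketch, not anything Google Trends or Kramer actually ran. It assumes every state has exactly the same underlying misspelling rates, and the word list, the number of states, and the searches per state are all made-up values chosen for illustration. Even so, each state still gets a "top" word, and that word tends to change from one simulated month to the next.

```python
import random
from collections import Counter

# Hypothetical word list: every word is equally likely to be misspelled
# in every state, so any state-to-state difference is pure noise.
WORDS = ["definitely", "separate", "necessary", "receive", "occasion"]


def simulate_month(rng, n_states=50, searches_per_state=200):
    """Return a list of (top_word, margin_over_runner_up), one per state."""
    results = []
    for _ in range(n_states):
        counts = Counter(rng.choice(WORDS) for _ in range(searches_per_state))
        ranked = counts.most_common(2)
        top_word, top_count = ranked[0]
        runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
        results.append((top_word, top_count - runner_up_count))
    return results


rng = random.Random(0)  # fixed seed so the run is repeatable
month_a = simulate_month(rng)
month_b = simulate_month(rng)

# How many different "winning" words would the map show?
distinct_winners = len({word for word, _ in month_a})

# In what share of states does the winner change between the two months?
changed = sum(a[0] != b[0] for a, b in zip(month_a, month_b)) / len(month_a)

# How large is the typical gap between first and second place?
mean_margin = sum(margin for _, margin in month_a) / len(month_a)

print(f"distinct top words on the map: {distinct_winners}")
print(f"share of states whose top word changed: {changed:.0%}")
print(f"mean gap between first and second place: {mean_margin:.1f} searches")
```

A map drawn from `month_a` would look colorful and varied, yet by construction it contains no real regional signal. A small first-to-second gap and a winner that flips from month to month are exactly the warning signs Kramer describes.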

So unless you can analyze the data yourself, these maps are for entertainment purposes only. On its own, that doesn’t seem like a big problem. The maps aren’t really useful for anything, so all they do is create little fake factoids.

But playing fast and loose with data often has more serious implications. This is one of the reasons we receive so many conflicting dietary recommendations. As FiveThirtyEight has demonstrated, it’s easy to comb through enough data to find “statistically significant” correlations, say, between potato chips and high math scores. This is one of the reasons we are inundated with conflicting studies that say coffee, milk, and wine both cause and prevent cancer. When these results are reported by the media without the proper caveats about false positives, they undermine public confidence in science.
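
The effect is easy to reproduce with numbers that are random by design. The sketch below is an illustration under stated assumptions, not FiveThirtyEight's analysis: it invents 100 people with random "math scores" and 200 unrelated random "food intake" variables (all hypothetical), then counts how many foods pass a conventional 5 percent significance test. The critical value 1.984 is an approximate two-sided cutoff for a Student t statistic with 98 degrees of freedom.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


N_PEOPLE = 100   # hypothetical survey size
N_FOODS = 200    # hypothetical number of foods tested
T_CRIT = 1.984   # approximate two-sided 5% cutoff, t distribution, 98 df

rng = random.Random(1)  # fixed seed so the run is repeatable

# Random scores: by construction, no food has any effect on them.
math_scores = [rng.gauss(0, 1) for _ in range(N_PEOPLE)]

false_hits = 0
for _ in range(N_FOODS):
    intake = [rng.gauss(0, 1) for _ in range(N_PEOPLE)]
    r = pearson(intake, math_scores)
    # Convert the correlation to a t statistic and test it at the 5% level.
    t = r * math.sqrt((N_PEOPLE - 2) / (1 - r * r))
    if abs(t) > T_CRIT:
        false_hits += 1

print(f"'significant' foods out of {N_FOODS}: {false_hits}")
```

Because each test has a 5 percent false-positive rate, about 10 of the 200 foods are expected to look "significant" even though none of them is related to the scores. Report only those, leave out the other 190 or so, and you have a headline.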

So the next time you hear that some study has found a dramatic and shocking correlation, check its methods. When there is good reason to doubt it, treat it like a brightly colored search map: just for fun.
