Tumblr and WordPress Are Selling Your Data to Artificial Intelligence Companies

If you have a WordPress website or a Tumblr blog, you’ve probably published a significant amount of content there. While we all know that the Internet is not “private,” you probably posted those words and images thinking that they belonged to you and would not be sold by the very companies you relied on to publish them.

As it turns out, WordPress and Tumblr are preparing to do just that. As first reported by 404 Media, both sites’ parent company, Automattic, has struck a deal to sell user data from Tumblr and WordPress to artificial intelligence companies like Midjourney and OpenAI. Those companies intend to use the data to train their systems.

As if that weren’t enough, the preparation for the sale went poorly, and it looks like large categories of Tumblr posts that weren’t supposed to be included were swept into the mix. This data includes:

  • Private posts on public blogs

  • Posts on deleted or suspended blogs

  • Unanswered asks

  • Private answers

  • Posts marked explicit

  • Posts from affiliate accounts, such as advertising campaigns, that Tumblr does not own the rights to. (Apple is specifically mentioned here.)

It is possible that this data was never actually sent to OpenAI and Midjourney, and was simply identified and cleaned for that purpose, but 404 Media could not confirm this. The outlet was able to confirm that password-protected posts, direct messages, and media flagged as CSAM were not included in this group. So… that’s good.

This may not affect all WordPress sites.

Automattic points out that the data collection only affects sites hosted on WordPress.com, not content created with the WordPress CMS, which you may use with a site hosted elsewhere. In theory, your WordPress CMS sites that are not hosted by Automattic should be protected from these actions.

However, 404 Media could not confirm whether using Automattic plugins such as Jetpack would subject your site to Automattic’s data sharing policies.

You do not have to agree to Automattic selling your data.

A source told 404 Media that Automattic will add a new setting to its properties on Wednesday allowing users to opt out of having their data sold to and shared with third-party companies. The publication obtained a copy of the new FAQ section, which explains that this opt-out will block crawlers from accessing your sites if you enable it “from the start.” If you decide to opt out later, Automattic will contact its partners and “ask” them to remove your content from their datasets and training.

That wording is not particularly reassuring. Still, once Automattic releases this opt-out option, I suggest you enable it on your Tumblr and WordPress sites.

Following the 404 Media report, Automattic released a statement saying that it is blocking the crawlers of major AI platforms and updating its block lists as new ones appear; that it offers features that let you discourage search engines from indexing your sites, which can also prevent AI companies from crawling them; and that it only shares public content hosted on WordPress and Tumblr from sites that have not opted out. However, the company acknowledges that no law requires crawlers to honor these preferences, and that it works with certain artificial intelligence companies “as long as their plans align with what our community cares about: attribution, opt-out, and control.”

What will artificial intelligence companies do with this data?

Companies like Midjourney and OpenAI require huge data sets to train their AI systems. Programs like Midjourney and ChatGPT would not be possible without ingesting enormous amounts of information: this is how they “learn” to do what they do.

So your WordPress blog posts filled with your favorite recipes can be fed to generative AI models to teach them to “talk” about food (or anything at all), and your Tumblr photo dumps can teach models to recognize objects like a car or a bird. Data from all your sites, as well as those of millions of other users, is invaluable to artificial intelligence companies, which means it is extremely valuable to the companies that own those sites and can sell it. Automattic will likely make a ton of money from this deal, just as Reddit will likely make a ton of money from its own AI content licensing deal with Google.

Publishing and sharing online is fun, but maybe it’s time to take back what’s yours: If you don’t own the platform you share your original ideas on, consider moving them to one you do own before your ideas become training data for artificial intelligence.
