OpenAI Barred from Accessing The Guardian’s Content for AI Development

The Guardian joins a growing list of publishers restricting OpenAI from sourcing their content.

By Dan Milmo, Global Technology Editor
Fri 1 Sep 2023 17.54 BST

The Guardian has taken steps to prevent OpenAI from leveraging its content for AI products like ChatGPT. This move follows rising concerns over OpenAI’s potential use of unlicensed material, leading to lawsuits from writers and calls from the creative sector for stronger intellectual property protection.

The Guardian’s decision comes in the wake of the public’s fascination with generative AI technologies, such as ChatGPT, which produce realistic text, images, and audio based on human prompts. These technologies rely on extensive data from the internet, including news articles, to anticipate the next word or phrase after a user’s input.

While OpenAI has not disclosed the specific data sources for ChatGPT, it recently introduced an option for website owners to block its web crawler, GPTBot. However, this does not permit the removal of content from existing datasets. Several publishers and websites have since restricted the crawler’s access.
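For readers curious how such a block looks in practice, the following is a minimal sketch. It assumes the opt-out is expressed through a site's robots.txt file using the crawler's user-agent token, GPTBot. The robots.txt contents, the example.com URL, the ExampleBot name, and the is_allowed helper are all hypothetical illustrations, not taken from any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny GPTBot everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/some-article"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("ExampleBot allowed:", is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A rule like this only asks a compliant crawler to stay away from future visits. As the article notes, it does nothing about content already present in existing datasets.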

A Guardian News & Media representative stated, “Commercially exploiting intellectual property from The Guardian’s website has always violated our terms. We maintain numerous beneficial partnerships with global developers and anticipate forging more in the future.” Other major outlets reported to be blocking GPTBot include CNN, Reuters, the Washington Post, Bloomberg, the New York Times, and the Athletic. Sites such as Lonely Planet, Amazon, Indeed, and Quora have also imposed restrictions.

Recently, UK book publishers appealed to Rishi Sunak to prioritize the intellectual property rights of the creative sectors at the upcoming UK AI safety summit. The Publishers Association emphasized the importance of respecting intellectual property laws during AI development.

In related news, Elon Musk, after addressing “extreme data scraping” on his rebranded Twitter platform, X, revealed plans to use public tweets for his new AI venture, xAI. Google now discloses that it collects public data for AI training, including for its Bard chatbot. Additionally, Meta, the parent company of Facebook and Instagram, rolled out a policy allowing users to opt out of having their personal data used for AI training.

OpenAI has yet to comment on the matter.
