15% Of World’s Most Popular Websites Block ChatGPT Data Collection

OpenAI has unveiled a new tool named GPTBot.

The web crawler is designed to collect data from across the internet in order to improve the precision and capabilities of AI models.

OpenAI says that granting GPTBot access to websites can play a pivotal role in refining AI models’ accuracy, increasing their overall potential, and enhancing safety measures. However, it has come to light that a substantial 15% of the world’s top 100 websites have opted to block GPTBot’s access.

GPTBot’s Impact and Adoption

Originality.AI has released data showing that within the first two weeks after the launch of GPTBot’s documentation, nearly 10% of the world’s top 1,000 websites chose to block GPTBot.

Notable sites such as Amazon, Quora, wikiHow, and several international news outlets have taken measures to keep GPTBot off their platforms. This raises questions about how the missing data may limit ChatGPT’s accuracy.
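Sites typically express this kind of block in their robots.txt file, with a rule addressed to the GPTBot user agent. The sketch below shows one way to check whether a given robots.txt would turn the crawler away, using Python's standard urllib.robotparser module. The sample file contents, the is_blocked helper, and the example.com URL are illustrative assumptions, not taken from any of the sites named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("GPTBot blocked:", is_blocked(SAMPLE_ROBOTS_TXT, "GPTBot"))
    print("Other bot blocked:", is_blocked(SAMPLE_ROBOTS_TXT, "SomeOtherBot"))
```

A survey like the one described above could, in principle, run a check of this kind against each site's live robots.txt, although the article does not say how Originality.AI gathered its figures.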

The Mechanism Behind GPTbot

GPTBot operates through a structured process that starts with identifying potential data sources. In this step, the crawler scours the internet to pinpoint websites containing relevant information. Once an appropriate source is found, GPTBot extracts the data it needs from that website.

The collected information is then catalogued within a database, used for the training of AI models.
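The extract-and-store steps described above can be sketched in a few lines. This is a generic illustration of the pattern, not OpenAI's implementation: the TextExtractor class, the extract_text and store_pages helpers, the documents table, and the sample page are all hypothetical, and the crawling step is replaced by a hard-coded HTML string so the example runs offline.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Extraction step: reduce an HTML page to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def store_pages(pages: dict) -> sqlite3.Connection:
    """Storage step: save the extracted text of each page, keyed by URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (url TEXT PRIMARY KEY, body TEXT)")
    for url, html in pages.items():
        conn.execute("INSERT INTO documents VALUES (?, ?)",
                     (url, extract_text(html)))
    conn.commit()
    return conn


if __name__ == "__main__":
    # Stand-in for a crawled page (hypothetical URL and content).
    pages = {
        "https://example.com/a":
            "<h1>Title</h1><script>var x=1;</script><p>Hello world</p>",
    }
    conn = store_pages(pages)
    print(conn.execute("SELECT url, body FROM documents").fetchall())
```

A real crawler would add fetching, link discovery, deduplication, and robots.txt checks around these two steps.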

Versatility in Data Extraction

One of GPTBot’s standout attributes is its ability to extract data from an array of sources, spanning text, images, and code. In terms of textual content, GPTBot extracts information from websites, articles, books, and other documents.

Its reach also extends to image-based data, allowing it to identify objects depicted in images and read any text they contain. GPTBot can even extract code from repositories hosted on GitHub, as well as from other code sources across the internet.

The Nexus with AI Models

OpenAI’s flagship product, ChatGPT, and similar generative AI tools are trained on data gathered from websites. Even prominent figures like Elon Musk have intervened: he previously moved to halt OpenAI’s data scraping from the social media platform formerly known as Twitter.

The creation of GPTBot represents a step forward in AI development. By capturing data from across the digital landscape, GPTBot is intended to make AI models more capable.

The decision of some top websites to bar GPTBot’s access showcases the complexities around data usage rights. As OpenAI continues its stride toward AI excellence, the interplay between data, innovation, and legal considerations remains a central point.