The move comes at a time when artificial intelligence companies have been accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking for permission.
Reddit said that it would update the Robots Exclusion Protocol, or "robots.txt," a widely accepted standard meant to determine which parts of a website are allowed to be crawled.
The company also said it would maintain rate-limiting, a technique used to control the number of requests from a particular entity, and will block unknown bots and crawlers from data scraping – collecting and saving raw information – on its site.
More recently, robots.txt has become a key tool that publishers employ to prevent tech companies from using their content free of charge to train AI algorithms and create summaries in response to some search queries.
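The kind of per-crawler blocking described above is expressed as plain-text rules in a site's robots.txt file. A minimal sketch of what such rules look like (the crawler names and policy here are illustrative, not Reddit's actual configuration):

```
# Block a named AI crawler from the entire site
User-agent: GPTBot
Disallow: /

# Permit a known archival crawler (non-commercial use)
User-agent: ia_archiver
Allow: /

# Default rule: disallow all other crawlers
User-agent: *
Disallow: /
```

Compliance with these rules is voluntary on the crawler's part, which is why publishers pair robots.txt with enforcement measures such as rate-limiting and outright blocking of unknown bots.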
Last week, a letter to publishers from the content-licensing startup TollBit said that several AI companies were circumventing the web standard to scrape publisher sites.
This follows a Wired investigation which found that AI search startup Perplexity likely bypassed efforts to block its web crawler via robots.txt.
Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.
Reddit said on Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.
© Thomson Reuters 2024