Broadcast United

Reddit to update web standards to prevent automated site scraping

Broadcast United News Desk

Social media platform Reddit said this week it would update web standards used by the platform to block automated scraping of data from its site, following reports that artificial intelligence startups were circumventing the rules to collect content for their systems.

AI companies have been accused of plagiarizing publishers' content to produce AI-generated summaries without citing the source or asking for permission.

Reddit said it would update its use of the Robots Exclusion Protocol, or “robots.txt,” a widely accepted standard for indicating which parts of a website crawlers are allowed to access.
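
To illustrate how a crawler that honors robots.txt would interpret such rules, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the bot name ExampleAIBot, and the example.com URLs are hypothetical and are not Reddit's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
# (not Reddit's actual file).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/r/news"),
        ("OtherBot", "https://example.com/r/news"),
        ("OtherBot", "https://example.com/private/data"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

The check runs on the crawler's side: robots.txt states the site's wishes, and it only has an effect when the crawler chooses to comply.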

The company also said it would continue to use rate limiting, a technique that controls the number of requests from a single entity, and would block unknown bots and crawlers from scraping data (collecting and saving raw information) from its website.
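
As a rough illustration of the rate limiting described above, the following Python sketch caps how many requests each client may make within a time window. The class name, the limits, and the client identifiers are assumptions made for illustration; the article does not describe how Reddit implements its limits.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter: at most max_requests per window."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id: str) -> bool:
        """Record a request and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
    for i in range(3):
        print("request", i + 1, "allowed:", limiter.allow("crawler-1"))
```

A sliding window is only one possible design; token buckets and fixed windows are common alternatives with different burst behavior.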

More recently, robots.txt has become a key tool used by publishers to prevent tech companies from using their content for free to train artificial intelligence algorithms and create snippets for certain search queries.

Last week, content licensing startup TollBit wrote a letter to publishers saying that several artificial intelligence companies are circumventing web standards to scrape content from publishers’ websites.

This comes after an investigation by Wired magazine found that AI search startup Perplexity likely circumvented efforts to block its web crawlers via robots.txt.

In early June this year, business media publisher Forbes accused Perplexity of plagiarizing its investigative reporting and using it in a generative AI system without attribution.

Reddit said on Tuesday that its content would continue to be accessible to researchers and organizations such as the Internet Archive for non-commercial use.
