
Inbreeding, gibberish, or madness? Warnings mount about artificial intelligence models

Broadcast United News Desk

(Getty Images/Jaque Silva/SOPA Images/LightRocket)


Last year, when scholar Jathan Sadowski was searching for an analogy to describe how artificial intelligence programs can degrade, he coined the term "Habsburg AI."

The Habsburgs were once among the most powerful dynasties in Europe, but after centuries of inbreeding, the family line famously deteriorated.

Recent research suggests that the AI programs behind products like ChatGPT can suffer a similar breakdown when they are repeatedly fed their own output.

Sadowski told AFP that the phrase he coined has only "become more relevant for how we think about AI systems."

The ultimate worry is that AI-generated content could take over the web, which in turn could render chatbots and image generators useless and throw a trillion-dollar industry into disarray.

But other experts believe the problem is overblown or can be solved.

Many companies are keen to use so-called synthetic data to train AI programs: artificially generated data used to augment or replace data gathered from the real world. It is cheaper to produce than human-created content, and more predictable.
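
To illustrate the idea, here is a minimal sketch of one common flavor of synthetic data: fitting simple statistics to a small real dataset, then sampling artificial rows from the fit. The column names and distributions are invented for illustration and do not reflect any particular company's pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# A tiny "real" dataset; the columns and values are invented for illustration.
real = pd.DataFrame({
    "age": rng.integers(18, 70, size=500),
    "income": rng.lognormal(mean=10.5, sigma=0.4, size=500),
})

# Fit simple per-column statistics to the real data...
age_lo, age_hi = real["age"].min(), real["age"].max()
log_mu, log_sd = np.log(real["income"]).mean(), np.log(real["income"]).std()

# ...then sample as many artificial rows as needed from the fitted model.
synthetic = pd.DataFrame({
    "age": rng.integers(age_lo, age_hi + 1, size=5_000),
    "income": rng.lognormal(mean=log_mu, sigma=log_sd, size=5_000),
})

# Synthetic rows can now augment (or replace) the scarcer real ones.
train_set = pd.concat([real, synthetic], ignore_index=True)
print(train_set.shape)  # (5500, 2)
```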

“An open question for researchers and companies building AI systems is how much synthetic data is too much,” said Sadowski, a lecturer in emerging technologies at Monash University in Australia.

Mad Cow Disease

The AI programs, known in the industry as large language models (LLMs), are trained on vast amounts of text or images scraped from the internet.

This information is broken down into trillions of tiny, machine-readable chunks called tokens.

When asked a question, a program like ChatGPT selects and assembles the sequence of tokens that its training data tells it is most likely to fit the query.
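
A toy sketch of that token-by-token process, using word-level tokens and raw co-occurrence counts in place of a neural network (real LLMs use subword tokenizers and learned probabilities, so this illustrates the principle only):

```python
from collections import Counter, defaultdict

# A toy corpus standing in for scraped training text (invented for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Tokenize" at the word level (real systems use subword tokenizers such as
# byte-pair encoding) and count which token follows which.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

# "Answer" by repeatedly emitting the statistically likeliest next token,
# a crude stand-in for how an LLM assembles its most probable continuation.
token, output = "the", ["the"]
for _ in range(5):
    token = follows[token].most_common(1)[0][0]
    output.append(token)

print(" ".join(output))
```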

But even the best AI tools can produce falsehoods and nonsense, and critics have long worried about what would happen if models were fed their own output.

In late July, a paper published in the journal Nature, titled "AI models collapse when trained on recursively generated data," became a focus of discussion.

The authors described how models quickly discard the rarer elements of their original dataset until, as Nature reported, the output degenerates into "gibberish."
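
A toy numerical sketch, not the paper's actual experiment, hints at the same dynamic: when each "generation" is fitted only to the previous generation's samples, the rare tail values of the original data tend to vanish first, and on typical runs the fitted spread decays toward collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: a modest sample of "human-made" data from a standard normal.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for gen in range(1, 101):
    # "Train" each generation by fitting a Gaussian to the previous
    # generation's output, then sampling fresh training data from that fit.
    mu, sigma = data.mean(), data.std()
    data = rng.normal(loc=mu, scale=sigma, size=50)
    if gen % 25 == 0:
        # Track how much of the original distribution's tails survives.
        tail_mass = np.mean(np.abs(data) > 2.0)
        print(f"gen {gen:3d}: spread = {sigma:.2f}, mass beyond |x|>2 = {tail_mass:.2f}")
```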

A week later, researchers from Rice University and Stanford University published a paper, "Self-Consuming Generative Models Go MAD," that reached similar conclusions.

They tested an image-generating AI program and showed that as they fed AI-generated data back into the underlying model, the output became increasingly generic and riddled with unwanted artifacts.

They called the phenomenon Model Autophagy Disorder (MAD), comparing it to mad cow disease, the fatal condition spread by feeding the remains of cattle back to other cattle.

Doomsday scenario

These researchers worry that AI-generated text, images, and video are crowding out the human-made data available online.

“One doomsday scenario is that if MAD is left unchecked for generations, it could poison the quality and diversity of data across the entire internet,” Richard Baraniuk, one of the authors of the study from Rice University, said in a statement.

The AI industry, however, remains unmoved.

Anthropic and Hugging Face, two leading companies in the field that pride themselves on taking an ethical approach to technology, both told AFP they use AI-generated data to fine-tune or filter their data sets.

Anton Lozhkov, a machine learning engineer at Hugging Face, said the Nature paper offered interesting theoretical insights, but its disaster scenario was not realistic.

"In reality, training on several rounds of synthetic data is simply not done," he said.

However, he said researchers are as frustrated as anyone about the state of the internet.

“A large portion of the internet is garbage,” he said, adding that Hugging Face has made huge efforts to clean up the data — sometimes discarding 90 percent of it.
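
What that kind of cleanup might look like in miniature: a few hypothetical quality heuristics applied to a scraped corpus. These rules are invented for illustration and are not Hugging Face's actual filtering pipeline.

```python
def looks_like_garbage(doc: str) -> bool:
    """Hypothetical quality heuristics; production pipelines are far more elaborate."""
    words = doc.split()
    if len(words) < 20:                      # too short to be useful
        return True
    if len(set(words)) / len(words) < 0.3:   # highly repetitive boilerplate
        return True
    letters = sum(c.isalpha() for c in doc)
    if letters / max(len(doc), 1) < 0.6:     # mostly markup, digits, or noise
        return True
    return False

crawl = ["Example scraped page text goes here ..."]  # stand-in for a web crawl
cleaned = [doc for doc in crawl if not looks_like_garbage(doc)]
print(f"kept {len(cleaned)} of {len(crawl)} documents")
```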

He is hopeful that internet users will help clean up the web simply by not engaging with AI-generated content.

"I strongly believe that humans will see the effects of generated data well before models do," he said.
