Lost in AI transcription: Adult words creep into kids’ YouTube videos

How can “beach” turn into “bitch”, “buster” into “bastard”, or “combo” into “condom”?

It happens when Google Speech-To-Text and Amazon Transcribe, both automatic speech recognition (ASR) systems, mistakenly generate age-inappropriate subtitles for children’s videos on YouTube.

This is the main finding of a study titled “Beach to Bitch: Inadvertent Unsafe Transcription of Kids’ Content on YouTube”, which covered 7,013 videos from 24 YouTube channels.

Ten percent of those videos contained at least one “severely forbidden word” for children, says Ashique KhudaBukhsh, an assistant professor in the department of software engineering at the Rochester Institute of Technology in the US.

KhudaBukhsh, Sumeet Kumar, an associate professor at the Indian School of Business in Hyderabad, and Krithika Ramesh of Manipal University, who conducted the study, described the phenomenon as “hallucination of inappropriate content”.

“It boggled the mind because we knew that these channels were watched by millions of kids. We understand that this is an important issue because it tells us that inappropriate content may not be at the source but can be served up by an AI (artificial intelligence) application. So on a broader philosophical level, people generally have checks and balances on the source, but now we have to be more vigilant about having checks and balances when an AI application modifies the source,” KhudaBukhsh, who studied in Kalyani, West Bengal, and holds a PhD in machine learning, told The Sunday Express.

Inappropriate content hallucinations were found on channels with millions of views and subscribers, including Sesame Street, Ryan’s World, Barbie, Moonbug Kids and Fun Kids Planet, according to the study.

Closed captions on YouTube videos are generated by Google Speech-To-Text, while Amazon Transcribe is one of the leading commercial ASR platforms. Creators can use Amazon Transcribe to generate subtitles for their videos and import the subtitle file into YouTube when uploading.
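
For illustration, a creator’s Amazon Transcribe workflow might look like the Python sketch below, which uses the public boto3 API. This is a minimal sketch: the bucket, file and job names are hypothetical placeholders, not details from the study.

```python
# Minimal sketch: ask Amazon Transcribe to produce an SRT subtitle file
# that a creator could then upload to YouTube alongside the video.
# The job name, bucket and media file below are hypothetical.
import time

import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.start_transcription_job(
    TranscriptionJobName="kids-video-captions",            # hypothetical job name
    Media={"MediaFileUri": "s3://my-bucket/episode.mp4"},  # hypothetical S3 object
    MediaFormat="mp4",
    LanguageCode="en-US",
    Subtitles={"Formats": ["srt"]},   # request an .srt subtitle file as output
    OutputBucketName="my-bucket",     # the .srt lands in this bucket when done
)

# Poll until the job finishes; the subtitle file then appears in the bucket.
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName="kids-video-captions")
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(15)
print(status)
```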

The study was accepted and presented at the 36th annual conference of the Association for the Advancement of Artificial Intelligence (AAAI) in Vancouver in February.

“These patterns tell us that whenever you have a machine learning model trying to predict something, the predictions are affected by the kind of data it was trained on. Most likely, these systems did not have enough examples of children’s speech in the data they were trained on,” KhudaBukhsh said.

The study indicates that English subtitles are mostly disabled on YouTube Kids, but the same videos can be viewed with subtitles on the main YouTube platform.

“It is not clear how often children are restricted to YouTube Kids while watching videos and how often parents (or guardians) simply allow them to view children’s content on public YouTube. Our findings suggest the need for both YouTube in general and YouTube Kids to be more vigilant about children’s safety,” the study states.

When asked about the accuracy of automatic captions, a YouTube spokesperson said in a statement, “YouTube Kids offers fun and engaging content for kids and is our recommended experience for children under 13. Caption tools on our main YouTube site allow channels to reach a wider audience and improve accessibility for everyone on YouTube. We are constantly working on improving automatic captions and reducing errors.”

Another example of a misheard word appeared in a popular video whose caption read: “You should also find porn.” The actual dialogue ended with “the corn”.

KhudaBukhsh said these errors may be due to the data fed to ASR systems during training. When two adults are talking, “I love porn” is a more likely sentence than “I love corn”. One reason some of these adult words creep into transcriptions is that the ASR systems are probably trained mostly on speech examples from adults.
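
To see how an adult-skewed language model can tip a transcription toward the wrong word, consider the toy Python sketch below. The bigram counts and acoustic scores are invented for illustration; this is not the decoding logic of any of the systems named in the study.

```python
# Toy illustration of a language-model prior overriding a near-tie in the
# acoustics. If "corn" and "porn" sound almost identical to the acoustic
# model, the word that is more common in the (adult-skewed) training text
# wins the combined score. All numbers below are invented.
import math

# Hypothetical bigram counts from adult-dominated training text.
bigram_counts = {
    ("love", "porn"): 900,   # invented: frequent in adult web text
    ("love", "corn"): 120,   # invented: rarer in the same corpus
}
total_after_love = sum(bigram_counts.values())

def lm_log_prob(prev: str, word: str) -> float:
    """Log-probability of `word` following `prev` under the toy bigram model."""
    return math.log(bigram_counts[(prev, word)] / total_after_love)

# Near-identical (invented) acoustic scores for the two similar-sounding words.
acoustic_log_prob = {"corn": -1.00, "porn": -1.05}

for word in ("corn", "porn"):
    score = acoustic_log_prob[word] + lm_log_prob("love", word)
    print(f"I love {word}: combined log-score = {score:.2f}")
# "porn" ends up with the higher combined score despite the slight
# acoustic edge for "corn", mirroring the kind of error the study found.
```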

KhudaBukhsh said that introducing a human element into the transcription process could be one way to prevent these inappropriate words from being served to millions of young viewers. “We could have a human in the loop to check for transcription errors. We could have someone watch the flagged videos and manually confirm whether those words were in the video or not.”
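
A minimal sketch of what such a human-in-the-loop check could look like follows, assuming plain-text caption lines and a simple word blocklist. The word list is an illustrative subset, not the study’s actual taxonomy, and the caption lines are made up.

```python
# Sketch of a pre-publication check: scan machine-generated captions against
# a blocklist and queue any hits for manual review before captions go live.
import re

BLOCKLIST = {"bitch", "bastard", "condom", "porn"}  # illustrative subset only

def flag_for_review(caption_lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain a blocklisted word."""
    flagged = []
    for i, line in enumerate(caption_lines, start=1):
        words = re.findall(r"[a-z']+", line.lower())
        if BLOCKLIST.intersection(words):
            flagged.append((i, line))
    return flagged

# Hypothetical captions, including the mis-transcription cited in the article.
captions = [
    "Let's all go down to the beach!",
    "You should also find the porn.",  # ASR error: should be "the corn"
]
for line_no, text in flag_for_review(captions):
    print(f"Needs human review (line {line_no}): {text}")
```

A flagged line would then go to a human reviewer, who watches the clip and either corrects the caption or confirms it, rather than letting the ASR output publish unchecked.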

This is not the first time KhudaBukhsh has reported on the fallibility of AI systems. Last year, he and a student conducted a six-week experiment showing that words like “black,” “white” and “attack”, all common in chess commentary, can trick an artificial intelligence system into flagging certain chess conversations as racist. This was shortly after Agadmator, a popular chess channel on YouTube with over one million subscribers, was banned for not adhering to “Community Guidelines” while broadcasting chess.

KhudaBukhsh, who conducted that research at Carnegie Mellon University in Pittsburgh, said the findings pointed to potential pitfalls for social media companies that rely solely on artificial intelligence to locate and shut down sources of hate speech.
