The Journal of AsiaTEFL
Current Issue

Volume 17 Number 1, Spring 2020, Pages 1-318   


An Analysis of the Errors in the Auto-Generated Captions of University Commencement Speeches on YouTube

    Jeong-Hwa Lee & Kyung-Whan Cha

Auto-generated captions on YouTube have proven useful in helping viewers better understand the words being spoken. At times, however, the captions are inaccurate and lead to confusion. The aim of this paper is to identify and analyze errors in the auto-generated captions of 20 commencement speeches on YouTube. These speeches were delivered over a period of 12 years by speakers from different walks of life; the researchers selected ten male and ten female icons. Only the first 10 minutes of each speech were used for this investigation. All caption errors were collected and analyzed. The analysis showed that the frequency of errors in each speech ranged between 10 and 46 cases, with an average of one error occurring about every 26 seconds. Among the different error categories, nouns recorded the highest number with 144 cases (31.3%), followed by verbs with 93 cases (20.2%) and prepositions with 37 cases (8.1%). Among the four subcategories, namely omission, addition, substitution, and word order, substitution recorded the highest number of errors with 357 cases (77.6%). Furthermore, the errors were classified into two major groups: those involving function words, which appeared in 169 cases (36.7%), and those involving content words, which appeared in 291 cases (63.3%). The results of this research suggest that continued development of the voice recognition software that automatically generates captions is necessary for more efficient and accurate captioning that will help viewers and listeners better comprehend video content.
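
The rate and percentages quoted in the abstract can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the authors' method. It assumes that the total number of errors equals the sum of the function-word and content-word cases (169 + 291 = 460), a total the abstract does not state explicitly. The names `share`, `total_errors`, and `seconds_per_error` are hypothetical.

```python
# Figures taken from the abstract.
SPEECHES = 20
MINUTES_PER_SPEECH = 10
function_word_errors = 169
content_word_errors = 291

# Assumption: every error is either a function-word or a content-word error,
# so the two groups sum to the total number of errors.
total_errors = function_word_errors + content_word_errors

# Total analyzed speech time in seconds, and the average gap between errors.
total_seconds = SPEECHES * MINUTES_PER_SPEECH * 60
seconds_per_error = total_seconds / total_errors


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(f"total errors: {total_errors}")
print(f"seconds per error: {seconds_per_error:.1f}")
print(f"nouns: {share(144, total_errors)}%")
print(f"verbs: {share(93, total_errors)}%")
print(f"substitution: {share(357, total_errors)}%")
print(f"function words: {share(function_word_errors, total_errors)}%")
print(f"content words: {share(content_word_errors, total_errors)}%")
```

Under that assumption, 200 minutes of analyzed speech divided by 460 errors gives roughly one error every 26 seconds, which agrees with the average reported in the abstract.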

Keywords: auto-generated caption errors, YouTube, university commencement speeches, function words, content words, omission, addition, substitution, word order