Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications (Short Paper)

Wills, Simone; Bai, Yu; Tejedor-García, Cristian; Cucchiarini, Catia; Strik, Helmer

doi:10.4230/OASIcs.SLATE.2023.7

Abstract

Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. We evaluated their performance on read and extemporaneous speech of native and non-native child speakers of Dutch. We also investigated the utility of using ASR technology to provide insight into the children’s pronunciation and fluency. The results show that recent, pre-trained, transformer-based ASR models achieve acceptable performance, from which detailed feedback on phoneme pronunciation quality can be extracted, despite the challenging nature of child and non-native speech.

Cite As

Simone Wills, Yu Bai, Cristian Tejedor-García, Catia Cucchiarini, and Helmer Strik. Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications (Short Paper). In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 7:1-7:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/OASIcs.SLATE.2023.7

Author Details

Simone Wills

Radboud University, Nijmegen, The Netherlands

Yu Bai

Radboud University, Nijmegen, The Netherlands
NovoLearning, Nijmegen, The Netherlands

Cristian Tejedor-García

Radboud University, Nijmegen, The Netherlands

Catia Cucchiarini

Radboud University, Nijmegen, The Netherlands

Helmer Strik

Radboud University, Nijmegen, The Netherlands

Funding

The project ST.CART is funded by the European Regional Development Fund (ERDF).

Acknowledgements

Special thanks go to all the children who participated, their parents, their teachers, and the schools.
