Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

Authors: Rui Pascoal, Ricardo Ribeiro, Fernando Batista, and Ana de Almeida



Cite As

Rui Pascoal, Ricardo Ribeiro, Fernando Batista, and Ana de Almeida. Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/OASIcs.SLATE.2017.21

Abstract

This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of combining speech with augmented reality (AR) in outdoor environments. Augmented reality allows end-users to interact with the information displayed and perform tasks, while enhancing their perception of the real world by adding virtual information to it. Speech is the most natural form of communication: it allows hands-free interaction and may let end-users quickly and easily access the available features. Speech recognition technology is available on most current mobile devices, but it typically relies on an Internet connection to obtain the transcript from remote servers, e.g., Google speech recognition. However, in some outdoor environments, Internet access is not always available or may be of poor quality. We integrated an off-line automatic speech recognition module, which does not require Internet access, into an AR application for outdoor usage. Currently, speech interaction is used within the application to access five different features: taking a photo, shooting a video, communicating, performing messaging-related tasks, and requesting geographic, biometric, or climatic information. The application provides the means to manage and interact with the mobile device while offering good usability. We compared the online and off-line speech recognition systems in order to assess their adequacy for these tasks. Both systems were tested under different conditions commonly found in outdoor environments, such as Internet access quality, presence of noise, and distractions.
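
The off-line module builds on the CMU Sphinx family of recognizers (see the keywords below). As a rough illustration of the approach, the following sketch shows how an Android application could use the pocketsphinx-android wrapper to spot a small set of spoken commands without an Internet connection and map them to the five features listed above. The model and file names, the command phrases, and the dispatch logic are assumptions made for illustration only; they are not the authors' actual implementation.

    // Sketch: off-line command recognition with the pocketsphinx-android wrapper.
    // Assumed (illustrative) resources: "en-us-ptm" acoustic model, "cmudict-en-us.dict"
    // dictionary, and a keyword-spotting file "commands.lst".
    import android.content.Context;

    import java.io.File;
    import java.io.IOException;

    import edu.cmu.pocketsphinx.Assets;
    import edu.cmu.pocketsphinx.Hypothesis;
    import edu.cmu.pocketsphinx.RecognitionListener;
    import edu.cmu.pocketsphinx.SpeechRecognizer;
    import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

    public class OfflineCommandRecognizer implements RecognitionListener {

        private static final String COMMAND_SEARCH = "commands";
        private SpeechRecognizer recognizer;

        public void start(Context context) throws IOException {
            // Copy the bundled acoustic model, dictionary, and keyword list to local storage.
            File assetsDir = new Assets(context).syncAssets();

            recognizer = SpeechRecognizerSetup.defaultSetup()
                    .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                    .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                    .getRecognizer();
            recognizer.addListener(this);

            // "commands.lst" lists one phrase per line with a detection threshold,
            // e.g.:  take photo /1e-20/
            recognizer.addKeywordSearch(COMMAND_SEARCH, new File(assetsDir, "commands.lst"));
            recognizer.startListening(COMMAND_SEARCH);
        }

        @Override
        public void onResult(Hypothesis hypothesis) {
            if (hypothesis == null) return;
            dispatch(hypothesis.getHypstr());
            recognizer.startListening(COMMAND_SEARCH); // keep listening for the next command
        }

        // Map the recognized phrase to one of the five features (illustrative phrases).
        private void dispatch(String command) {
            switch (command) {
                case "take photo":       /* open the camera */                      break;
                case "shoot video":      /* start video recording */                break;
                case "communicate":      /* start a call */                         break;
                case "send message":     /* messaging-related tasks */              break;
                case "show information": /* geographic, biometric, or climatic */   break;
                default:                 /* ignore unrecognized input */            break;
            }
        }

        // Remaining callbacks kept minimal for brevity.
        @Override public void onBeginningOfSpeech() { }
        @Override public void onEndOfSpeech() { recognizer.stop(); } // triggers onResult
        @Override public void onPartialResult(Hypothesis hypothesis) { }
        @Override public void onError(Exception e) { }
        @Override public void onTimeout() { }
    }

Because the models and keyword list ship with the application, this kind of setup keeps recognition entirely on the device, which is what makes it usable when Internet access is unavailable or degraded.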
Keywords
  • Speech Recognition
  • Natural Language Processing
  • Sphinx for Mobile Devices
  • Augmented Reality
  • Outdoor Environments
