Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

Authors Rui Pascoal, Ricardo Ribeiro, Fernando Batista, Ana de Almeida



Cite As

Rui Pascoal, Ricardo Ribeiro, Fernando Batista, and Ana de Almeida. Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of combining speech with augmented reality (AR) in outdoor environments. Augmented reality allows end-users to interact with the displayed information and perform tasks, while enriching their perception of the real world with virtual information. Speech is the most natural way of communicating: it allows hands-free interaction and gives end-users quick and easy access to a range of available features. Speech recognition technology is available on most current mobile devices, but it typically relies on an Internet connection to receive transcripts from remote servers, e.g., Google speech recognition. In some outdoor environments, however, Internet access is unavailable or of poor quality. We integrated an off-line automatic speech recognition module, which does not require the Internet, into an AR application for outdoor use. Currently, speech interaction gives access to five features within the application: taking a photo, shooting a film, communicating, messaging-related tasks, and requesting geographic, biometric, or climatic information. The application offers solutions to manage and interact with the mobile device with good usability. We compared the online and off-line speech recognition systems to assess their adequacy to these tasks. Both systems were tested under conditions commonly found in outdoor environments, such as varying Internet access quality, presence of noise, and distractions.
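The mapping from recognized speech to the five features can be sketched as a simple transcript router. This is a minimal illustration, not the paper's implementation: the keyword and command names below are assumptions, and the sketch presumes the ASR module has already produced a text transcript.

```python
# Hypothetical sketch: routing an ASR transcript to one of the five
# voice-activated features named in the abstract. Keywords and command
# identifiers are illustrative assumptions, not taken from the paper.

COMMANDS = {
    "photo": "take_photo",            # take a photo
    "film": "shoot_film",             # shoot a film
    "call": "communicate",            # communicate
    "message": "messaging",           # messaging-related tasks
    "info": "request_information",    # geographic/biometric/climatic info
}


def route_transcript(transcript: str):
    """Return the first command whose keyword appears in the transcript,
    or None when no supported command is recognized."""
    words = transcript.lower().split()
    for keyword, command in COMMANDS.items():
        if keyword in words:
            return command
    return None
```

In a real application the router would sit between the recognizer's result callback and the AR interface, so that the same routing logic works regardless of whether the transcript came from the online or the off-line recognizer.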
Keywords
  • Speech Recognition
  • Natural Language Processing
  • Sphinx for Mobile Devices
  • Augmented Reality
  • Outdoor Environments
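The comparison of online and off-line recognition under varying Internet quality suggests an online-first, off-line-fallback strategy. The sketch below is a speculative illustration of that idea, assuming both recognizers expose a simple callable interface; none of these names correspond to an API described in the paper.

```python
# Illustrative sketch (assumption, not the paper's API): prefer a remote
# recognizer when the network is usable, and fall back to an on-device
# Sphinx-style recognizer when it fails or is unavailable.

from typing import Callable, Optional

Recognizer = Callable[[bytes], Optional[str]]


def recognize(audio: bytes,
              online: Recognizer,
              offline: Recognizer,
              network_ok: bool) -> Optional[str]:
    """Return a transcript, preferring the online recognizer."""
    if network_ok:
        try:
            result = online(audio)
            if result is not None:
                return result
        except ConnectionError:
            pass  # degraded connectivity: fall through to off-line ASR
    return offline(audio)
```

Such a fallback keeps the hands-free interaction working in outdoor settings where connectivity is poor, at the cost of the (typically lower) accuracy of the on-device model.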


