An Empirical Study on Bidirectional Recurrent Neural Networks for Human Motion Recognition

Tanisaro, Pattreeya; Heidemann, Gunther

doi:10.4230/LIPIcs.TIME.2018.21

Abstract

Deep recurrent neural networks (RNNs) and their gated units, such as Long Short-Term Memory (LSTM), have shown continued and growing success in sequential data processing, especially in speech recognition and language modeling. Despite this, few studies have examined deep RNN architectures and their effects in other application domains. In this paper, we evaluate different strategies for constructing bidirectional recurrent neural networks (BRNNs) with Gated Recurrent Units (GRUs), and we also investigate a reservoir computing RNN, the echo state network (ESN), along with a few conventional machine learning techniques, for skeleton-based human motion recognition. The evaluation focuses on how well the different approaches generalize to arbitrary untrained viewpoints combined with previously unseen subjects. Moreover, we extend the test by lowering the subsampling frame rate to examine the robustness of the algorithms against variations in movement speed.
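
As a rough illustration of the frame-rate test mentioned in the abstract, the sketch below keeps every k-th frame of a skeleton sequence. The function name `subsample`, the stride parameter `step`, and the toy data are assumptions made for illustration and are not taken from the paper; the actual preprocessing used by the authors may differ.

```python
from typing import List, Sequence

# One skeleton pose as flattened joint coordinates (assumed layout).
Frame = Sequence[float]


def subsample(frames: Sequence[Frame], step: int) -> List[Frame]:
    """Keep every `step`-th frame, starting with the first one."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(frames[::step])


# Toy sequence: 12 frames of a single joint moving along x at constant speed.
sequence = [(float(t), 0.0, 0.0) for t in range(12)]

half_rate = subsample(sequence, 2)     # every 2nd frame
quarter_rate = subsample(sequence, 4)  # every 4th frame

# Displacement between consecutive frames grows with the stride, so the
# same motion looks faster to a model that consumes one frame per step.
dx_full = sequence[1][0] - sequence[0][0]
dx_half = half_rate[1][0] - half_rate[0][0]
dx_quarter = quarter_rate[1][0] - quarter_rate[0][0]
```

With a fixed number of time steps per second assumed by the classifier, a larger `step` makes each recorded movement appear proportionally faster, which is one way to probe robustness to movement speed.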

