Interval Temporal Random Forests with an Application to COVID-19 Diagnosis

Authors: Federico Manzella, Giovanni Pagliarini, Guido Sciavicco, Ionel Eduard Stan





Author Details

Federico Manzella
  • Dept. of Mathematics and Computer Science, University of Ferrara, Italy
Giovanni Pagliarini
  • Dept. of Mathematics and Computer Science, University of Ferrara, Italy
  • Dept. of Mathematical, Physical, and Computer Sciences, University of Parma, Italy
Guido Sciavicco
  • Dept. of Mathematics and Computer Science, University of Ferrara, Italy
Ionel Eduard Stan
  • Dept. of Mathematics and Computer Science, University of Ferrara, Italy
  • Dept. of Mathematical, Physical, and Computer Sciences, University of Parma, Italy

Acknowledgements

We thank the INdAM GNCS 2020 project Strategic Reasoning and Automated Synthesis of Multi-Agent Systems for partial support, the PRID project Efforts in the uNderstanding of Complex interActing SystEms, the University of Udine (Italy), the University of Gothenburg (Sweden), and the Chalmers University of Technology (Sweden) for providing the computational resources, and the University of Cambridge (UK) for sharing their data. Moreover, the open access publication of this article was supported by the Alpen-Adria-Universität Klagenfurt, Austria.

Cite As

Federico Manzella, Giovanni Pagliarini, Guido Sciavicco, and Ionel Eduard Stan. Interval Temporal Random Forests with an Application to COVID-19 Diagnosis. In 28th International Symposium on Temporal Representation and Reasoning (TIME 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 206, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.TIME.2021.7

Abstract

Symbolic learning is the logic-based approach to machine learning. Its mission is to provide algorithms and methodologies that extract logical information from data and express it in an interpretable way. In the context of temporal data, interval temporal logic has recently been proposed as a suitable tool for symbolic learning, specifically via the design of an interval temporal logic decision tree extraction algorithm. Building on that algorithm, we study its natural generalization to interval temporal random forests, mimicking the corresponding schema at the propositional level. Interval temporal random forests turn out to be a high-performing multivariate time series classification method which, despite the introduction of a functional component, is still logically interpretable to some extent. We apply this method to the problem of diagnosing COVID-19 based on the time series that emerge from cough and breath recordings of positive versus negative subjects. Our experiments show that our models achieve very high accuracies and sensitivities, often superior to those achieved by classical methods on the same data. Although other recent approaches to the same problem (based on different and more numerous data) show even better statistical results, our solution is the first logic-based, interpretable, and explainable one.
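To make the idea of the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' algorithm, and it is greatly simplified. It assumes each tree is a single-split stump that asks an existential interval question ("is there an interval of at least `min_len` consecutive samples on which the signal stays at or above `threshold`?"), and that the forest aggregates its trees by majority vote, as propositional random forests do. All names, thresholds, and labels here are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Callable, List, Sequence

Series = Sequence[float]
Tree = Callable[[Series], str]


def exists_interval_all_geq(series: Series, threshold: float, min_len: int) -> bool:
    """Return True if some run of at least `min_len` consecutive samples
    is >= `threshold` (an existential test over intervals of the series)."""
    run = 0
    for value in series:
        # Extend the current run, or reset it when a sample falls below threshold.
        run = run + 1 if value >= threshold else 0
        if run >= min_len:
            return True
    return False


def make_stump(threshold: float, min_len: int, if_true: str, if_false: str) -> Tree:
    """Build a one-split tree (hypothetical helper) around the interval test."""
    def classify(series: Series) -> str:
        holds = exists_interval_all_geq(series, threshold, min_len)
        return if_true if holds else if_false
    return classify


def forest_predict(trees: List[Tree], series: Series) -> str:
    """Majority vote over the individual tree predictions."""
    votes = Counter(tree(series) for tree in trees)
    return votes.most_common(1)[0][0]


# Illustrative forest of three stumps with made-up parameters.
trees = [
    make_stump(0.5, 3, "positive", "negative"),
    make_stump(0.8, 2, "positive", "negative"),
    make_stump(0.4, 5, "positive", "negative"),
]
series = [0.1, 0.6, 0.7, 0.9, 0.9, 0.3]
prediction = forest_predict(trees, series)
```

Each stump remains readable as a statement about an interval of the signal, which is the sense in which such a model stays interpretable, while the vote across stumps is the functional component that makes the forest harder to read than a single tree.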

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning algorithms
Keywords
  • Interval temporal logic
  • decision trees
  • random forests
  • sound-based diagnosis

