Interval Temporal Random Forests with an Application to COVID-19 Diagnosis

Manzella, Federico; Pagliarini, Giovanni; Sciavicco, Guido; Stan, Ionel Eduard

doi:10.4230/LIPIcs.TIME.2021.7

Abstract

Symbolic learning is the logic-based approach to machine learning. Its mission is to provide algorithms and methodologies to extract logical information from data and express it in an interpretable way. In the context of temporal data, interval temporal logic has recently been proposed as a suitable tool for symbolic learning, specifically via the design of an interval temporal logic decision tree extraction algorithm. Building on it, we study here its natural generalization to interval temporal random forests, mimicking the corresponding schema at the propositional level. Interval temporal random forests turn out to be a high-performing multivariate time series classification method that, despite the introduction of a functional component, remains logically interpretable to some extent. We apply this method to the problem of diagnosing COVID-19 based on the time series that emerge from cough and breath recordings of positive versus negative subjects. Our experiments show that our models achieve very high accuracies and sensitivities, often superior to those achieved by classical methods on the same data. Although other recent approaches to the same problem (based on different and more numerous data) show even better statistical results, our solution is the first logic-based, interpretable, and explainable one.

